Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.
llvm-svn: 273652
Summary:
We will start generating this in a future patch.
Reviewers: arsenm, kzhuravl, rafael, ruiu, tony-tye
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21482
llvm-svn: 273628
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.
We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.
Fixes PR27761.
Differential Revision: http://reviews.llvm.org/D20852
llvm-svn: 273550
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.
Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.
llvm-svn: 273467
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.
llvm-svn: 273182
This will help sneak undefs past GVN into the DAG for
some tests.
Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.
llvm-svn: 273181
Don't use AllocateStack because kernel arguments have nothing
to do with the stack. The ensureMaxAlignment call was still
changing the stack alignment.
llvm-svn: 273080
This should select to s_trap, but that requires
additional work to set up and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.
Emit a warning that this will only terminate the wave and
not really trap.
llvm-svn: 273062
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.
Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.
An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21326
llvm-svn: 272761
Summary:
When we have an MCConstantExpr, we can encode it directly into the instruction
instead of emitting fixups.
Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov, arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21236
Change-Id: I88b3edf288d48e65c5d705fc4850d281f8e36948
llvm-svn: 272750
Summary:
We can now reference symbols directly in operands, like this:
s_mov_b32 s0, global
Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21038
llvm-svn: 272748
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.
This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
the normal rule that the global must have a unique address can be broken without
being observable by the program by performing comparisons against the global's
address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
its own copy of the global if it requires one, and the copy in each linkage unit
must be the same)
- It is a constant or a function (which means that the program cannot observe that
the unique-address rule has been broken by writing to the global)
Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.
See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.
Part of the fix for PR27553.
Differential Revision: http://reviews.llvm.org/D20348
llvm-svn: 272709
Summary:
We now use a standard fixup type when applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4-byte offset to the global address rather than
doing it as part of the fixup.
This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.
Re-commit this after fixing a bug where we were trying to use a
reference to a Triple object that had already been destroyed.
Reviewers: arsenm, kzhuravl
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21154
llvm-svn: 272705
Summary:
We now use a standard fixup type when applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4-byte offset to the global address rather than
doing it as part of the fixup.
This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.
Reviewers: arsenm, kzhuravl
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21154
llvm-svn: 272675
The feature allows for conditional assembly etc.
TODO: make those symbols read-only.
Test added.
Differential Revision: http://reviews.llvm.org/D21238
llvm-svn: 272673
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1
This makes one particular compute shader 8x faster.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D21136
llvm-svn: 272556
The condition reg of the cndmask_b64 expansion can't be killed by
the first one, and the implicit def of the super register is needed.
llvm-svn: 272554
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.
llvm-svn: 272512
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.
Reviewers: arsenm, kzhuravl
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21153
llvm-svn: 272417
Summary:
sext() modifier is supported in SDWA instructions only for integer operands. The spec is unclear on whether integer operands should support abs and neg modifiers together with sext; for now they are not supported.
Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support the sext() modifier.
Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers.
Code cleanup in the AMDGPUOperand class: organize methods into groups (render-, predicate-methods...).
Reviewers: vpykhtin, artem.tamazov, tstellarAMD
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D20968
llvm-svn: 272384
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.
Reviewers: scchan, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19994
llvm-svn: 272349
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.
There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.
llvm-svn: 272344
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.
The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.
Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D20839
llvm-svn: 272063
Another step toward unifying the llvm assembler/disassembler with sp3.
Besides, CodeGen output is a bit improved, hence the changes in CodeGen tests.
Assembler/Disassembler tests updated/added.
Differential Revision: http://reviews.llvm.org/D20796
llvm-svn: 271900
Summary:
Added custom converters for SDWA instructions to support optional operands and modifiers.
Support for _dpp and _sdwa suffixes that allow forcing DPP or SDWA encoding for instructions.
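As an illustration of the suffixes (not taken from the patch; the register choices are arbitrary, and the modifier operand spellings shown - quad_perm, row_mask, bank_mask, dst_sel, dst_unused, src0_sel - are assumptions that should be checked against the assembler tests):
```
// Force the DPP encoding of v_mov_b32
v_mov_b32_dpp v0, v1 quad_perm:[1,0,3,2] row_mask:0xf bank_mask:0xf
// Force the SDWA encoding of v_mov_b32
v_mov_b32_sdwa v1, v2 dst_sel:BYTE_0 dst_unused:UNUSED_PRESERVE src0_sel:DWORD
```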
Reviewers: tstellarAMD, vpykhtin, artem.tamazov
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D20625
llvm-svn: 271655
There are a lot of different kinds of loads to test for,
and these were scattered around inconsistently with
some redundancy. Try to comprehensively test all loads
in a consistent way.
llvm-svn: 271571
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.
If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.
Fixes crashes in future commit caused by dividing by the unset/0
wavefront size.
llvm-svn: 271561
Remove broken patterns matching it. This was matching the
unsafe math pattern and expanding the fix for the buggy instruction
from the pattern. The problems are also on CI. Remove the workarounds
and only use fract with unsafe math or from the intrinsic.
llvm-svn: 271078
Register numbers may be specified as assembly-time expressions.
This feature can be useful in macros and alike. However, expressions
are supported within square braces only.
Square braces were initially intended to support specifying multiple
(pairs/quads...) registers. Syntax like v[8:8], which specifies a single register,
is also supported. That allows expressions but looks a bit unnatural.
This change supports the syntax REG[EXPR].
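For example, a hypothetical sketch (the symbol name INDEX and the registers are arbitrary):
```
.set INDEX, 4
// v[INDEX] resolves to v4, v[INDEX+1] to v5
v_mov_b32 v[INDEX], v0
v_mov_b32 v[INDEX+1], v1
```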
Tests added.
Differential Revision: http://reviews.llvm.org/D20588
llvm-svn: 270990
Hwreg(...) syntax implementation unified with sendmsg(...).
Common strings moved to Utils
MathExtras.h functionality utilized.
Added missing build dependency in Disassembler.
Differential Revision: http://reviews.llvm.org/D20381
llvm-svn: 270871
f32 vectors would use a sequence of BFI instructions instead
of unrolled cmp + select. This was better in the case of a VALU
select with SGPR inputs, but we don't have a way of dealing with that
in the DAG.
llvm-svn: 270731
[AMDGPU] emitPrologue looks for an unused, unallocated SGPR that is not
the scratch descriptor. Continue the search if the unused register that was
found fails other requirements.
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D20526
llvm-svn: 270646
Summary:
Change the process of parsing optional operands. All optional operands use the same parsing method - parseOptionalOperand().
No default values are added to OperandsVector.
Get rid of WORKAROUND_USE_DUMMY_OPERANDS_INSTEAD_MUTIPLE_DEFAULT_OPERANDS.
Reviewers: tstellarAMD, vpykhtin, artem.tamazov, nhaustov
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D20527
llvm-svn: 270556
Allocating larger register classes first should give better allocation
results (and more importantly for myself, make the lit tests more stable
with respect to scheduler changes).
Patch by Matthias Braun
llvm-svn: 270312
The current SGPR spilling test does not stress this
because it is using s_buffer_load instructions to
increase SGPR pressure and spill, but their output
operands have the same SReg_32_XM0 constraint. This fixes
an error when the SReg_32 output from most instructions
is spilled.
llvm-svn: 270301
Fixes for MUBUF_Atomic instructions to make operand list valid:
- For RTN insns, make a copy of the $vdata_in operand as $vdata.
- Do not add an operand for GLC; it is hardcoded and comes as a token.
Workaround to avoid adding multiple default optional operands.
Tests added.
Differential Revision: http://reviews.llvm.org/D20257
llvm-svn: 270049
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been mapped to the
default value, which now clearly has to be one of the remaining 3
options.
llvm-svn: 269988
Use signed division; otherwise all back jumps fail the check.
Fixes a regression introduced in r269951.
Differential Revision: http://reviews.llvm.org/D20380
llvm-svn: 269972
Use register class that does not include them when looking
for unallocated registers.
This is hit by the udiv v8i64 test in the opencl integer
conformance test, and takes a few seconds to compile in
a debug build so no test included.
llvm-svn: 269938
This previously assumed it could use all memory, which is
a bad decision because it restricts occupancy.
By default, only try to use enough space that could reduce
occupancy to 7, an arbitrarily chosen limit.
Based on the existing LDS usage, try to round up to the limit
in the current tier instead of further hurting occupancy.
This isn't ideal, because it doesn't accurately know how much
space is going to be used for alignment padding.
llvm-svn: 269708
- Insert one nop for each high level statement instead of two
- Do not insert nop before prologue
Differential Revision: http://reviews.llvm.org/D20215
llvm-svn: 269452
We only really need this to be true for SIFixSGPRCopies.
I'm not sure there's any way this could happen before that point.
Fixes a case where MachineCSE could introduce a cross block
scc use.
llvm-svn: 269391
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
the method to try* and return a bool for success.
- Where we were calling SelectNodeTo, just return afterwards.
Part of llvm.org/pr26808.
llvm-svn: 269349
The promote alloca pass would attempt to promote an alloca with
a select, icmp, or phi user, even though the other operand was
from a non-promotable source, producing a select on two different
pointer types.
Only do this if we know that both operands derive from the same
alloca. In the future we should be able to relax this to an alloca
which will also be promoted.
llvm-svn: 269265
Remove the ModuleLevelChanges argument, and the ability to create new
subprograms for cloned functions. The latter was added without review in
r203662, but it has no in-tree clients (all non-test callers pass false
for ModuleLevelChanges [1], so it isn't reachable outside of tests). It
also isn't clear that adding a duplicate subprogram to the compile unit is
always the right thing to do when cloning a function within a module. If
this functionality comes back it should be accompanied with a more concrete
use case.
Furthermore, all in-tree clients add the returned function to the module.
Since that's pretty much the only sensible thing you can do with the function,
just do that in CloneFunction.
[1] http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/CloneFunction.cpp/rCloneFunction
Differential Revision: http://reviews.llvm.org/D18628
llvm-svn: 269110
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into its own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.
llvm-svn: 269011
Some custom Operands and AsmOperandClasses moved to proper place.
No functional changes.
Differential Revision: http://reviews.llvm.org/D20012
llvm-svn: 268780
Added support for sendmsg(MSG[, OP[, STREAM_ID]]) syntax
in s_sendmsg and s_sendmsghalt instructions.
The syntax matches the SP3 assembler/disassembler rules.
That is why implicit inputs (like M0 and EXEC) are not printed
to disassembly output anymore.
sendmsg(...) allows only known message types and attributes,
even if literals are used instead of symbolic names.
However, a raw literal (without "sendmsg") can still be used,
and that allows any 16-bit value.
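For example (illustrative only; the message and operation names follow the symbolic names the assembler is expected to know, and the stream ID is arbitrary):
```
s_sendmsg sendmsg(MSG_INTERRUPT)
s_sendmsg sendmsg(MSG_GS, GS_OP_EMIT, 0)
// A raw 16-bit literal is still accepted
s_sendmsg 0x1
```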
Tests updated/added.
Differential Revision: http://reviews.llvm.org/D19596
llvm-svn: 268762
Summary:
This change allows specifying a "DefaultMethod" for an optional operand (IsOptional = 1) in AsmOperandClass that returns the default value for the operand. This is used in convertToMCInst to set default values in MCInst.
Previously, if you wanted to set a default value for an operand you had to create a custom converter method. With this change it is possible to use standard converters even when optional operands are present.
Reviewers: tstellarAMD, ab, craig.topper
Subscribers: jyknight, dsanders, arsenm, nhaustov, llvm-commits
Differential Revision: http://reviews.llvm.org/D18242
llvm-svn: 268726
Summary:
Check calling convention in AMDGPUMachineFunction::isKernel
This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.
Also, in the future unused non-kernels may be optimized.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19917
llvm-svn: 268719
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.
We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.
Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.
llvm-svn: 268693
Summary:
Discovered by Dave Airlie, fixes an assertion in Khronos OpenGL CTS
GL43-CTS.shader_storage_buffer_object.advanced-matrix.
In this particular case, the buffer load intrinsic fed into a uniform
conditional branch, and led the brcond lowering down the wrong path.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19931
llvm-svn: 268650
Summary:
Version 2 is now the default. If you want to emit version 1, use
the amdgcn--amdhsa-amdcov1 triple.
Reviewers: arsenm, kzhuravl
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19283
llvm-svn: 268647
Use std::make_pair instead of constructor
Use C++11 loop
Reuse helper var
Reviewers: tstellardAMD
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D19787
llvm-svn: 268503
We were using v_readlane_b32 with the lane set to zero, but this won't
work if thread 0 is not active.
Differential Revision: http://reviews.llvm.org/D19745
llvm-svn: 268295
Now that unaligned access expansion should not attempt
to produce i64 accesses, we can remove the hack in
PreprocessISelDAG where this is done.
This allows splitting i64 private accesses while
allowing the new add nodes indexing the vector components
to be folded with the base pointer arithmetic.
llvm-svn: 268293
Summary:
When we restore an SGPR value from scratch, we first load it into a
temporary VGPR and then use v_readlane_b32 to copy the value from the
VGPR back into an SGPR.
We weren't setting the kill flag on the VGPR in the v_readlane_b32
instruction, so the register scavenger wasn't able to re-use this
temp value later.
I wasn't able to create a lit test for this.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19744
llvm-svn: 268287
Summary:
Add support for detecting hazards in SMEM soft clauses, so that we only
break the clauses when necessary, either by adding s_nop or re-ordering
other alu instructions.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18870
llvm-svn: 268260
Summary:
This intrinsic is used to get flat-shaded fragment shader inputs. Those are
uniform across a primitive, but a fragment shader wave may process pixels from
multiple primitives (as indicated by the prim_mask), and so that's where
divergence can arise.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19747
llvm-svn: 268259
Summary:
This includes a hazard recognizer implementation to replace some of
the hazard handling we had during frame index elimination.
Reviewers: arsenm
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18602
llvm-svn: 268143
If a block has no successors because it ends in unreachable,
this was accessing an invalid iterator.
Also stop counting instructions that don't emit any
real instructions.
llvm-svn: 268119
Move to addPreEmitPass. This is so it runs after post-RA
scheduling so we can merge s_nops emitted by the scheduler
and hazard recognizer.
llvm-svn: 268095
Summary:
These instructions can add an immediate offset to the address, like other
ds instructions.
Reviewers: arsenm
Subscribers: arsenm, scchan
Differential Revision: http://reviews.llvm.org/D19233
llvm-svn: 268043
Summary:
The goal is for each operand type to have its own parse function and
at the same time share common code for tracking state as different
instruction types share operand types (e.g. glc/glc_flat, etc).
Introduce parseAMDGPUOperand which can parse any optional operand.
DPP and Clamp/OMod have custom handling for now. Sam also suggested
having a class hierarchy for operand types instead of a table. This
can be done in a separate change.
Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps,
parseMubufOptionalOps, parseDPPOptionalOps.
Reduce the number of AsmOperand and MatchClass definitions by using a common base class.
Rename AsmMatcher/InstPrinter methods accordingly.
Print the immediate type when printing a parsed immediate operand.
Use 'off' if the offset/index register is unused, instead of skipping it, to make it more readable (also agreed with SP3); see the example after this list.
Update tests.
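A sketch of the 'off' placeholder mentioned above (register choices, offsets, and modifiers are arbitrary):
```
// No VGPR address/index is used, so 'off' takes its place
buffer_load_dword v1, off, s[4:7], s1
buffer_load_dword v1, off, s[4:7], s1 offset:4 glc
```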
Reviewers: tstellarAMD, SamWot, artem.tamazov
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19584
llvm-svn: 268015
This was being treated the same as private, which has an immediate
offset. For unknown, it probably means it's for a computation not
actually being used for accessing memory, so it should not have a
nontrivial addressing mode.
llvm-svn: 268002
The canonical form for allocas is a single allocation of the array type.
In case we see a non-canonical array alloca, make sure we aren't
replacing this with an array N times smaller.
llvm-svn: 267916
Added support for TTMP quads.
Reworked M0 exclusion machinery for SMRD and similar instructions
to enable usage of TTMP registers in those instructions as destinations.
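For instance, a TTMP quad can now appear as an SMRD destination (illustrative only; the base register and offset are arbitrary):
```
s_load_dwordx4 ttmp[4:7], s[0:1], 0x0
```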
Tests added.
Differential Revision: http://reviews.llvm.org/D19342
llvm-svn: 267733
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.
(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)
Reviewers: arsenm, mareko, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19203
llvm-svn: 267729
The possibility to specify the code of a hardware register is kept.
Disassemble to the symbolic name, if the name is known.
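For example (a sketch; the destination register and the field offset/width values are arbitrary):
```
// Symbolic name of the hardware register
s_getreg_b32 s2, hwreg(HW_REG_GPR_ALLOC, 0, 6)
// A raw register code is still accepted
s_getreg_b32 s2, hwreg(5, 0, 6)
```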
Tests updated/added.
Differential Revision: http://reviews.llvm.org/D19335
llvm-svn: 267724
Support for SDWA instructions for VOP1 and VOP2 encoding.
Not done yet:
- converters to support optional operands and modifiers
- VOPC
- sext() modifier
- intrinsics
- VOP2b (see vop_dpp.s)
- V_MAC_F32 (see vop_dpp.s)
Differential Revision: http://reviews.llvm.org/D19360
llvm-svn: 267553
Use the operand for how long to wait. This is somewhat
distasteful, since it would be better to just emit s_nop
with the right argument in the first place. This would require
changing TII::insertNoop to emit N operands, which would be easy.
Slightly more problematic is the post-RA scheduler and hazard recognizer
represent nops as a single null node, and would require inventing
another way of representing N nops.
llvm-svn: 267456
Summary:
The expression is detected as a redundant expression.
Turns out, this is probably a bug.
```
/home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression]
if (isSMRD(*FirstLdSt) && isSMRD(*FirstLdSt)) {
```
Reviewers: rnk, tstellarAMD
Subscribers: arsenm, cfe-commits
Differential Revision: http://reviews.llvm.org/D19460
llvm-svn: 267415
- Switch few loops to range-based for loops
- Fix nop insertion at the end of BB
- Fix formatting
- Check for endpgm
Differential Revision: http://reviews.llvm.org/D19380
llvm-svn: 267167
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.
Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.
This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19191
llvm-svn: 267102
Summary:
IntrReadWriteArgMem simply becomes IntrArgMemOnly.
The result is fewer intrinsic properties that express their orthogonality
better and correspond more closely to the matching IR attributes.
Suggested by: Philip Reames
Reviewers: joker.eph, reames, tstellarAMD
Subscribers: jholewinski, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19291
llvm-svn: 267021
Add ParseAMDGPURegister which can be invoked recursively for parsing lists.
Rename getRegForName to getSpecialRegForName.
Support legacy SP3 register list syntax: [s2,s3,s4,s5] or [flat_scratch_lo,flat_scratch_hi].
Add 64-bit registers TBA, TMA where missing.
Add some tests.
Differential Revision: http://reviews.llvm.org/D19163
llvm-svn: 266865
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.
An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.
With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.
Reviewers: tstellarAMD, arsenm, joker.eph, reames
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18291
llvm-svn: 266826
Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19208
llvm-svn: 266825
Summary:
A shader stored the live mask (initial exec mask) in an SGPR which was then
spilled during register allocation. The allocator quite reasonably
turned the spill into
v_writelane_b32 %vgpr, exec_lo, N
v_writelane_b32 %vgpr, exec_hi, N+1
at the beginning of the shader, confusing the SGPR accounting.
No test case, because si-sgpr-spill.ll together with an upcoming patch for
WQM handling exhibits the problem.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19199
llvm-svn: 266824
Also,
- Skip pass if machine module does not have debug info
- Minor comment changes
- Added test
Differential Revision: http://reviews.llvm.org/D19079
llvm-svn: 266626
Order should match the sp3 syntax, where the destination (simm16 denoting the hwreg) comes first.
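For example, after this change the hwreg destination is written first (a sketch; source register and field values are arbitrary):
```
s_setreg_b32 hwreg(HW_REG_MODE, 0, 4), s2
s_setreg_imm32_b32 hwreg(HW_REG_MODE, 0, 4), 0x7
```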
Differential Revision: http://reviews.llvm.org/D19161
llvm-svn: 266617
Removed some unused headers, replaced some headers with forward class declarations.
Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
Patch by Eugene Kosov <claprix@yandex.ru>
Differential Revision: http://reviews.llvm.org/D19219
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
This resolves more frame indexes early and folds
the immediate offsets into the scratch mubuf instructions.
This cleans up a lot of the mess that's currently emitted,
such as emitting add 0s and repeatedly initializing the same
register to 0 when spilling.
llvm-svn: 266508
Perform store clustering just like load clustering. This change adds
StoreClusterMutation in the machine scheduler. To control StoreClusterMutation,
enableClusterStores() was added in TargetInstrInfo.h. This is enabled only on
AArch64 for now.
This change also adds support for unscaled stores, which were not handled in
getMemOpBaseRegImmOfs().
llvm-svn: 266437
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19043
llvm-svn: 266433
Currently what comes out of instruction selection is a
register initialized to -1, and then copied to m0.
MachineCSE doesn't consider copies, but we want these
to be CSEed. This isn't much of a problem currently,
because SIFoldOperands is run immediately after.
This avoids regressions when SIFoldOperands is run later
from leaving all copies to m0.
llvm-svn: 266377
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.
Reviewers: arsenm, qcolombet
Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D19077
llvm-svn: 266356
Summary:
This fully solves the problem where the StructurizeCFG pass does not
consider the same branches as uniform as the SIAnnotateControlFlow pass.
The patch in D19013 helps with this problem, but is not sufficient
(and, interestingly, causes a "regression" with one of the existing
test cases).
No tests included here, because tests in D19013 already cover this.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19018
llvm-svn: 266346
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like
```
def %vreg0:SGPR_32
...
if-block:
    ...
    def %vreg1:SGPR_32
    ...
else-block:
    ...
    use %vreg0:SGPR_32
    ...
```
and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.
However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.
In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.
Reviewers: arsenm, tstellarAMD
Subscribers: MatzeB, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19041
llvm-svn: 266345
Summary:
I've been carrying this change around with me for a while, because the if ()
managed to confuse me while following the code. All callers ensure that the
assertion holds.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19042
llvm-svn: 266344
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD.
This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions.
Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D18340
Patch By: Bas Nieuwenhuizen
llvm-svn: 266337
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders, though;
the register should be the next SGPR after all user and system SGPRs.
We rely on the fact that Mesa adds arguments for all input and system SGPRs and
take the next available SGPR for the scratch wave offset register.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18941
Patch By: Bas Nieuwenhuizen
llvm-svn: 266336
Summary:
When we are spilling SGPRs to scratch memory, we usually don't have
free SGPRs to do the address calculation, so we need to re-use the
ScratchOffset register for the calculation.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18917
llvm-svn: 266244
Tests added along with the implemented feature.
Note that there is a small leftover unnecessary MI scheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.
TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.
Differential Revision: http://reviews.llvm.org/D17825
llvm-svn: 266205
Summary:
It seems like this was broken in r252327. I thought we had test cases
for this, but it's really hard to trigger spills of this exact register
size since they aren't used very much.
Reviewers: arsenm, nhaehnle
Subscribers: nhaehnle, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19021
llvm-svn: 266152
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)
As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.
Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.
Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18292
llvm-svn: 266126
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation fixes a hang with the piglit
test:
spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra
Reviewers: arsenm, nhaehnle
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18988
llvm-svn: 266105
This helps clean up some of the mess when expanding unaligned 64-bit
loads when they are changed to be promoted to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.
I think this could be a generic combine but I'm not sure.
llvm-svn: 266104
Summary:
Under certain circumstances, multi-level breaks (or what is understood by
the control flow passes as such) could be miscompiled in a way that causes
infinite loops, by emitting incorrect control flow intrinsics.
This fixes a hang in
dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18967
llvm-svn: 266088
Standard load/store instructions with GLC bit set.
Reviewers: tstellardAMD, arsenm
Differential Revision: http://reviews.llvm.org/D18760
llvm-svn: 265709
Summary: This makes it possible to insert nops at the end of blocks.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18549
llvm-svn: 265678
For a VGPR_32 operand the disassembler expects a VGPR register encoded as 0..255 (enum8 src operand).
readfirstlane/readlane actually have an enum9 operand, and this change fixes VGPR_32 to VS_32 (enum9 encoding).
Differential Revision: http://reviews.llvm.org/D18696
llvm-svn: 265670
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.
Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Differential Revision: http://reviews.llvm.org/D18559
llvm-svn: 265589
Summary:
Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.
32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.
Patch by: Vedran Miletić
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: jvesely, scchan, kanarayan, arsenm
Differential Revision: http://reviews.llvm.org/D17280
llvm-svn: 265170
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.
This pass is a prerequisite for some more scheduler improvements, and I
think the increased register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.
shader-db stats:
2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)
Depends on D18451
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18452
llvm-svn: 264876
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.
This yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.
shader-db stats:
2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18451
llvm-svn: 264589
We're erasing MI here, but then immediately using it again inside the
`if`. This moves the erase after we're done using it.
Doing that reveals a second problem though - this case is missing a
break, so we fall through to the default and dereference MI again.
This is obviously a bug, though I don't know how to write a test that
triggers it - all we do in the error case is print some extra debug
output.
Both of these issues cause crashes on lots of tests under ASAN with the
recycling allocator changes from PR26808 applied.
llvm-svn: 264442