llvm-project

Author	SHA1	Message	Date
Matt Arsenault	d745c28945	AMDGPU: Sign extend constants when splitting them This will confuse later passes which try to look at the immediate value and don't truncate first. llvm-svn: 280974	2016-09-08 17:44:36 +00:00
Matt Arsenault	be90f70d3a	AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32 llvm-svn: 280972	2016-09-08 17:35:41 +00:00
Matt Arsenault	bbb47da8a1	AMDGPU: Support commuting with immediate in src0 llvm-svn: 280970	2016-09-08 17:19:29 +00:00
Yaxun Liu	90658fff1b	AMDGPU: Remove a useless variable which caused build failure for lld. llvm-svn: 280841	2016-09-07 18:31:11 +00:00
Yaxun Liu	638914009a	AMDGPU: Add hidden kernel arguments to runtime metadata OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden arguments should be included in the runtime metadata. Also updated kernel argument kind metadata. Differential Revision: https://reviews.llvm.org/D23424 llvm-svn: 280829	2016-09-07 17:44:00 +00:00
Matt Arsenault	479ba3aac0	AMDGPU: Make some scalar instructions commutable llvm-svn: 280784	2016-09-07 06:25:55 +00:00
Matt Arsenault	6cda10c950	Remove unnecessary call to getAllocatableRegClass This reapplies r252565 and r252674, effectively reverting r252956. This allows VS_32/VS_64 to be unallocatable like they should be. llvm-svn: 280783	2016-09-07 06:16:45 +00:00
Konstantin Zhuravlyov	1d65026ca6	[AMDGPU] Wave and register controls - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747	2016-09-06 20:22:28 +00:00
Tom Stellard	2add8a1140	AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copies Summary: I put this code here, because I want to re-use it in a few other places. This supersedes some of the immediate folding code we have in SIFoldOperands. I think the peephole optimizer is probably a better place for folding immediates into copies, since it does some register coalescing at the same time. This will also make it easier to transition SIFoldOperands into a smarter pass, where it looks at all uses of an instruction at once to determine the optimal way to fold operands. Right now, the pass just considers one operand at a time. Reviewers: arsenm Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23402 llvm-svn: 280744	2016-09-06 20:00:26 +00:00
Wei Ding	5e832e866e	AMDGPU : Add XNACK feature to GPUs that support it. Differential Revision: http://reviews.llvm.org/D24276 llvm-svn: 280742	2016-09-06 19:55:17 +00:00
Valery Pykhtin	8bc659637c	[AMDGPU] Refactor FLAT TD instructions Differential revision: https://reviews.llvm.org/D24072 llvm-svn: 280655	2016-09-05 11:22:51 +00:00
Matt Arsenault	ac42ba8633	AMDGPU: Set sizes of spill pseudos llvm-svn: 280595	2016-09-03 17:25:44 +00:00
Matt Arsenault	5ffe3e1d93	AMDGPU: Fix adding duplicate implicit exec uses I'm not sure if this should be considered a bug in copyImplicitOps or not, but implicit operands that are part of the static instruction definition should not be copied. llvm-svn: 280594	2016-09-03 17:25:39 +00:00
Nicolai Haehnle	3bba6a8438	AMDGPU: Reduce the duration of whole-quad-mode Summary: This contains two changes that reduce the time spent in WQM, with the intention of reducing bandwidth required by VMEM loads: 1. Sampling instructions by themselves don't need to run in WQM, only their coordinate inputs need it (unless of course there is a dependent sampling instruction). The initial scanInstructions step is modified accordingly. 2. When switching back from WQM to Exact, switch back as soon as possible. This affects the logic in processBlock. This should always be a win or at best neutral. There are also some cleanups (e.g. remove unused ExecExports) and some new debugging output. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D22092 llvm-svn: 280590	2016-09-03 12:26:38 +00:00
Nicolai Haehnle	a246dccc26	AMDGPU: Fix an interaction between WQM and polygon stippling Summary: This fixes a rare bug in polygon stippling with non-monolithic pixel shaders. The underlying problem is as follows: the prolog part contains the polygon stippling sequence, i.e. a kill. The main part then enables WQM based on the _reduced_ exec mask, effectively undoing most of the polygon stippling. Since we cannot know whether polygon stippling will be used, the main part of a non-monolithic shader must always return to exact mode to fix this problem. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23131 llvm-svn: 280589	2016-09-03 12:26:32 +00:00
Matt Arsenault	2510a31677	AMDGPU: Fix spilling of m0 readlane/writelane do not support using m0 as the output/input. Constrain the register class of spill vregs to try to avoid this, but also handle spilling of the physreg when necessary by inserting an additional copy to a normal SGPR. llvm-svn: 280584	2016-09-03 06:57:55 +00:00
Jan Vesely	ea45746d5a	AMDGPU/R600: EXTRACT_VECT_ELT should only bypass BUILD_VECTOR if the vectors have the same number of elements. Fixes R600 piglit regressions since r280298 Differential Revision: https://reviews.llvm.org/D24174 llvm-svn: 280535	2016-09-02 20:13:19 +00:00
Jan Vesely	00864886f4	AMDGPU/R600: Expand unaligned writes to local and global AS LOCAL and GLOBAL AS only; PRIVATE needs special treatment. Differential Revision: https://reviews.llvm.org/D23971 llvm-svn: 280526	2016-09-02 19:07:06 +00:00
Yaxun Liu	add05a8d95	AMDGPU: Add runtime metadata for pointee alignment of argument. Add runtime metadata for pointee alignment of pointer type kernel argument. The key is KeyArgPointeeAlign and the value is a 32-bit unsigned integer. Differential Revision: https://reviews.llvm.org/D24145 llvm-svn: 280399	2016-09-01 18:46:49 +00:00
Changpeng Fang	b28fe0307f	AMDGPU/SI: MIMG TD Refactoring. Summary: Created a new td file MIMGInstructions.td which contains all definitions of MIMG related instructions. Reviewed by: kzhuravl, vpykhtin Differential Revision: http://reviews.llvm.org/D24106 llvm-svn: 280385	2016-09-01 17:54:54 +00:00
Valery Pykhtin	1b13886b5f	[AMDGPU] Scalar Memory instructions TD refactoring Differential revision: https://reviews.llvm.org/D23996 llvm-svn: 280349	2016-09-01 09:56:47 +00:00
Matt Arsenault	b50eb8dc2b	AMDGPU: Fix introducing stack access on unaligned v16i8 llvm-svn: 280298	2016-08-31 21:52:27 +00:00
Matt Arsenault	1d2151781b	AMDGPU: Use copy instead of mov during frame lowering This occurs before RA pseudos are expanded. It's less code to emit the copy. llvm-svn: 280297	2016-08-31 21:52:25 +00:00
Matt Arsenault	57bc4324f8	AMDGPU: Refactor frame lowering This will make future changes easier. llvm-svn: 280296	2016-08-31 21:52:21 +00:00
Tom Stellard	ba5730884b	AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte aligned Summary: This fixes some OpenCV tests that were broken by libclc commit r276443. Reviewers: arsenm, jvesely Subscribers: arsenm, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24051 llvm-svn: 280274	2016-08-31 18:46:07 +00:00
Nikolay Haustov	eba808957e	AMDGPU/SI: Handle aliases in AMDGPUAlwaysInlinePass Summary: Simply replace usage of aliases to functions with aliasee. This came up when bitcode linking to builtin library and calls to aliases not being resolved. Also made minor improvements to existing test. Reviewers: tstellarAMD, alex-t, vpykhtin Subscribers: arsenm, wdng, rampitec Differential Revision: https://reviews.llvm.org/D24023 llvm-svn: 280221	2016-08-31 11:18:33 +00:00
Matt Arsenault	a609e2d5ce	AMDGPU: Relax SGPR asm constraint register class s should be SReg_32 to be as general as possible. This can avoid a copy from m0. llvm-svn: 280154	2016-08-30 20:50:08 +00:00
Valery Pykhtin	a34fb49f8f	[AMDGPU] Refactor SOP instructions TD files. Differential revision: https://reviews.llvm.org/D23617 llvm-svn: 280101	2016-08-30 15:20:31 +00:00
NAKAMURA Takumi	9720f57a17	SILoadStoreOptimizer.cpp: Fix a warning in r279991. [-Wunused-variable] llvm-svn: 280075	2016-08-30 11:50:21 +00:00
Jan Vesely	89876673cd	AMDGPU/R600: Cleanup DAGCombine Move SDLoc initialization to common place. Fall back to AMDGPU version in one place Differential Revision: https://reviews.llvm.org/D23900 llvm-svn: 280030	2016-08-29 23:21:46 +00:00
Jan Vesely	77ed6af416	AMDGPU/R600: Remove MergeVectorStores from legalization This is handled by DAGCombiner in a more generic way Differential Revision: https://reviews.llvm.org/D23970 llvm-svn: 280019	2016-08-29 22:05:06 +00:00
Saleem Abdulrasool	43e5fe3fac	AMDGPU: fix mismatch tags, NFC llvm-svn: 280006	2016-08-29 20:42:07 +00:00
Tom Stellard	0d23ebe888	AMDGPU/SI: Implement a custom MachineSchedStrategy Summary: GCNSchedStrategy re-uses most of GenericScheduler; it just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default; to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995	2016-08-29 19:42:52 +00:00
Tom Stellard	c2ff0eb697	AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler Summary: The SILoadStoreOptimizer can now look ahead more than one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge. Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23814 llvm-svn: 279991	2016-08-29 19:15:22 +00:00
Matt Arsenault	b90fc9b3b4	AMDGPU/R600: Fix fixups used for constant arrays Fixes bug 29289 llvm-svn: 279986	2016-08-29 19:01:48 +00:00
Tom Stellard	5d3f71f721	AMDGPU/SI: Improve register allocation hints for sopk instructions Summary: For shrinking SOPK instructions, we were creating a hint to tell the register allocator to use the register allocated for src0 for the dst operand as well. However, this seems to not work sometimes depending on the order virtual registers are assigned physical registers. To fix this, I've added a second allocation hint which does the reverse, asks that the register allocated for dst is used for src0. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23862 llvm-svn: 279968	2016-08-29 13:06:10 +00:00
Tom Stellard	662f330852	AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint() Summary: The SILoadStoreOptimizer will need to use AliasAnalysis here in order to move it before scheduling. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23813 llvm-svn: 279963	2016-08-29 12:05:32 +00:00
Jan Vesely	38814fa2fd	AMDGPU/R600: Enable Load combine Fix and improve tests Differential Revision: https://reviews.llvm.org/D23899 llvm-svn: 279925	2016-08-27 19:09:43 +00:00
Matt Arsenault	a15ea4e217	AMDGPU: Mark sched model complete Fixes bug 26800 llvm-svn: 279910	2016-08-27 03:39:27 +00:00
Matt Arsenault	71ed8a67e8	AMDGPU: Remove unneeded implicit exec uses/defs SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec. SI_IF_BREAK and SI_ELSE_BREAK do not read it either. llvm-svn: 279909	2016-08-27 03:00:51 +00:00
Matt Arsenault	2712d4a3d8	AMDGPU: Select mulhi 24-bit instructions llvm-svn: 279902	2016-08-27 01:32:27 +00:00
Matt Arsenault	22e417956d	AMDGPU: Move cndmask pseudo to be isel pseudo There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it. llvm-svn: 279901	2016-08-27 01:00:37 +00:00
Matt Arsenault	e949744474	AMDGPU: Fix sched type for branches llvm-svn: 279900	2016-08-27 00:51:02 +00:00
Matt Arsenault	f98a596954	AMDGPU: Remove register operand from si_mask_branch It isn't used for anything, and is also misleading since it could be spilled at the end of the block, so it can't be relied on. There ends up being a verifier error about using an undefined register since the spill kills the register. llvm-svn: 279899	2016-08-27 00:42:21 +00:00
Matt Arsenault	00e102baf4	AMDGPU: Improve error reporting for maximum branch distance Unfortunately this seems to only help the assembler diagnostic. llvm-svn: 279895	2016-08-27 00:21:22 +00:00
Tom Stellard	e175d8aba5	AMDGPU/SI: Canonicalize offset order for merged DS instructions Summary: If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to schedule loads together without explicitly clustering them, which would give us non-sorted offsets. Also, we will want to do this if we move the load/store optimizer before the scheduler. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23776 llvm-svn: 279870	2016-08-26 21:36:47 +00:00
Tom Stellard	4b5cd87ed3	XXX llvm-svn: 279868	2016-08-26 21:16:40 +00:00
Tom Stellard	7c463c9168	AMDGPU/SI: Use a better method for determining the largest pressure sets Summary: There are a few different sgpr pressure sets, but we only care about the one which covers all of the sgprs. We were using hard-coded register pressure set names to determine the reg set id for the biggest sgpr set. However, we were using the wrong name, and this method is pretty fragile, since the reg pressure set names may change. The new method just looks for the pressure set that contains the most reg units and sets that set as our SGPR pressure set. We've also adopted the same technique for determining our VGPR pressure set. Reviewers: arsenm Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23687 llvm-svn: 279867	2016-08-26 21:16:37 +00:00
Reid Kleckner	a5b1eef846	[MC] Move .cv_loc management logic out of MCContext MCContext already has many tasks, and separating CodeView out from it is probably a good idea. The .cv_loc tracking was modelled on the DWARF tracking which lived directly in MCContext. Removes the inclusion of MCCodeView.h from MCContext.h, so now there are only 10 build actions while I hack on CodeView support instead of 265. llvm-svn: 279847	2016-08-26 17:58:37 +00:00
Changpeng Fang	75f0968b39	AMDGCN/SI: Implement readlane/readfirstlane intrinsics Summary: This patch implements readlane/readfirstlane intrinsics. TODO: need to define a new register class to consider the case that the source could be a vector register or M0. Reviewed by: arsenm and tstellarAMD Differential Revision: http://reviews.llvm.org/D22489 llvm-svn: 279660	2016-08-24 20:35:23 +00:00

1079 Commits