llvm-project

Commit Graph

Author	SHA1	Message	Date
Matthias Braun	733fe3676c	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses Re-apply this patch, hopefully I will get away without any warnings in the constructor now. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279602	2016-08-24 01:52:46 +00:00
Richard Smith	8c3fbdc6c4	Revert r279564. It introduces undefined behavior (binding a reference to a dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes -Werror builds (including several buildbots) to fail. llvm-svn: 279580	2016-08-23 22:08:27 +00:00
Matthias Braun	90799ce8b2	MachineFunction: Introduce NoPHIs property I want to compute the SSA property of .mir files automatically in upcoming patches. The problem with this is that some inputs will be reported as static single assignment with some passes claiming not to support SSA form. In reality though those passes do not support PHI instructions => Track the presence of PHI instructions separate from the SSA property. Differential Revision: https://reviews.llvm.org/D22719 llvm-svn: 279573	2016-08-23 21:19:49 +00:00
Matthias Braun	4c1f1f120c	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses Re-apply this commit with the deletion of a MachineFunction delegated to a separate pass to avoid use after free when doing this directly in AsmPrinter. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279564	2016-08-23 20:58:29 +00:00
Matthias Braun	7f66202d38	Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses" Reverting while tracking down a use after free. This reverts commit r279502. llvm-svn: 279503	2016-08-23 05:17:11 +00:00
Matthias Braun	fd936841eb	CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279502	2016-08-23 03:20:09 +00:00
Matt Arsenault	78fc9daf8d	AMDGPU: Split SILowerControlFlow into two pieces Do most of the lowering in a pre-RA pass. Keep the skip jump insertion late, plus a few other things that require more work to move out. One concern I have is now there may be COPY instructions which do not have the necessary implicit exec uses if they will be lowered to v_mov_b32. This has a positive effect on SGPR usage in shader-db. llvm-svn: 279464	2016-08-22 19:33:16 +00:00
NAKAMURA Takumi	59a20649c6	Untabify. llvm-svn: 279408	2016-08-22 00:58:04 +00:00
Tim Shen	b5e0f5ac95	[GraphTraits] Make nodes_iterator dereference to NodeType/NodeRef Currently nodes_iterator may dereference to a NodeType or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later. Differential Revision: https://reviews.llvm.org/D23704 Differential Revision: https://reviews.llvm.org/D23705 llvm-svn: 279326	2016-08-19 21:20:13 +00:00
Michael Kuperstein	2bc3d4d46c	[SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround The names of the tablegen defs now match the names of the ISD nodes. This makes the world a slightly saner place, as previously "fround" matched ISD::FP_ROUND and not ISD::FROUND. Differential Revision: https://reviews.llvm.org/D23597 llvm-svn: 279129	2016-08-18 20:08:15 +00:00
Wei Ding	52bb661dec	AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type. Differential Revision: http://reviews.llvm.org/D23689 llvm-svn: 279126	2016-08-18 19:51:14 +00:00
Valery Pykhtin	609c2f8137	[AMDGPU] add s_incperflevel/s_decperflevel intrinsics. Differential revision: https://reviews.llvm.org/D23666 llvm-svn: 279106	2016-08-18 18:06:20 +00:00
Justin Bogner	cd1d5aaf2e	Replace a few more "fall through" comments with LLVM_FALLTHROUGH Follow up to r278902. I had missed "fall through", with a space. llvm-svn: 278970	2016-08-17 20:30:52 +00:00
Matt Arsenault	d42d58cf21	AMDGPU: Remove dead option llvm-svn: 278965	2016-08-17 20:07:16 +00:00
Justin Bogner	b03fd12cef	Replace "fallthrough" comments with LLVM_FALLTHROUGH This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902	2016-08-17 05:10:15 +00:00
Chandler Carruth	67fc52f067	[PM] Port the always inliner to the new pass manager in a much more minimal and boring form than the old pass manager's version. This pass does the very minimal amount of work necessary to inline functions declared as always-inline. It doesn't support a wide array of things that the legacy pass manager did support, but is also ... about 20 lines of code. So it has that going for it. Notably things this doesn't support: - Array alloca merging - To support the above, bottom-up inlining with careful history tracking and call graph updates - DCE of the functions that become dead after this inlining. - Inlining through call instructions with the always_inline attribute. Instead, it focuses on inlining functions with that attribute. The first I've omitted because I'm hoping to just turn it off for the primary pass manager. If that doesn't pan out, I can add it here but it will be reasonably expensive to do so. The second should really be handled by running global-dce after the inliner. I don't want to re-implement the non-trivial logic necessary to do comdat-correct DCE of functions. This means the -O0 pipeline will have to be at least 'always-inline,global-dce', but that seems reasonable to me. If others are seriously worried about this I'd like to hear about it and understand why. Again, this is all solveable by factoring that logic into a utility and calling it here, but I'd like to wait to do that until there is a clear reason why the existing pass-based factoring won't work. The final point is a serious one. I can fairly easily add support for this, but it seems both costly and a confusing construct for the use case of the always inliner running at -O0. This attribute can of course still impact the normal inliner easily (although I find that a questionable re-use of the same attribute). I've started a discussion to sort out what semantics we want here and based on that can figure out if it makes sense to have this complexity at O0 or not. One other advantage of this design is that it should be quite a bit faster due to checking for whether the function is a viable candidate for inlining exactly once per function instead of doing it for each call site. Anyways, hopefully a reasonable starting point for this pass. Differential Revision: https://reviews.llvm.org/D23299 llvm-svn: 278896	2016-08-17 02:56:20 +00:00
Duncan P. N. Exon Smith	db53d99d02	AMDGPU: Avoid looking for the DebugLoc in end() The end() iterator isn't a safe thing to dereference. Pass the DebugLoc into EmitFetchClause and EmitALUClause to avoid it. llvm-svn: 278873	2016-08-17 00:06:43 +00:00
Konstantin Zhuravlyov	e0b87181cf	[AMDGPU] Remove duplicate initialization of SIDebuggerInsertNops pass Differential Revision: https://reviews.llvm.org/D23556 llvm-svn: 278863	2016-08-16 22:30:11 +00:00
Matt Arsenault	7f19298bfa	AMDGPU: Remove excessive padding from ImmOp and RegOp. The structs ImmOp and RegOp are in AArch64AsmParser.cpp (inside anonymous namespace). This diff changes the order of fields and removes the excessive padding (8 bytes). Patch by Alexander Shaposhnikov llvm-svn: 278844	2016-08-16 20:28:06 +00:00
Reid Kleckner	229d32abfc	[AMDGPU] Give enum an explicit 64-bit type to fix MSVC 2013 failures Recall that MSVC always gives enums the type 'int', nothing else. MSVC 2015 does not appear to have this problem anymore. Clang-cl -Wmicrosoft-enum-value flags this, FWIW, so now I have a true positive for my warning. :) llvm-svn: 278762	2016-08-15 23:54:44 +00:00
Jan Vesely	0486f739a4	AMDGPU/R600: Convert buffer id to VTX_READ input Use patterns instead of multiple instructions Add buffer id to asm string https://reviews.llvm.org/D22650 llvm-svn: 278749	2016-08-15 21:38:30 +00:00
Yaxun Liu	c7cbd72921	AMDGPU: Update AMDGPURuntimeMetadata.h for enums of address space qualifiers llvm-svn: 278682	2016-08-15 16:54:25 +00:00
Matt Arsenault	3661e90e71	AMDGPU: Don't fold subregister extracts into tied operands llvm-svn: 278676	2016-08-15 16:18:36 +00:00
Valery Pykhtin	c761675ef4	[AMDGPU] fix failure on printing of non-existing instruction operands. Differential revision: https://reviews.llvm.org/D23323 llvm-svn: 278665	2016-08-15 10:56:48 +00:00
Matt Arsenault	c1ebd82ebe	AMDGPU: Fix not estimating MBB operand sizes correctly llvm-svn: 278590	2016-08-13 01:43:54 +00:00
Matt Arsenault	3cc1e0066d	AMDGPU: Fix missing test for addressing mode with odd offsets Add test if the constant offset looks unaligned. llvm-svn: 278589	2016-08-13 01:43:51 +00:00
Matt Arsenault	44f6d694b3	AMDGPU/R600: Remove macros llvm-svn: 278588	2016-08-13 01:43:46 +00:00
Hans Wennborg	0dd9ed1d45	Fix more dereferenced end() iterators after r278532 llvm-svn: 278587	2016-08-13 01:12:49 +00:00
Duncan P. N. Exon Smith	f197b1f78f	ADT: Remove all ilist_iterator => pointer casts, NFC Remove all ilist_iterator to pointer casts. There were two reasons for casts: - Checking for an uninitialized (i.e., null) iterator. I added MachineInstrBundleIterator::isValid() to check for that case. - Comparing an iterator against the underlying pointer value while avoiding converting the pointer value to an iterator. This is occasionally necessary in MachineInstrBundleIterator, since there is an assertion in the constructors that the underlying MachineInstr is not bundled (but we don't care about that if we're just checking for pointer equality). To support the latter case, I rewrote the == and != operators for ilist_iterator and MachineInstrBundleIterator. - The implicit constructors now use enable_if to exclude const-iterator => non-const-iterator conversions from overload resolution (previously it was a compiler error on instantiation, now it's SFINAE). - The == and != operators are now global (friends), and are not templated. - MachineInstrBundleIterator has overloads to compare against both const_pointer and const_reference. This avoids the implicit conversions to MachineInstrBundleIterator that assert, instead just checking the address (and I added unit tests to confirm this). Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked are in ilist.h, and no code outside of ilist.h directly relies on this UB end-iterator-to-pointer conversion anymore. It's still needed for ilist_sentinel_traits, but I'll clean that up soon. llvm-svn: 278478	2016-08-12 05:05:36 +00:00
David Majnemer	562e82945e	Use the range variant of find_if instead of unpacking begin/end No functionality change is intended. llvm-svn: 278443	2016-08-12 00:18:03 +00:00
David Majnemer	0d955d0bf5	Use the range variant of find instead of unpacking begin/end If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended. llvm-svn: 278433	2016-08-11 22:21:41 +00:00
Matt Arsenault	18da70dd2d	AMDGPU: Remove unused tablegen utilities llvm-svn: 278414	2016-08-11 21:08:43 +00:00
Wei Ding	70cda07526	AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32 Differential Revision: http://reviews.llvm.org/D23336 llvm-svn: 278403	2016-08-11 20:34:48 +00:00
Matt Arsenault	2ffe8fd2ce	AMDGPU: Prune includes llvm-svn: 278391	2016-08-11 19:18:50 +00:00
Matt Arsenault	56684d4538	AMDGPU: Fix crashes on memory functions llvm-svn: 278369	2016-08-11 17:31:42 +00:00
Matt Arsenault	4b5fc093d0	AMDGPU: Remove custom getSubReg This was kind of confusing, the subregister class shouldn't really be necessary. llvm-svn: 278362	2016-08-11 17:15:32 +00:00
Matt Arsenault	69fd2c1179	AMDGPU: Remove unused tracking of flat instructions llvm-svn: 278361	2016-08-11 17:15:28 +00:00
Wei Ding	34e1753585	AMDGPU : Add LLVM intrinsics for SAD related instructions. Differential Revision: http://reviews.llvm.org/D23133 llvm-svn: 278354	2016-08-11 16:33:53 +00:00
Valery Pykhtin	82c73bee2b	Revert "[AMDGPU] fix failure on printing of non-existing instruction operands." This reverts revision 278333, newly added test failed. llvm-svn: 278336	2016-08-11 14:22:05 +00:00
Valery Pykhtin	3048ff6ec3	[AMDGPU] fix failure on printing of non-existing instruction operands. Differential revision: https://reviews.llvm.org/D23323 llvm-svn: 278333	2016-08-11 13:49:46 +00:00
Tim Northover	406024a108	GlobalISel: implement simple function calls on AArch64. We're still limited in the arguments we support, but this at least handles the basic cases. llvm-svn: 278293	2016-08-10 21:44:01 +00:00
Changpeng Fang	fb9c3818dd	AMDGPU/SI: Implement amdgcn image intrinsics with sampler Summary: This patch defines and implements amdgcn image intrinsics with sampler. 1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty, and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name to have three suffixes to overload each of these three types; 2. D128 as well as two other flags are implied in the three types, for example, if you use v8i32 as resource type, then r128 is 0! 3. don't expose TFE flag, and other flags are exposed in the instruction order: unrm, glc, slc, lwe and da. Differential Revision: http://reviews.llvm.org/D22838 Reviewed by: arsenm and tstellarAMD llvm-svn: 278291	2016-08-10 21:15:30 +00:00
Matt Arsenault	61f8ba8b79	AMDGPU: s_setpc_b64 should be an indirect branch llvm-svn: 278278	2016-08-10 19:20:02 +00:00
Matt Arsenault	c6b1350039	AMDGPU: Set sizes on control flow pseudos llvm-svn: 278276	2016-08-10 19:11:51 +00:00
Matt Arsenault	f4af802381	AMDGPU: Remove empty file comment llvm-svn: 278275	2016-08-10 19:11:48 +00:00
Matt Arsenault	11587d97be	AMDGPU: Remove unnecessary cast llvm-svn: 278274	2016-08-10 19:11:45 +00:00
Matt Arsenault	57431c9680	AMDGPU: Change insertion point of si_mask_branch Insert before the skip branch if one is created. This is a somewhat more natural placement relative to the skip branches, and makes it possible to implement analyzeBranch for skip blocks. The test changes are mostly due to a quirk where the block label is not emitted if there is a terminator that is not also a branch. llvm-svn: 278273	2016-08-10 19:11:42 +00:00
Matt Arsenault	b920e9987d	AMDGPU: Use CreateStackObject instead of CreateSpillStackObject I'm not sure what the difference is, but no other target uses this for emergency spill slots. llvm-svn: 278272	2016-08-10 19:11:36 +00:00
Marek Olsak	355a8642b4	AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland Summary: This is the setting of the Vulkan closed source driver. It decreases the max wave count from 10 to 8. 26010 shaders in 14650 tests Totals: VGPRS: 829593 -> 808440 (-2.55 %) Spilled SGPRs: 81878 -> 42226 (-48.43 %) Spilled VGPRs: 367 -> 358 (-2.45 %) Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread Code Size: 36677864 -> 35923932 (-2.06 %) bytes There is a massive decrease in SGPR spilling in general and -7.4% spilled VGPRs for DiRT Showdown (= SGPRs spilled to scratch?) Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23034 llvm-svn: 277867	2016-08-05 21:23:29 +00:00
Yaxun Liu	86c052238a	[OpenCL] Add missing tests for getOCLTypeName Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown. Patch by Aaron En Ye Shi. Differential Revision: https://reviews.llvm.org/D22964 llvm-svn: 277759	2016-08-04 19:45:00 +00:00


1028 Commits