llvm-project

Commit Graph

Author	SHA1	Message	Date
David Stuttard	82618baa0f	[AMDGPU] Fix for issue in alloca to vector promotion pass Summary: Alloca promotion pass not dealing with non-canonical input Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical) Also added some test cases in non-canonical form to check that it no longer crashes Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31710 llvm-svn: 305079	2017-06-09 14:16:22 +00:00
Chandler Carruth	6bda14b313	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787	2017-06-06 11:49:48 +00:00
Changpeng Fang	1dbace195d	AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca Summary: Promoting Alloca to Vector and Promoting Alloca to LDS are two independent handling of Alloca and should not affect each other. As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage related checking out and replace it after the calling convention checking. Reviewer: arsenm Differential Revision: http://reviews.llvm.org/D33139 llvm-svn: 303684	2017-05-23 20:25:41 +00:00
Francis Visoiu Mistrih	8b61764cbb	[LegacyPassManager] Remove TargetMachine constructors This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency. The patterns replaced here are: * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`. * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`. * MachineFunctionPasses now use MF.getTarget(). * Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS. This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults. Related to PR30324. Differential Revision: https://reviews.llvm.org/D33222 llvm-svn: 303360	2017-05-18 17:21:13 +00:00
Changpeng Fang	161e8c39af	AMDGPU/SI: Don't promote to vector if the load/store is volatile. Summary: We should not change volatile loads/stores in promoting alloca to vector. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D33107 llvm-svn: 302943	2017-05-12 20:31:12 +00:00
Matt Arsenault	5c80618fb7	AMDGPU: Don't promote alloca to LDS for leaf functions LDS use in leaf functions not currently handled. llvm-svn: 301958	2017-05-02 18:33:18 +00:00
Stanislav Mekhanoshin	c90347d760	[AMDGPU] Generate range metadata for workitem id If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102	2017-04-12 20:48:56 +00:00
Yaxun Liu	1a14bfa022	[AMDGPU] Get address space mapping by target triple environment As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846	2017-03-27 14:04:01 +00:00
George Burgess IV	56c7e88c2c	Let llvm.objectsize be conservative with null pointers This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 llvm-svn: 298430	2017-03-21 20:08:59 +00:00
Reid Kleckner	b518054b87	Rename AttributeSet to AttributeList Summary: This class is a list of AttributeSetNodes corresponding the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393	2017-03-21 16:57:19 +00:00
Stanislav Mekhanoshin	2b913b1f49	[AMDGPU] Account workgroup size in LDS occupancy limits Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers has to be adjusted for bigger workgroups. For example a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems still limited by the same number, but 4 subrgoups 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837	2017-02-01 22:59:50 +00:00
Matthias Braun	8c209aa877	Cleanup dump() functions. We had various variants of defining dump() functions in LLVM. Normalize them (this should just consistently implement the things discussed in http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html For reference: - Public headers should just declare the dump() method but not use LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) - The definition of a dump method should look like this: #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) LLVM_DUMP_METHOD void MyClass::dump() { // print stuff to dbgs()... } #endif llvm-svn: 293359	2017-01-28 02:02:38 +00:00
Changpeng Fang	c85abbd955	AMDGPU/SI: Give up in promote alloca when a pointer may be captured. Differential Revision: http://reviews.llvm.org/D28970 Reviewer: Matt llvm-svn: 292966	2017-01-24 19:06:28 +00:00
Eugene Zelenko	734bb7bb09	[AMDGPU] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 292623	2017-01-20 17:52:16 +00:00
Matt Arsenault	2402b95db0	AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces. llvm-svn: 289307	2016-12-10 00:52:50 +00:00
Mehdi Amini	117296c0a0	Use StringRef in Pass/PassManager APIs (NFC) llvm-svn: 283004	2016-10-01 02:56:57 +00:00
Konstantin Zhuravlyov	1d65026ca6	[AMDGPU] Wave and register controls - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747	2016-09-06 20:22:28 +00:00
David Majnemer	0d955d0bf5	Use the range variant of find instead of unpacking begin/end If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended. llvm-svn: 278433	2016-08-11 22:21:41 +00:00
Matt Arsenault	210b7cf3e2	AMDGPU: Remove pointless dyn_cast_or_null This is already casted above so non-null llvm-svn: 275881	2016-07-18 19:00:07 +00:00
Matt Arsenault	efb24540b1	AMDGPU: Remove dead check in AMDGPUPromoteAlloca This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays which are a perverse case. Also fix crashes on 0 x and 1 x arrays. llvm-svn: 275869	2016-07-18 18:34:53 +00:00
Matt Arsenault	2e08e181a7	AMDGPU: Remove dead code and redundant check Non intrinsic calls aren't really handled, and this IntrinsicInst dyn_cast checks for the function for us. llvm-svn: 275868	2016-07-18 18:34:48 +00:00
Nicolai Haehnle	bef1ceb815	AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions. Summary: The work item intrinsics are not available for the shader calling conventions. And even if we did hook them up most shader stages haves some extra restrictions on the amount of available LDS. Reviewers: tstellarAMD, arsenm Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D20728 llvm-svn: 275779	2016-07-18 09:02:47 +00:00
Matt Arsenault	03d8584590	AMDGPU: Move subtarget feature checks into passes llvm-svn: 273937	2016-06-27 20:32:13 +00:00
Peter Collingbourne	96efdd6107	IR: Introduce local_unnamed_addr attribute. If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol. This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true: - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address) - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same) - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global) Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html for earlier discussion. Part of the fix for PR27553. Differential Revision: http://reviews.llvm.org/D20348 llvm-svn: 272709	2016-06-14 21:01:22 +00:00
Matt Arsenault	c438ef574d	AMDGPU: Fix promote alloca for pointer loads If the load has a pointer type, we don't want to change its type. llvm-svn: 270000	2016-05-18 23:20:24 +00:00
Matt Arsenault	891fccc0c1	AMDGPU: Handle alloca promoting with null operands If the second pointer in a multi-pointer instruction is a constant, we can replace the type. llvm-svn: 269945	2016-05-18 15:57:21 +00:00
Matt Arsenault	8a028bf4d7	AMDGPU: Fix promote alloca pass creating huge arrays This was assuming it could use all memory before, which is a bad decision because it restricts occupancy. By default, only try to use enough space that could reduce occupancy to 7, an arbitrarily chosen limit. Based on the exist LDS usage, try to round up to the limit in the current tier instead of further hurting occupancy. This isn't ideal, because it doesn't accurately know how much space is going to be used for alignment padding. llvm-svn: 269708	2016-05-16 21:19:59 +00:00
Matt Arsenault	a61cb48dd2	AMDGPU: Fix breaking IR on instructions with multiple pointer operands The promote alloca pass would attempt to promote an alloca with a select, icmp, or phi user, even though the other operand was from a non-promotable source, producing a select on two different pointer types. Only do this if we know that both operands derive from the same alloca. In the future we should be able to relax this to an alloca which will also be promoted. llvm-svn: 269265	2016-05-12 01:58:58 +00:00
Matt Arsenault	c5fce69031	AMDGPU: Fix mishandling array allocations when promoting alloca The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916	2016-04-28 18:38:48 +00:00
Matt Arsenault	0547b016b1	AMDGPU: Account for globals in AMDGPUPromoteAlloca pass Patch by Bas Nieuwenhuizen llvm-svn: 267791	2016-04-27 21:05:08 +00:00
Andrew Kaylor	7de74af929	Add optimization bisect opt-in calls for AMDGPU passes Differential Revision: http://reviews.llvm.org/D19450 llvm-svn: 267485	2016-04-25 22:23:44 +00:00
Tom Stellard	79a1fd718c	AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337	2016-04-14 16:27:07 +00:00
Matt Arsenault	0a30e456b4	AMDGPU: Promote alloca should skip volatiles llvm-svn: 264214	2016-03-23 23:17:29 +00:00
Matt Arsenault	bafc9dc591	AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca Frontend authors are strongly encouraged to keep allocas in the entry block, so don't bother visiting every instruction in the other blocks of the function. llvm-svn: 263206	2016-03-11 08:20:50 +00:00
Matt Arsenault	56356c8a9c	AMDGPU: Remove a fixme for ptrrtoint handling llvm-svn: 262854	2016-03-07 21:12:46 +00:00
Matt Arsenault	cf84e26fb6	AMDGPU: Preserve alignments on new created globals Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911	2016-02-05 19:47:23 +00:00
Matt Arsenault	de4208122b	AMDGPU: Do not promote allocas with non-inbounds GEPs If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573	2016-02-02 21:16:12 +00:00
Matt Arsenault	7e747f1a38	AMDGPU: Handle promoting memmove Also add missing tests for the others. llvm-svn: 259558	2016-02-02 20:28:10 +00:00
Matt Arsenault	8b175672cb	AMDGPU: Skip promote alloca with no optimizations llvm-svn: 259551	2016-02-02 19:32:42 +00:00
Matt Arsenault	fb8cdbae0c	AMDGPU: Minor cleanups for AMDGPUPromoteAlloca Mostly convert to use range loops. llvm-svn: 259550	2016-02-02 19:32:35 +00:00
Matt Arsenault	e5737f7cac	AMDGPU: Report AMDGPUPromoteAlloca changed the function llvm-svn: 259547	2016-02-02 19:18:57 +00:00
Matt Arsenault	ad1348459f	AMDGPU: Whitelist handled intrinsics We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546	2016-02-02 19:18:53 +00:00
Matt Arsenault	853a1fc6d9	AMDGPU: Use inbounds when calculating workitem offset When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545	2016-02-02 19:18:48 +00:00
Matt Arsenault	e013246462	AMDGPU: Fix emitting invalid workitem intrinsics for HSA The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA was incorrectly selected to reading from the offset mesa uses off of the kernarg pointer. Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer. Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument. Start emitting errors for the intrinsics not handled with HSA. llvm-svn: 259297	2016-01-30 05:19:45 +00:00
Matt Arsenault	0b783ef076	AMDGPU: Fix crash with invariant markers The promote alloca pass didn't handle these intrinsics and crashed. These intrinsics should accept any address space, but for now just erase them to avoid breaking. llvm-svn: 258537	2016-01-22 19:47:54 +00:00
Manuel Jacob	5f6eaac611	GlobalValue: use getValueType() instead of getType()->getPointerElementType(). Reviewers: mjacob Subscribers: jholewinski, arsenm, dsanders, dblaikie Patch by Eduard Burtescu. Differential Revision: http://reviews.llvm.org/D16260 llvm-svn: 257999	2016-01-16 20:30:46 +00:00
Pete Cooper	67cf9a723b	Revert "Change memcpy/memset/memmove to have dest and source alignments." This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543	2015-11-19 05:56:52 +00:00
Pete Cooper	72bc23ef02	Change memcpy/memset/memmove to have dest and source alignments. Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.llvm\.memset.)i32\ [0-9]\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511	2015-11-18 22:17:24 +00:00
Duncan P. N. Exon Smith	a73371a9b7	AMDGPU: Remove implicit ilist iterator conversions, NFC One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new one. Previously, bundle iterators and single-instruction iterators could be compared to each other (comparing on underlying pointers). I changed a comparison from using `MBB->end()` to using `MBB->instr_end()`, since both end iterators should point at the some place anyway. I don't think the implicit conversion between the two iterator types is a good idea since it's fairly easy to accidentally compare to the wrong thing (they aren't always end iterators). Otherwise I would have just added the conversion. Even with that, no there should be functionality change here. llvm-svn: 250218	2015-10-13 20:07:10 +00:00
Matt Arsenault	19c5488015	AMDGPU: Produce error on dynamic_stackalloc llvm-svn: 246048	2015-08-26 18:37:13 +00:00

54 Commits