Commit Graph

69 Commits

Author SHA1 Message Date
Tom Stellard c7624317d7 AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47359

llvm-svn: 333605
2018-05-30 22:55:35 +00:00
Farhana Aleen eacb1020aa [AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by default.
Summary: The bug reported at https://bugs.freedesktop.org/show_bug.cgi?id=105464 was found
         to be resolved by other fixes.

Author: FarhanaAleen
llvm-svn: 333380
2018-05-28 18:15:11 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually change the docs, as the regex doesn't match them.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
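For readers following along, a minimal sketch of the post-rename spelling; the pass name and helper below are hypothetical, not part of this commit:

```cpp
// Hypothetical example of LLVM_DEBUG usage after the rename; only the macro
// and dbgs() come from the LLVM headers, the rest is illustrative.
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"

#define DEBUG_TYPE "example-pass" // hypothetical debug type

using namespace llvm;

static void reportWidening(unsigned OldBits, unsigned NewBits) {
  // Old spelling: DEBUG(dbgs() << ...); still aliased during the transition.
  LLVM_DEBUG(dbgs() << "widening load from " << OldBits << " to " << NewBits
                    << " bits\n");
}
```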
Farhana Aleen e24f3ff8de [AMDGPU] Support horizontal vectorization of min/max.
Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D46604

llvm-svn: 331920
2018-05-09 21:18:34 +00:00
Farhana Aleen e2dfe8a853 [AMDGPU] Support horizontal vectorization.
Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D46213

llvm-svn: 331313
2018-05-01 21:41:12 +00:00
Marek Olsak a9a58fa236 AMDGPU: enable 128-bit for local addr space under an option
Author: Samuel Pitoiset

ds_read_b128 and ds_write_b128 were recently enabled only under the
amdgpu-ds128 option because the performance benefit is unclear.

However, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. It is not
clear what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).

v2: - fix regressions in merge-stores.ll and multiple_tails.ll

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
2018-04-10 22:48:23 +00:00
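As a rough illustration of the "global option" mentioned above, this is how such a backend switch is typically declared with LLVM's command-line library; the variable name, description text, and default are assumptions, not the actual AMDGPU code:

```cpp
// Hedged sketch of a backend cl::opt like -amdgpu-ds128; the variable name
// and wording are assumptions.
#include "llvm/Support/CommandLine.h"

using namespace llvm;

static cl::opt<bool> EnableDS128(
    "amdgpu-ds128",
    cl::desc("Use ds_read_b128/ds_write_b128 for the local address space"),
    cl::init(false)); // stays off until the tessellation regressions are understood

// Code that forms 128-bit local loads/stores would then check EnableDS128
// before emitting them.
```

With such a flag in place, the wider DS accesses would only be generated when -amdgpu-ds128 is passed to the backend.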
Alex Shlyapnikov 79f2c720b5 Revert "AMDGPU: enable 128-bit for local addr space under an option"
This reverts commit r329591.

It breaks various bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516
http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374
http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251
...

llvm-svn: 329610
2018-04-09 19:47:38 +00:00
Marek Olsak 52b033b827 AMDGPU: enable 128-bit for local addr space under an option
Author: Samuel Pitoiset

ds_read_b128 and ds_write_b128 were recently enabled only under the
amdgpu-ds128 option because the performance benefit is unclear.

However, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. It is not
clear what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329591
2018-04-09 16:56:32 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because Function.cpp needs a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen, making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there are only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
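A hedged sketch of the "print primitive types as strings directly" idea; the helper name and the covered cases are illustrative, not the exact list handled in Function.cpp:

```cpp
// Illustrative only: maps a few primitive IR types straight to strings,
// without going through EVT::getEVTString().
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Type.h"

using namespace llvm;

static StringRef primitiveTypeName(const Type *Ty) {
  if (Ty->isVoidTy())
    return "isVoid";
  if (Ty->isHalfTy())
    return "f16";
  if (Ty->isFloatTy())
    return "f32";
  if (Ty->isDoubleTy())
    return "f64";
  return "Other"; // non-primitive types are not expected here
}
```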
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
Farhana Aleen a7cb31123c [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Summary: Starting from the GCN 2nd generation, the ISA supports ds_read_b128 on top of ds_read_b64.
         This patch supports the ds_read_b128 instruction pattern and generation of this instruction.
         In the vectorizer, this patch also widens the vector length so that the vectorizer generates
         128-bit loads for the local address space, which get translated to ds_read_b128.
         Since the performance benefit is not clear, the compiler generates ds_read_b128 only under -amdgpu-ds128.

Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44210

llvm-svn: 327153
2018-03-09 17:41:39 +00:00
Farhana Aleen 89196642f7 [AMDGPU] Increased vector length for global/constant loads.
Summary: The GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
         the LoadStoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44179

llvm-svn: 326910
2018-03-07 17:09:18 +00:00
Farhana Aleen 347d12b4ce Revert "[AMDGPU] Widened vector length for global/constant address space."
This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.

llvm-svn: 326907
2018-03-07 16:55:27 +00:00
Farhana Aleen 0d03d0588d [AMDGPU] Widened vector length for global/constant address space.
llvm-svn: 326904
2018-03-07 16:29:05 +00:00
Alexander Timofeev 2e5eeceeb7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

llvm-svn: 326703
2018-03-05 15:12:21 +00:00
Konstantin Zhuravlyov 5c1237a1fd Revert "[AMDGPU] Increased vector length for global/constant loads."
https://reviews.llvm.org/rL325518

It breaks the following OpenCL conformance tests:
  - Basic - parameter_types
  - Basic - vload_private

llvm-svn: 325643
2018-02-20 23:30:21 +00:00
Mark Searles 419bdab759 [AMDGPU] Increased vector length for global/constant loads.
Summary: The GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; the LoadStoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D43275

llvm-svn: 325518
2018-02-19 16:42:49 +00:00
Matt Arsenault 923712b6b5 Reapply "AMDGPU: Add 32-bit constant address space"
This reverts r324494 and reapplies r324487.

llvm-svn: 324747
2018-02-09 16:57:57 +00:00
Rafael Espindola f4e3f3e31c Revert "AMDGPU: Add 32-bit constant address space"
This reverts commit r324487.

It broke clang tests.

llvm-svn: 324494
2018-02-07 18:09:35 +00:00
Marek Olsak 871c30e540 AMDGPU: Add 32-bit constant address space
Note: This is a candidate for LLVM 6.0, because it was planned to be
      in that release but was delayed due to a long review period.

Merge conflict in release_60 - resolution:
    Add "-p6:32:32" into the second (non-amdgiz) string.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all use cases we need for Mesa.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D41651

llvm-svn: 324487
2018-02-07 16:01:00 +00:00
Daniil Fukalov 6e1dc68117 [AMDGPU] fix LDS f32 intrinsics
- used a qualified pointer address space in the intrinsic class to avoid .f32 mangling
- changed the overly generic "atomic" mangling to "ds"
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic (see the sketch below)

Reviewed by: b-sumner

Differential Revision: https://reviews.llvm.org/D42383

llvm-svn: 323516
2018-01-26 11:09:38 +00:00
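A hedged sketch of the getTgtMemIntrinsic hook mentioned in the last bullet; the intrinsic IDs and the header they live in vary by LLVM version, so treat the case labels as illustrative rather than the exact set AMDGPUTTIImpl handles:

```cpp
// Illustrative sketch: describe LDS float atomics to passes (e.g. LSR) that
// query TTI::getTgtMemIntrinsic for the pointer operand and memory effects.
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/IntrinsicsAMDGPU.h" // older trees: llvm/IR/Intrinsics.h

using namespace llvm;

static bool describeLDSAtomic(IntrinsicInst *Inst, MemIntrinsicInfo &Info) {
  switch (Inst->getIntrinsicID()) {
  case Intrinsic::amdgcn_ds_fadd: // illustrative, version-dependent IDs
  case Intrinsic::amdgcn_ds_fmin:
  case Intrinsic::amdgcn_ds_fmax:
    Info.PtrVal = Inst->getArgOperand(0); // the LDS pointer operand
    Info.ReadMem = true;                  // read-modify-write atomic
    Info.WriteMem = true;
    return true;
  default:
    return false;
  }
}
```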
Daniil Fukalov d5fca554e2 [AMDGPU] add LDS f32 intrinsics
Added the llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics to allow generation
of the ds_{add|min|max}[_rtn]_f32 instructions needed for OpenCL float atomics
in LDS.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D37985

llvm-svn: 322656
2018-01-17 14:05:05 +00:00
Matt Arsenault 3e268cc0dd LSR: Check more intrinsic pointer operands
llvm-svn: 320424
2017-12-11 21:38:43 +00:00
Tim Renouf ef1ae8ffac [AMDGPU] calling conventions for AMDPAL OS type
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37752

llvm-svn: 314502
2017-09-29 09:51:22 +00:00
Matt Arsenault 376f1bd73c AMDGPU: Don't assert in TTI with fp32 denorms enabled
Also refine for f16 and rcp cases.

llvm-svn: 312213
2017-08-31 05:47:00 +00:00
Eugene Zelenko d16eff816b [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 310429
2017-08-08 23:53:55 +00:00
Matt Arsenault aac47c1c00 AMDGPU: Use a custom areInlineCompatible
Fixes the failure to inline OpenCL library functions on AMDGPU,
which don't have an explicitly set target-cpu.

llvm-svn: 310269
2017-08-07 17:08:44 +00:00
Geoff Berry 66d9bdbca8 [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper

Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D34531

llvm-svn: 306554
2017-06-28 15:53:17 +00:00
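For orientation, a hedged sketch of what a target override of this hook looks like after the change; the class, threshold value, and trip-count heuristic are placeholders, not any target's actual tuning:

```cpp
// Illustrative sketch of the post-D34531 hook shape: SCEV is now passed in
// alongside the unrolling preferences.
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

struct ExampleTTIImpl {
  void getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
                               TTI::UnrollingPreferences &UP) {
    UP.Threshold = 300; // placeholder base threshold
    // Hypothetical use of the newly available SCEV: skip partial unrolling
    // when the trip count is already known to be tiny.
    if (unsigned TripCount = SE.getSmallConstantTripCount(L))
      if (TripCount < 4)
        UP.Partial = false;
  }
};
```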
Matt Arsenault 67cd347e93 AMDGPU: Allow vectorization of packed types
llvm-svn: 305844
2017-06-20 20:38:06 +00:00
Alexander Timofeev 0f9c84cd93 DivergencyAnalysis patch for review
llvm-svn: 305494
2017-06-15 19:33:10 +00:00
Daniel Neilson c0112ae8da Const correctness for TTI::getRegisterBitWidth
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.

Reviewers: chandlerc, rnk, reames

Reviewed By: reames

Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D33903

llvm-svn: 305189
2017-06-12 14:22:21 +00:00
Chandler Carruth 6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Matt Arsenault 3c5e4237c6 AMDGPU: Make some packed shuffles free
VOP3P instructions can encode access to either
half of the register.

llvm-svn: 302730
2017-05-10 21:29:33 +00:00
Marek Olsak a302a736ec AMDGPU: Add AMDGPU_HS calling convention
Reviewers: arsenm, nhaehnle

Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D32644

llvm-svn: 301930
2017-05-02 15:41:10 +00:00
Matt Arsenault 4c1ecded63 AMDGPU: Change DivergenceAnalysis for function arguments
Stop assuming all functions are kernels.

llvm-svn: 300719
2017-04-19 17:42:34 +00:00
Reid Kleckner f021fab2af [IR] Make getParamAttributes take argument numbers, not ArgNo+1
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.

The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.

NFC

llvm-svn: 300272
2017-04-13 23:12:13 +00:00
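A small hedged example of the new query; the function and argument index are hypothetical:

```cpp
// Illustrative only: querying an argument attribute by argument number.
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"

using namespace llvm;

static bool firstArgIsNoAlias(const Function &F) {
  // Previously callers wrote hasAttribute(ArgNo + 1, Kind), leaking the
  // AttributeList indexing scheme; hasParamAttribute hides that detail.
  return F.hasParamAttribute(/*ArgNo=*/0, Attribute::NoAlias);
}
```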
Stanislav Mekhanoshin 478b81982f [AMDGPU] Unroll more to eliminate phis and conditions
Increase the threshold to unroll a loop which contains an "if" statement
whose condition is defined by a PHI belonging to the loop. This may help
to eliminate the if region and potentially even the PHI itself, saving on
both divergence and the registers used for the PHI.

Add a small bonus for each such "if" statement.

Differential Revision: https://reviews.llvm.org/D31693

llvm-svn: 299779
2017-04-07 16:26:28 +00:00
Stanislav Mekhanoshin baf31ac7c8 [AMDGPU] Boost unroll threshold for loops reading local memory
This is less important than increasing the threshold for private memory,
but it still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows DS
operations to be combined if their offsets become static, saves the
registers used for offsets in the case of static offsets, and allows
better LDS latency hiding.

Differential Revision: https://reviews.llvm.org/D31412

llvm-svn: 298948
2017-03-28 22:13:51 +00:00
Yaxun Liu 1a14bfa022 [AMDGPU] Get address space mapping by target triple environment
As we introduced the target triple environments amdgiz and amdgizcl, the address
space values are no longer enums; we have to decide the values by target triple.

The basic idea is to use a struct AMDGPUAS to represent address space values.
For address space values which do not depend on the target triple, use static
const members, so that they don't occupy extra memory and are equivalent to
compile-time constants.

Since the struct is lightweight and cheap, it can be created on the fly at the
point of use, or it can be added as a member to a pass and created at the
beginning of the run* function.
Differential Revision: https://reviews.llvm.org/D31284

llvm-svn: 298846
2017-03-27 14:04:01 +00:00
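A hedged sketch of the struct AMDGPUAS idea described above; the member names and numeric values are illustrative, not the backend's actual mapping:

```cpp
// Illustrative only: triple-independent values stay compile-time constants,
// triple-dependent ones are plain members filled in from the target triple.
struct AMDGPUASExample {
  static const unsigned LOCAL_ADDRESS = 3;  // placeholder value
  static const unsigned REGION_ADDRESS = 4; // placeholder value

  unsigned FLAT_ADDRESS;    // decided by the amdgiz/amdgizcl environment
  unsigned GLOBAL_ADDRESS;
  unsigned PRIVATE_ADDRESS;
};
```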
Changpeng Fang 1be9b9f816 AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized.
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D30719

llvm-svn: 297328
2017-03-09 00:07:00 +00:00
Matt Arsenault f0a88dbaab LoadStoreVectorizer: Split even sized illegal chains properly
Implement isLegalToVectorizeLoadChain for AMDGPU to avoid
producing private address space accesses that will need to be
split up later. This was doing the wrong thing in the case
where the queried chain was an even number of elements.

A possible <4 x i32> store was being split into
store <2 x i32>
store i32
store i32

rather than
store <2 x i32>
store <2 x i32>

when legal.

llvm-svn: 295933
2017-02-23 03:58:53 +00:00
Matt Arsenault d2c8a337aa AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics
Update test uses with expansion in terms of new intrinsics.

llvm-svn: 295269
2017-02-16 02:01:13 +00:00
Stanislav Mekhanoshin 81db53109d [AMDGPU] Bump -amdgpu-unroll-threshold-private to 2000
This has a quite positive performance impact according to measurements.
Before the previous fixes that limited the optimization, this threshold was
too high and blew up compile time and scratch usage; now that this is gone,
we can bump the threshold.

Differential Revision: https://reviews.llvm.org/D29505

llvm-svn: 294032
2017-02-03 20:08:29 +00:00
Matt Arsenault d9cd736585 AMDGPU: Don't unroll for private with dynamic allocas
These won't be eliminated, so unrolling will just bloat the code
if/when dynamic allocas are ever used/supported.

llvm-svn: 294030
2017-02-03 19:36:00 +00:00
Stanislav Mekhanoshin f29602df65 [AMDGPU] Unroll preferences improvements
Exit the loop analysis early if a suitable private access is found.
Do not account for GEPs which are invariant to the loop induction variable.
Do not account for allocas which are too big to fit into the register file anyway.
Add an option for tuning: -amdgpu-unroll-threshold-private.

Differential Revision: https://reviews.llvm.org/D29473

llvm-svn: 293991
2017-02-03 02:20:05 +00:00
Matt Arsenault 41c1499504 AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent
llvm-svn: 293504
2017-01-30 17:09:47 +00:00
Mohammed Agabaria 2c96c43388 [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
Updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

Added a special optimization case which replaces pmulld with a pmullw/pmulhw/pshuf
sequence when the real operand bitwidth is <= 16.

Differential Revision: https://reviews.llvm.org/D28104 

llvm-svn: 291657
2017-01-11 08:23:37 +00:00
Nicolai Haehnle f45ea4bbc5 AMDGPU: llvm.amdgcn.interp.mov is a source of divergence
Summary:
While the result is constant across a single primitive, each pixel
shader wave can have pixels from multiple primitives.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D27572

llvm-svn: 289447
2016-12-12 16:52:19 +00:00
Volkan Keles 1c38681ae6 Add new target hooks for LoadStoreVectorizer
Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D24727

llvm-svn: 283099
2016-10-03 10:31:34 +00:00
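To make the shape of those hooks concrete, a hedged sketch of two of them as they appeared around this patch; the signatures have evolved since, and the widths/limits below are placeholders rather than any target's real policy:

```cpp
// Illustrative sketch of two LoadStoreVectorizer-facing TTI hooks.
#include "llvm/Analysis/TargetTransformInfo.h"

using namespace llvm;

struct ExampleTTIImpl {
  // Widest vector register the vectorizer may form for an address space
  // (placeholder: 128 bits everywhere).
  unsigned getLoadStoreVecRegBitWidth(unsigned AddrSpace) const { return 128; }

  // Reject chains the target cannot handle as a single access (placeholder:
  // cap chains at 16 bytes regardless of alignment or address space).
  bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes, unsigned Alignment,
                                   unsigned AddrSpace) const {
    return ChainSizeInBytes <= 16;
  }
};
```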