llvm-project

Commit Graph

Author	SHA1	Message	Date
Jay Foad	830ed64ccd	Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"" This reverts commit `8b08fa0103`. The underlying problems were fixed by D90607.	2020-11-11 14:40:14 +00:00
Sebastian Neubauer	a022b1ccd8	[AMDGPU] Add amdgpu_gfx calling convention Add a calling convention called amdgpu_gfx for real function calls within graphics shaders. For the moment, this uses the same calling convention as other calls in amdgpu, with registers excluded for return address, stack pointer and stack buffer descriptor. Differential Revision: https://reviews.llvm.org/D88540	2020-11-09 16:51:44 +01:00
dfukalov	9068c20965	[AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions. 1. Throughput and codesize costs estimations was separated and updated. 2. Updated fdiv cost estimation for different cases. 3. Added scalarization processing for types that are treated as !isSimple() to improve codesize estimation in getArithmeticInstrCost() and getArithmeticInstrCost(). The code was borrowed from TCK_RecipThroughput path of base implementation. Next step is unify scalarization part in base class that is currently works for TCK_RecipThroughput path only. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89973	2020-10-24 19:53:08 +03:00
Jay Foad	958130dfda	[AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy This follows on from D89558 which added the new intrinsic and D88955 which added similar combines for llvm.amdgcn.fmul.legacy. Differential Revision: https://reviews.llvm.org/D90028	2020-10-23 16:16:13 +01:00
Konstantin Zhuravlyov	3fdf3b1539	AMDGPU: Update AMDHSA code object version handling Differential Revision: https://reviews.llvm.org/D89076	2020-10-14 13:04:27 -04:00
Mirko Brkusanin	8b08fa0103	Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access" This reverts commit `f5cd7ec9f3`. Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.	2020-09-29 15:33:34 +02:00
Mirko Brkusanin	f5cd7ec9f3	[AMDGPU] Reorganize GCN subtarget features for unaligned access Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine whether hardware supports such access. UnalignedAccessMode should be used to enable them. hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be now used to quickly check both. Differential Revision: https://reviews.llvm.org/D84522	2020-08-21 12:26:31 +02:00
Mirko Brkusanin	5bd1febe21	[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores Adjust alignment requirements for ds_read/write_b96/b128. GFX9 and onwards allow misaligned access for reads and writes but only if SH_MEM_CONFIG.alignment_mode allows it. UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we can relax alignment requirements. UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions but only from GFX9 onward and is supposed to match alignment_mode. By default alignment of 4 is required. Differential Revision: https://reviews.llvm.org/D82788	2020-08-21 12:26:31 +02:00
dfukalov	4ccc38813e	[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation. Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model. Also added operations with contract attribute. Fixed line endings in test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84995	2020-08-06 21:43:27 +03:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
Sidharth Baveja	e541e1b757	[NFC] Separate Peeling Properties into its own struct (re-land after minor fix) Summary: This patch separates the peeling specific parameters from the UnrollingPreferences, and creates a new struct called PeelingPreferences. Functions which used the UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-10 18:39:30 +00:00
Stanislav Mekhanoshin	77f8f813a9	[AMDGPU] Return restricted number of regs from TTI This is practically NFC at the moment because nothing really asks the real number or does anything useful with it. Differential Revision: https://reviews.llvm.org/D82202	2020-07-09 14:31:28 -07:00
Nikita Popov	0b39d2d752	Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit `0369dc98f9`. Many failing tests.	2020-07-08 21:43:32 +02:00
Sidharth Baveja	0369dc98f9	[NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-08 18:59:59 +00:00
Anh Tuyen Tran	6965af43e6	Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit `fead250b43`.	2020-07-08 18:58:05 +00:00
Anh Tuyen Tran	fead250b43	[NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-08 18:56:03 +00:00
Guillaume Chatelet	1507fc1506	[Alignment][NFC] Migrate TTI::isLegalToVectorize{Load,Store}Chain to Align This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82653	2020-06-26 14:14:27 +00:00
Stanislav Mekhanoshin	2b87a44c49	[AMDGPU] Some formatting fixes. NFC.	2020-06-19 09:02:59 -07:00
Sam Parker	321ebfd175	[NFCI][CostModel] Unify FNeg cost Enable TTIImpl::getUserCost to handle FNeg so that getInstructionThroughput can call that instead. This means we can remove the code in the AMDGPU backend too. Differential Revision: https://reviews.llvm.org/D81635	2020-06-15 08:33:04 +01:00
Matt Arsenault	d6671ee90c	InferAddressSpaces: Handle ptrmask intrinsic This one is slightly odd since it counts as an address expression, which previously could never fail. Allow the existing TTI hook to return the value to use, and re-use it for handling how to handle ptrmask. Handles the no-op addrspacecasts for AMDGPU. We could probably do something better based on analysis of the mask value based on the address space, but leave that for now.	2020-05-28 10:04:02 -04:00
Sam Parker	8cc911fa5b	[NFCI][CostModel] Refactor getIntrinsicInstrCost Combine the two API calls into one by introducing a structure to hold the relevant data. This has the added benefit of moving the boiler plate code for arguments and flags, into the constructors. This is intended to be a non-functional change, but the complicated web of logic involved here makes it very hard to guarantee. Differential Revision: https://reviews.llvm.org/D79941	2020-05-20 11:59:08 +01:00
Sam Parker	40574fefe9	[NFC][CostModel] Add TargetCostKind to relevant APIs Make the kind of cost explicit throughout the cost model which, apart from making the cost clear, will allow the generic parts to calculate better costs. It will also allow some backends to approximate and correlate the different costs if they wish. Another benefit is that it will also help simplify the cost model around immediate and intrinsic costs, where we currently have multiple APIs. RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html Differential Revision: https://reviews.llvm.org/D79002	2020-05-05 10:35:54 +01:00
Sam Parker	e9c9329aa4	[TTI] Add TargetCostKind argument to getUserCost There are several different types of cost that TTI tries to provide explicit information for: throughput, latency, code size along with a vague 'intersection of code-size cost and execution cost'. The vectorizer is a keen user of RecipThroughput and there's at least 'getInstructionThroughput' and 'getArithmeticInstrCost' designed to help with this cost. The latency cost has a single use and a single implementation. The intersection cost appears to cover most of the rest of the API. getUserCost is explicitly called from within TTI when the user has been explicit in wanting the code size (also only one use) as well as a few passes which are concerned with a mixture of size and/or a relative cost. In many cases these costs are closely related, such as when multiple instructions are required, but one evident diverging cost in this function is for div/rem. This patch adds an argument so that the cost required is explicit, so that we can make the important distinction when necessary. Differential Revision: https://reviews.llvm.org/D78635	2020-04-28 08:57:45 +01:00
Sam Parker	e3056ae9a0	[NFC][TTI] Explicit use of VectorType The API for shuffles and reductions uses generic Type parameters, instead of VectorType, and so assertions and casts are used a lot. This patch makes those types explicit, which means that the clients can't be lazy, but results in less ambiguity, and that can only be a good thing. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562 Differential Revision: https://reviews.llvm.org/D78357	2020-04-20 09:16:52 +01:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Jay Foad	cee65d51fe	AMDGPU: Implement getMemcpyLoopLoweringType Summary: Based on a patch by Matt Arsenault. Reviewers: rampitec, kerbowa, nhaehnle, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77057	2020-03-30 22:21:01 +01:00
Anna Welker	a6d3bec83f	[TTI][ARM][MVE] Refine gather/scatter cost model Refines the gather/scatter cost model, but also changes the TTI function getIntrinsicInstrCost to accept an additional parameter which is needed for the gather/scatter cost evaluation. This did require trivial changes in some non-ARM backends to adopt the new parameter. Extending gathers and truncating scatters are now priced cheaper. Differential Revision: https://reviews.llvm.org/D75525	2020-03-11 10:23:41 +00:00
Matt Arsenault	cb7b661d3d	AMDGPU: Analyze divergence of inline asm	2020-02-03 12:42:16 -08:00
Austin Kerbow	c226646337	Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI Summary: Enable the new diveregence analysis by default for AMDGPU. Resubmit with test updates since GPUDA was causing failures on Windows. Reviewers: rampitec, nhaehnle, arsenm, thakis Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73315	2020-01-24 10:39:40 -08:00
Nico Weber	cd470717d1	Revert "[DA][TTI][AMDGPU] Add option to select GPUDA with TTI" This reverts commit `a90a6502ab`. Broke tests on Windows: http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/13808	2020-01-22 12:56:19 -05:00
Austin Kerbow	a90a6502ab	[DA][TTI][AMDGPU] Add option to select GPUDA with TTI Summary: Enable the new diveregence analysis by default for AMDGPU. Reviewers: rampitec, nhaehnle, arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73049	2020-01-21 21:13:20 -08:00
Stanislav Mekhanoshin	58578f7056	[AMDGPU] Implemented fma cost analysis Differential Revision: https://reviews.llvm.org/D71676	2019-12-18 23:54:20 -08:00
David Green	be7a107070	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966	2019-12-09 10:24:33 +00:00
Matt Arsenault	db0ed3e429	AMDGPU: Refactor treatment of denormal mode Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.	2019-11-19 19:55:43 +05:30
dfukalov	6fd11b14f6	[AMDGPU] Tune inlining parameters for AMDGPU target (part 2) Summary: Most of IR instructions got better code size estimations after commit `47a5c36b`. So default parameters values should be updated to improve inlining and unrolling for the target. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70391	2019-11-19 16:33:16 +03:00
Daniil Fukalov	3972057511	[AMDGPU] Improve code size cost model Summary: Added estimation for zero size insertelement, extractelement and llvm.fabs operators. Updated inline/unroll parameters default values. Reviewers: rampitec, arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68881 llvm-svn: 375109	2019-10-17 12:15:35 +00:00
Vitaly Buka	c001144b10	[System Model] [TTI] Define AMDGPUTTIImpl::getST and AMDGPUTTIImpl::getTLI To fix "infinite recursion" warning. llvm-svn: 374222	2019-10-09 20:48:54 +00:00
Matt Arsenault	dbc1f207fa	InferAddressSpaces: Move target intrinsic handling to TTI I'm planning on handling intrinsics that will benefit from checking the address space enums. Don't bother moving the address collection for now, since those won't need th enums. llvm-svn: 368895	2019-08-14 18:13:00 +00:00
Daniil Fukalov	d912a9ba9b	[AMDGPU] Tune inlining parameters for AMDGPU target Summary: Since the target has no significant advantage of vectorization, vector instructions bous threshold bonus should be optional. amdgpu-inline-arg-alloca-cost parameter default value and the target InliningThresholdMultiplier value tuned then respectively. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64642 llvm-svn: 366348	2019-07-17 16:51:29 +00:00
Matt Arsenault	b10f097833	AMDGPU: Ignore subtarget for InferAddressSpaces Even if the target doesn't have flat instructions, addrspace(0) is still flat. It just happens to not work. llvm-svn: 363561	2019-06-17 14:13:24 +00:00
Matt Arsenault	f426ddbfc7	AMDGPU: Assume ECC is enabled by default if supported The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this is depending on a function level subtarget feature to emit a global flag. llvm-svn: 357558	2019-04-03 01:58:57 +00:00
Matt Arsenault	aa6fb4c45e	AMDGPU: Remove debugger related subtarget features As far as I know these aren't needed anymore. llvm-svn: 354634	2019-02-21 23:27:46 +00:00
Matt Arsenault	d24296e282	AMDGPU: Ignore CodeObjectV3 when inlining This was inhibiting inlining of library functions when clang was invoking the inliner directly. This is covering a bit of a mess with subtarget feature handling, and this shouldn't be a subtarget feature. The behavior is different depending on whether you are using a -mattr flag in clang, or llc, opt. llvm-svn: 353899	2019-02-12 23:30:11 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Matt Arsenault	0da6350dc8	AMDGPU: Remove remnants of old address space mapping llvm-svn: 341165	2018-08-31 05:49:54 +00:00
Tom Stellard	5bfbae5cb1	AMDGPU: Refactor Subtarget classes Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851	2018-07-11 20:59:01 +00:00
Tom Stellard	c5a154db48	AMDGPU: Separate R600 and GCN TableGen files Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc. Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46365 llvm-svn: 335942	2018-06-28 23:47:12 +00:00
Konstantin Zhuravlyov	e004b3d97b	AMDGPU: Remove ability to reserve VGPRs for debugger Differential Revision: https://reviews.llvm.org/D48234 llvm-svn: 335288	2018-06-21 20:28:19 +00:00
Tom Stellard	c7624317d7	AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47359 llvm-svn: 333605	2018-05-30 22:55:35 +00:00
Tom Stellard	44b30b4537	AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930	2018-05-22 02:03:23 +00:00

1 2

84 Commits