llvm-project

Commit Graph

Author	SHA1	Message	Date
Evgeniy Brevnov	9fb074e7bb	[BPI] Improve static heuristics for "cold" paths. Current approach doesn't work well in cases when multiple paths are predicted to be "cold". By "cold" paths I mean those containing "unreachable" instruction, call marked with 'cold' attribute and 'unwind' handler of 'invoke' instruction. The issue is that heuristics are applied one by one until the first match and essentially ignore relative hotness/coldness of other paths. New approach unifies processing of "cold" paths by assigning predefined absolute weight to each block estimated to be "cold". Then we propagate these weights up/down IR similarly to existing approach. And finally set up edge probabilities based on estimated block weights. One important difference is how we propagate weight up. Existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless at least because it always gives 50/50 distribution which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting more accurate probability. New algorithm propagates the weight up only to the blocks that dominate and are post-dominated by a block with some "known" weight. In other words, those blocks that are either always executed or not executed together. In addition new approach processes loops in a uniform way as well. Essentially loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers. Reviewed By: yrouban Differential Revision: https://reviews.llvm.org/D79485	2020-12-23 22:47:36 +07:00
Sebastian Neubauer	221fdedc69	[AMDGPU][GlobalISel] Fold flat vgpr + constant addresses Use getPtrBaseWithConstantOffset in selectFlatOffsetImpl to fold more vgpr+constant addresses. Differential Revision: https://reviews.llvm.org/D93692	2020-12-23 10:40:30 +01:00
Matt Arsenault	bac54639c7	AMDGPU: Add spilled CSR SGPRs to entry block live ins	2020-12-22 21:55:59 -05:00
Matt Arsenault	29ed846d67	AMDGPU: Fix assert when checking for implicit operand legality	2020-12-22 20:56:24 -05:00
Stanislav Mekhanoshin	d15119a02d	[AMDGPU][GlobalISel] GlobalISel for flat scratch It does not seem to fold offsets but this is not specific to the flat scratch as getPtrBaseWithConstantOffset() does not return the split for these tests unlike its SDag counterpart. Differential Revision: https://reviews.llvm.org/D93670	2020-12-22 16:33:06 -08:00
Stanislav Mekhanoshin	ca4bf58e4e	[AMDGPU] Support unaligned flat scratch in TLI Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for unaligned flat scratch support. Mostly needed for global isel. Differential Revision: https://reviews.llvm.org/D93669	2020-12-22 16:12:31 -08:00
Stanislav Mekhanoshin	ae8f4b2178	[AMDGPU] Folding of FI operand with flat scratch Differential Revision: https://reviews.llvm.org/D93501	2020-12-22 10:48:04 -08:00
Fangrui Song	8ffda237a6	MCContext::reportError: don't call report_fatal_error Errors from MCAssembler, MCObjectStreamer and *ObjectWriter typically cause a crash: ``` % cat c.c int bar; extern int foo __attribute__((alias("bar"))); % clang -c -fcommon c.c fatal error: error in backend: Common symbol 'bar' cannot be used in assignment expr PLEASE submit a bug report to ... Stack dump: ... ``` `LLVMTargetMachine::addPassesToEmitFile` constructs `MachineModuleInfoWrapperPass` which creates a MCContext without SourceMgr. `MCContext::reportError` calls `report_fatal_error` which gets captured by Clang `LLVMErrorHandler` and gets translated to the output above. Since `MCContext::reportError` errors indicate user errors, such a crashing style error is inappropriate. So this patch changes `report_fatal_error` to `SourceMgr().PrintMessage`. ``` % clang -c -fcommon c.c <unknown>:0: error: Common symbol 'bar' cannot be used in assignment expr ``` Ideally we should at least recover the original filename (the line information is generally lost). That requires general improvement to MC diagnostics, because currently in many cases SMLoc information is lost.	2020-12-20 23:23:12 -08:00
Pushpinder Singh	e2303a448e	[FastRA] Fix handling of bundled MIs Fast register allocator skips bundled MIs, as the main assignment loop uses MachineBasicBlock::iterator (= MachineInstrBundleIterator) This was causing SIInsertWaitcnts to crash which expects all instructions to have registers assigned. This patch makes sure to set everything inside bundle to the same assignments done on BUNDLE header. Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D90369	2020-12-21 02:10:55 -05:00
Whitney Tsang	2a814cd9e1	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-18 17:37:17 +00:00
Bangtian Liu	511cfe9441	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `d20e0c3444`.	2020-12-17 21:00:37 +00:00
Bangtian Liu	d20e0c3444	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-17 16:00:15 +00:00
Matt Arsenault	f333736757	AMDGPU: Remove SGPRSpillVGPRDefinedSet hack These VGPRs should be reserved and therefore do not need "correct" liveness. They should not have undef uses, which can still cause issues.	2020-12-16 21:33:35 -05:00
Bangtian Liu	c10757200d	Revert "Ensure SplitEdge to return the new block between the two given blocks" This reverts commit `cf638d793c`.	2020-12-16 11:52:30 +00:00
Stanislav Mekhanoshin	eb66bf0802	[AMDGPU] Print SCRATCH_EN field after the kernel Differential Revision: https://reviews.llvm.org/D93353	2020-12-15 22:44:30 -08:00
Bangtian Liu	cf638d793c	Ensure SplitEdge to return the new block between the two given blocks This PR implements the function splitBasicBlockBefore to address an issue that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore. The issue occurs in SplitEdge when the Succ has a single predecessor and the edge between the BB and Succ is not critical. This produces the result ‘BB->Succ->New’. The new function splitBasicBlockBefore was added to splitBlockBefore to handle the issue and now produces the correct result ‘BB->New->Succ’. Below is an example of splitting the block bb1 at its first instruction. /// Original IR bb0: br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlock bb0: br bb1 bb1: br bb1.split bb1.split: %0 = mul i32 1, 2 br bb2 bb2: /// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore bb0: br bb1.split bb1.split br bb1 bb1: %0 = mul i32 1, 2 br bb2 bb2: Differential Revision: https://reviews.llvm.org/D92200	2020-12-15 23:32:29 +00:00
Matt Arsenault	60eba8161b	RegisterCoalescer: Remove phi-only subranges when erasing identity copies Undef subranges are not present in the live range values, except when they cross block boundaries. In this situation, an identity copy is inside a loop, and one of the lanes is undefined. It only appears alive inside the loop due to the copy. Once the copy was erased, it would leave behind a segment inside the loop body with no corresponding def anywhere in the program. When RenameIndependentSubregs processed this dummy interval, it would introduce a "Multiple connected components in live interval" verifier error when IMPLICIT_DEFs were added to the other two blocks. I believe there is a missing verifier check for this type of dummy interval. I have found additional cases from the same fundamental problem in other areas I haven't managed to fix yet (e.g. the commented out prune_subrange_phi_value_* cases).	2020-12-15 17:36:32 -05:00
Changpeng Fang	ce0c0013d8	AMDGPU: If a store defines (alias) a load, it clobbers the load. Summary: If a store defines (must alias) a load, it clobbers the load. Fixes: SWDEV-258915 Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D92951	2020-12-14 16:34:32 -08:00
Stanislav Mekhanoshin	cf5845d6c4	[AMDGPU] Use multi-dword flat scratch for spilling Differential Revision: https://reviews.llvm.org/D93067	2020-12-14 14:19:29 -08:00
Michael Liao	1fd1f638b6	[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified. - Once an instruction is simplified, foldable candidates from it should be invalidated or skipped as the operand index is no longer valid. Differential Revision: https://reviews.llvm.org/D93174	2020-12-14 13:08:13 -05:00
Sebastian Neubauer	5733167f54	[AMDGPU] Mark amdgpu_gfx functions as module entry function - Allows lds allocations - Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata Differential Revision: https://reviews.llvm.org/D92946	2020-12-14 10:43:39 +01:00
Georgii Rymar	98a4289810	[llvm-readobj] - For SHT_REL relocations, don't display an addend. This is https://bugs.llvm.org/show_bug.cgi?id=44257. In LLVM style we always print `0` as addend when dumping SHT_REL relocations. It is confusing, this patch stops printing it as the first comment on the bug page suggests. Differential revision: https://reviews.llvm.org/D93033	2020-12-14 12:03:00 +03:00
Mirko Brkusanin	0c7cce54eb	[AMDGPU] Resolve issues when picking between ds_read/write and ds_read2/write2 Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned loads but the one that will be selected is determined either by the order in tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has priority. While ds_read2_b64 has lower alignment requirements, we cannot always restrict ds_read_b128 to 16-byte alignment because of unaligned-access-mode option. This was causing ds_read_b128 to be selected for 8-byte aligned loads regardless of chosen access mode. To resolve this we use two patterns for selecting ds_read_b128. One requires alignment of 16-byte and the other requires unaligned-access-mode option. Same goes for ds_write2_b64 and ds_write_b128. Differential Revision: https://reviews.llvm.org/D92767	2020-12-10 12:40:49 +01:00
Stanislav Mekhanoshin	4617cc68f6	[AMDGPU] Fix expansion of 192 bit spills in PEI Differential Revision: https://reviews.llvm.org/D92979	2020-12-09 16:36:29 -08:00
Austin Kerbow	4aa842a800	[AMDGPU] Add new pseudos for indirect addressing with VGPR Indexing It is possible for copies or spills to be inserted in the middle of indirect addressing sequences which use VGPR indexing. Spills to accvgprs could be affected by the indexing mode. Add new pseudo instructions that are expanded after register allocation to avoid the problematic spill or copy placement. Differential Revision: https://reviews.llvm.org/D91048	2020-12-08 12:24:12 -08:00
Jay Foad	03663e4130	[AMDGPU] Add occupancy level tests for GFX10.3. NFC. getMaxWavesPerEU and getVGPRAllocGranule both changed in GFX10.3 and they both affect the occupancy calculation. Differential Revision: https://reviews.llvm.org/D92839	2020-12-08 14:15:01 +00:00
Stanislav Mekhanoshin	dd89249498	[AMDGPU] Annotate vgpr<->agpr spills in asm Differential Revision: https://reviews.llvm.org/D92125	2020-12-07 11:25:25 -08:00
Scott Linder	f6b9afae00	[AMDGPU] Extend and reorganize memory legalizer tests * Rename some tests to try to make a convention (where all components are optional) of: <addrspace>_<syncscope>_<memory-orders>_<operation> * Split up at a level of granularity appropriate for the different RUN lines (i.e. split on addrspace so GFX6 can avoid FLAT) and that makes running a specific test reasonable in terms of wall time taken. This also means when run as part of the test suite the testing is not one serial bottleneck. * Auto-generate check lines with `update_llc_test_checks.py` to make future maintenance more tractable. Reviewed By: rampitec, t-tye Differential Revision: https://reviews.llvm.org/D91545	2020-12-03 19:36:33 +00:00
Jay Foad	d28624a209	[AMDGPU] Stop adding an implicit def of vcc_hi for wave32 This doesn't seem to be needed for anything. Differential Revision: https://reviews.llvm.org/D92400	2020-12-02 10:11:42 +00:00
Juneyoung Lee	53040a968d	[ConstantFold] Fold more operations to poison This patch folds more operations to poison. Alive2 proof: https://alive2.llvm.org/ce/z/mxcb9G (it does not contain tests about div/rem because they fold to poison when raising UB) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D92270	2020-11-29 21:19:48 +09:00
Nikita Popov	4df8efce80	[AA] Split up LocationSize::unknown() Currently, we have some confusion in the codebase regarding the meaning of LocationSize::unknown(): Some parts (including most of BasicAA) assume that LocationSize::unknown() only allows accesses after the base pointer. Some parts (various callers of AA) assume that LocationSize::unknown() allows accesses both before and after the base pointer (but within the underlying object). This patch splits up LocationSize::unknown() into LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer() to make this completely unambiguous. I tried my best to determine which one is appropriate for all the existing uses. The test changes in cs-cs.ll in particular illustrate a previously clearly incorrect AA result: We were effectively assuming that argmemonly functions were only allowed to access their arguments after the passed pointer, but not before it. I'm pretty sure that this was not intentional, and it's certainly not specified by LangRef that way. Differential Revision: https://reviews.llvm.org/D91649	2020-11-26 18:39:55 +01:00
Roman Lebedev	ba74fa244f	[AMDGPU] Actually fully update opt-pipeline.ll test to account for -loop-idiom vs -indvars switch	2020-11-25 19:39:32 +03:00
Roman Lebedev	a8d74517dc	[PassManager] Run Induction Variable Simplification pass after Recognize loop idioms pass, not before Currently, `-indvars` runs first, and then immediately after `-loop-idiom` does. I'm not really sure if `-loop-idiom` requires `-indvars` to run beforehand, but i'm very sure that `-indvars` requires `-loop-idiom` to run afterwards, as it can be seen in the phase-ordering test. LoopIdiom runs on two types of loops: countable ones, and uncountable ones. For uncountable ones, IndVars obviously didn't make any change to them, since they are uncountable, so for them the order should be irrelevant. For countable ones, well, they should have been countable before IndVars for IndVars to make any change to them, and since SCEV is used on them, it shouldn't matter if IndVars have already canonicalized them. So i don't really see why we'd want the current ordering. Should this cause issues, it will give us a reproducer test case that shows flaws in this logic, and we then could adjust accordingly. While this is quite likely beneficial in-the-wild already, it's a required part for the full motivational pattern behind `left-shift-until-bittest` loop idiom (D91038). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91800	2020-11-25 19:20:07 +03:00
Sebastian Neubauer	edd675643d	[AMDGPU] Emit stack frame size in metadata Add .shader_functions to pal metadata, which contains the stack frame size for all non-entry-point functions. Differential Revision: https://reviews.llvm.org/D90036	2020-11-25 16:30:02 +01:00
Matt Arsenault	79f75468b4	AMDGPU: Fix counting kernel arguments towards register usage Also use DataLayout to get type size. Relying on the IR type size is also pretty broken here, since this won't perfectly capture how types are legalized.	2020-11-20 21:23:33 -05:00
Matt Arsenault	1d1234b2a4	OpaquePtr: Update more tests to use typed sret	2020-11-20 20:08:43 -05:00
Matt Arsenault	20c43d6bd5	OpaquePtr: Bulk update tests to use typed sret	2020-11-20 17:58:26 -05:00
Matt Arsenault	06c192d454	OpaquePtr: Bulk update tests to use typed byval Upgrade of the IR text tests should be the only thing blocking making typed byval mandatory. Partially done through regex and partially manual.	2020-11-20 14:00:46 -05:00
Sebastian Neubauer	7a18bdb350	[AMDGPU] Implement flat scratch init for pal Extract the scratch offset from the scratch buffer descriptor that is stored in the global table. Differential Revision: https://reviews.llvm.org/D91701	2020-11-20 11:14:30 +01:00
Scott Linder	0fe4b8e4b5	[NFC][AMDGPU] Remove some generic pointers in memory-legalizer tests These tests implicitly depend on the target supporting generic pointers, so to prepare for testing them on GFX6 (which lacks FLAT) remove the dependency where possible. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D91666	2020-11-18 20:52:18 +00:00
Sebastian Neubauer	72ccec1bbc	[AMDGPU] Fix v3f16 interaction with image store workaround In some cases, the wrong amount of registers was reserved. Also enable more v3f16 tests. Differential Revision: https://reviews.llvm.org/D90847	2020-11-18 18:21:04 +01:00
Jay Foad	7ecf19697e	[AMDGPU] Fix and extend vccz workarounds We have workarounds for two different cases where vccz can get out of sync with the value in vcc. This fixes them in two ways: 1. Fix the case where the def of vcc was in a previous basic block, by pessimistically assuming that vccz might be incorrect at a basic block boundary. 2. Fix the handling of pre-existing waitcnt instructions by calling generateWaitcntInstBefore before examining ScoreBrackets to determine whether there's an outstanding smem read operation. Differential Revision: https://reviews.llvm.org/D91636	2020-11-18 15:26:06 +00:00
Jay Foad	e67d8859f2	[AMDGPU] Precommit more vccz workaround tests	2020-11-17 15:55:40 +00:00
Michael Liao	f375885ab8	[InferAddrSpace] Teach to handle assumed address space. - In certain cases, a generic pointer could be assumed as a pointer to the global memory space or other spaces. With a dedicated target hook to query that address space from a given value, infer-address-space pass could infer and propagate that to all its users. Differential Revision: https://reviews.llvm.org/D91121	2020-11-16 17:06:33 -05:00
Matt Arsenault	d2e52eec51	AMDGPU: Select global saddr mode from SGPR pointer Use the 64-bit SGPR base with a 0 offset, since it's 1 fewer instruction to materialize the 0 vs. the 64-bit copy.	2020-11-16 11:51:06 -05:00
Mirko Brkusanin	4cf6dd518e	[AMDGPU][GlobalISel] Fix lowerShlSat RegBankSelect would crash on G_SELECT when type is not s1. Differential Revision: https://reviews.llvm.org/D91437	2020-11-16 17:43:31 +01:00
Matt Arsenault	a6e353b1d0	AMDGPU: Split large offsets when selecting global saddr mode When the offset doesn't fit in the immediate field, move some to voffset.	2020-11-16 11:36:01 -05:00
Florian Hahn	8dbe44cb29	Add pass to add !annotate metadata from @llvm.global.annotations. This patch adds a new pass to add !annotation metadata for entries in @llvm.global.annotations, which is generated using __attribute__((annotate("_name"))) on functions in Clang. This has been discussed on llvm-dev as part of RFC: Combining Annotation Metadata and Remarks http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D91195	2020-11-16 14:57:11 +00:00
Matt Arsenault	e7eb2ac53f	AMDGPU/GlobalISel: Regenerate some checks Fixes indentation confusing diff in future patch.	2020-11-13 11:29:15 -05:00
Florian Hahn	8bb6347939	Add !annotation metadata and remarks pass. This patch adds a new !annotation metadata kind which can be used to attach annotation strings to instructions. It also adds a new pass that emits summary remarks per function with the counts for each annotation kind. The intended use case for this new metadata is annotating 'interesting' instructions and the remarks should provide additional insight into transformations applied to a program. To motivate this, consider these specific questions we would like to get answered: * How many stores added for automatic variable initialization remain after optimizations? Where are they? * How many runtime checks inserted by a frontend could be eliminated? Where are the ones that did not get eliminated? Discussed on llvm-dev as part of 'RFC: Combining Annotation Metadata and Remarks' (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html) Reviewed By: thegameg, jdoerfert Differential Revision: https://reviews.llvm.org/D91188	2020-11-13 13:24:10 +00:00

4150 Commits