llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	bcf5184a68	AMDGPU/GlobalISel: Make sure <2 x s1> phis are scalarized	2020-07-26 10:04:47 -04:00
Matt Arsenault	6f961a1e7e	AMDGPU/GlobalISel: Legalize GDS atomics I noticed these don't use the _gfx9, non-m0 reading variants but not sure if that's a bug or not. It's the same in the DAG.	2020-07-26 10:03:34 -04:00
Matt Arsenault	5819159995	AMDGPU/GlobalISel: Pack constant G_BUILD_VECTOR_TRUNCs when selecting	2020-07-26 09:55:34 -04:00
Matt Arsenault	61ced4b87a	GlobalISel: Handle 'n' inline asm constraint	2020-07-26 09:30:41 -04:00
Matt Arsenault	4033aa1467	AMDGPU/GlobalISel: Sign extend integer constants This matches the DAG behavior and fixes immediate folding	2020-07-26 09:30:14 -04:00
Matt Arsenault	4f6502ab33	AMDGPU/GlobalISel: Replace selection tests for G_CONSTANT/G_FCONSTANT Split into separate tests and make more consistent with the others.	2020-07-26 09:30:09 -04:00
Changpeng Fang	9162b70e51	DAGCombiner: Don't simplify the token factor if the node's number of operands already exceeds TokenFactorInlineLimit Summary: In parallelizeChainedStores, a TokenFactor was created with the size greater than 3000. We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose to give up simplification with the consideration of compile time. Reviewers: @spatel, @arsenm Differential Revision: https://reviews.llvm.org/D84204	2020-07-25 21:20:59 -07:00
Matt Arsenault	392b969c32	AMDGPU/GlobalISel: Don't assert on G_INSERT > 128-bits Just fallback for now. Really tablegen needs to generate all of the subregister index handling we need.	2020-07-25 10:05:44 -04:00
Matt Arsenault	2bd72abef0	AMDGPU: Skip other terminators before inserting s_cbranch_exec[n]z PHIElimination/createPHISourceCopy inserts non-branch terminators after the control flow pseudo if a successor phi reads a register defined by the control flow pseudo. If this happens, we need to split the expansion of the control flow pseudo to ensure all the branches are after all of the other mask management instructions. GlobalISel hit this in test cases that happened to be tail duplicated. The original testcase still does not work, since the same problem appears to be present in a later pass.	2020-07-24 16:51:59 -04:00
Dmitry Preobrazhensky	6b8948922c	[AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3. op dst, addr, rsrc, FORMAT, soffset This change adds support for SP3 syntax: op dst, addr, rsrc, soffset SP3FORMAT In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants. format:<expression> format:[<numeric-format-name>,<data-format-name>] format:[<data-format-name>,<numeric-format-name>] format:[<data-format-name>] format:[<numeric-format-name>] format:[<unified-format-name>] The last syntax variant is supported for GFX10 only. See llvm bug 37738 Reviewers: arsenm, rampitec, vpykhtin Differential Revision: https://reviews.llvm.org/D84026	2020-07-24 16:41:03 +03:00
Petar Avramovic	47bd41d099	AMDGPU/GlobalISel: Select set.inactive intrinsic Differential Revision: https://reviews.llvm.org/D84407	2020-07-24 10:14:14 +02:00
Matt Arsenault	891759db73	GlobalISel: Add scalarSameSizeAs LegalizeRule Widen or narrow a type to a type with the same scalar size as another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar operand to match the bitwidth of the pointer type. Use this to disallow narrower types for G_PTRMASK.	2020-07-23 21:17:31 -04:00
Matt Arsenault	b9c644ec61	AMDGPU: Fix failures from overflowing uint8_t number of operands If the operand index exceeded the limit of unsigned char, it wrapped and would point to the wrong operand. Increase the size of the operand index field to avoid this, and also don't bother trying to fold into implicit operands.	2020-07-23 15:39:33 -04:00
Nikita Popov	deb4bb2b3a	[IR] Add min/max/abs intrinsics This adds the llvm.abs(), llvm.umin(), llvm.umax(), llvm.smin(), and llvm.smax() intrinsics specified in D81829. For SelectionDAG, the ISD opcodes and all the legalization and lowering already exist, so this just wires them up to the intrinsic in the SDAG builder and adds rudimentary tests. For GlobalISel only the min/max intrinsics are wired up, as llvm.abs() will require the addition of a G_ABS op, and corresponding legalization support. Differential Revision: https://reviews.llvm.org/D84125	2020-07-23 20:56:19 +02:00
Matt Arsenault	b2ee1cd2d9	AMDGPU/GlobalISel: Add some tests for stack passed pointers	2020-07-23 14:38:31 -04:00
Matt Arsenault	d2b8fcff34	AMDGPU/GlobalISel: Handle call return values The only case that I know doesn't work is the implicit sret case when the return type doesn't fit in the return registers.	2020-07-23 14:29:35 -04:00
Jay Foad	b35833b84e	[GlobalISel][AMDGPU] Legalize saturating add/subtract Add support in LegalizerHelper for lowering G_SADDSAT etc. either using add/subtract-with-overflow or using max/min instructions. Enable this lowering for AMDGPU so it can be tested. The legalization rules are still approximate and skips out on using the clamp bit to treat these as legal, which has never been used before. This also doesn't yet try to deal with expanding SALU cases.	2020-07-23 09:06:42 -04:00
Konstantin Schwarz	931488779f	[GlobalISel][InlineAsm] Add register class ID to the flags of register input operands Summary: We do this already for output operands, but missed it for (non-tied) input operands. Reviewers: arsenm, Petar.Avramovic Reviewed By: arsenm Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, llvm-commits, kerbowa Tags: #llvm Differential Revision: https://reviews.llvm.org/D83763	2020-07-23 13:35:01 +02:00
Matt Arsenault	1fd1beea18	AMDGPU/GlobalISel: Fix translation of indirect calls	2020-07-22 13:13:21 -04:00
Dmitry Preobrazhensky	0b8fd77ad9	[AMDGPU][MC] Corrected decoding of 16-bit literals 16-bit literals are encoded as 32-bit values. If high 16-bits of the value is 0xFFFF, the decoded instruction cannot be reassembled. For example, the following code 0xff,0x04,0x04,0x52,0xcd,0xab,0xff,0xff was decoded as v_mul_lo_u16_e32 v2, 0xffffabcd, v2 However this literal is actually a 64-bit constant 0x00000000ffffabcd which violates requirements described in the documentation - the truncation is not safe. This change corrects decoding to make reassembly possible. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D84098	2020-07-22 17:20:43 +03:00
Petar Avramovic	44967fc604	AMDGPU: Simplify f16 to i64 custom lowering Range that f16 can represent fits into i32. Lower as f16->i32->i64 instead of f16->f32->i64 since f32->i64 has long expansion. Differential Revision: https://reviews.llvm.org/D84166	2020-07-22 10:32:14 +02:00
Matt Arsenault	7a669130f7	AMDGPU/GlobalISel: Add some baseline degenerate call argument tests	2020-07-21 18:48:40 -04:00
Matt Arsenault	b258920095	AMDGPU/GlobalISel: Fix not erasing inst when lowering G_FRINT	2020-07-21 18:29:41 -04:00
Matt Arsenault	7cd8a0256d	GlobalISel: Legalize G_FPOWI	2020-07-21 18:13:04 -04:00
Matt Arsenault	1168119c2f	AMDGPU: Start interpreting byref on kernel arguments These are treated identically to value aggregates placed in the kernel argument list. A %struct.foo or %struct.foo addrspace(4)* byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should produce the same offsets and argument metadata. This handles all 3 kernel ABI implementations, and the two HSA metadata emission paths.	2020-07-21 18:11:22 -04:00
Matt Arsenault	2fe0ea8261	DAG: Handle expanding strict_fsub into fneg and strict_fadd The AMDGPU handling of f16 vectors is terrible still since it gets scalarized even when the vector operation is legal. The code is essentially duplicated between the non-strict and strict case. Apparently no other expansions are currently trying to do this. This is mostly because I found the behavior of getStrictFPOperationAction to be confusing. In the ARM case, it would expand strict_fsub even though it shouldn't due to the later check. At that point, the logic required to check for legality was more complex than just duplicating the 2 instruction expansion.	2020-07-21 16:17:10 -04:00
Matt Arsenault	84704d989b	AMDGPU: Fix not accounting for constantexpr uses of LDS globals This was failing to add the size of LDS globals that weren't directly used by an instruction. They could be used by constant expressions which are transitively used by the function. This requires a better search, but just abort on this for now for correctness.	2020-07-20 11:41:41 -04:00
Matt Arsenault	61f1f2a204	AMDGPU/GlobalISel: Initial Implementation of calls Return values, and tail calls are not yet handled.	2020-07-20 11:13:22 -04:00
Petar Avramovic	6a1030aa0e	AMDGPU/GlobalISel: Legalize s16->s64 G_FPEXT Legalize using narrowScalar as s16->s32 G_FPEXT followed by s32->s64 G_FPEXT. Differential Revision: https://reviews.llvm.org/D84030	2020-07-20 16:12:19 +02:00
Matt Arsenault	5cbd4e415e	GlobalISel: Don't handle widenScalar for vector G_INSERT This handling didn't make any sense for vectors.	2020-07-20 10:06:18 -04:00
Matt Arsenault	93311a9812	AMDGPU/GlobalISel: Fix custom lowering of llvm.trunc.f64 for SI This was missing an operand from BFE and not erasing the original instruction.	2020-07-20 10:06:18 -04:00
Elvina Yakubova	b36a3e6140	[llvm-readobj] Update tests because of changes in llvm-readobj behavior This patch updates tests using llvm-readobj and llvm-readelf, because soon reading from stdin will be achievable only via a '-' as described here: https://bugs.llvm.org/show_bug.cgi?id=46400. Patch with changes to llvm-readobj behavior is here: https://reviews.llvm.org/D83704 Differential Revision: https://reviews.llvm.org/D83912 Reviewed by: jhenderson, MaskRay, grimar	2020-07-20 10:39:04 +01:00
Petar Avramovic	ba938f6388	AMDGPU/GlobalISel: Legalize s16->s64 G_FPTOSI/G_FPTOUI Add narrowScalarFor action. Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI. Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI followed by s32->s64 G_SEXT/G_ZEXT. Differential Revision: https://reviews.llvm.org/D84010	2020-07-20 11:06:11 +02:00
Matt Arsenault	c73df56966	AMDGPU/GlobalISel: Address some test fixmes that don't fail now	2020-07-18 10:54:39 -04:00
Matt Arsenault	918f3fc2c7	AMDGPU/GlobalISel: Fix test copy paste error	2020-07-18 10:09:01 -04:00
Dmitry Preobrazhensky	2e87acac9b	[AMDGPU] Removed s_mov_regrd and mov_fed opcodes These opcodes are not intended for public use. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81659	2020-07-17 19:52:54 +03:00
Matt Arsenault	994fb86bc2	AMDGPU: Fix promoting f16 fpowi with legal f16	2020-07-17 11:29:05 -04:00
Jay Foad	f05bce86af	[AMDGPU] Add some missing check prefixes and tweak test The test needed some extra ALU instructions to prevent it from being memory bound.	2020-07-17 12:57:47 +01:00
Jay Foad	2dc3d1b313	[AMDGPU] Add some missing check prefixes	2020-07-17 12:56:29 +01:00
Jay Foad	760af7a074	[AMDGPU] Avoid splitting FLAT offsets in unsafe ways As explained in the comment: // For a FLAT instruction the hardware decides whether to access // global/scratch/shared memory based on the high bits of vaddr, // ignoring the offset field, so we have to ensure that when we add // remainder to vaddr it still points into the same underlying object. // The easiest way to do that is to make sure that we split the offset // into two pieces that are both >= 0 or both <= 0. In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have an unsigned immediate offset field, so we can't use it to help split a negative offset. Differential Revision: https://reviews.llvm.org/D83394	2020-07-17 11:44:10 +01:00
Jay Foad	62fd7f767c	[MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics tryLatency compares two sched candidates. For the top zone it prefers the one with lesser depth, but only if that depth is greater than the total latency of the instructions we've already scheduled -- otherwise its latency would be hidden and there would be no stall. Unfortunately it only tests the depth of one of the candidates. This can lead to situations where the TopDepthReduce heuristic does not kick in, but a lower priority heuristic chooses the other candidate, whose depth is greater than the already scheduled latency, which causes a stall. The fix is to apply the heuristic if the depth of either candidate is greater than the already scheduled latency. All this also applies to the BotHeightReduce heuristic in the bottom zone. Differential Revision: https://reviews.llvm.org/D72392	2020-07-17 11:02:13 +01:00
hsmahesha	4905536086	Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size" This reverts commit `cc9d693856`.	2020-07-17 12:20:37 +05:30
Carl Ritson	3a18665748	[AMDGPU] Translate s_and/s_andn2 to s_mov in vcc optimisation When SCC is dead, but VCC is required then replace s_and / s_andn2 with s_mov into VCC when mask value is 0 or -1. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D83850	2020-07-17 11:48:57 +09:00
Matt Arsenault	6c5b635e95	AMDGPU: Add a few more missing test for AGPR tuple copying	2020-07-16 15:53:11 -04:00
Matt Arsenault	10382285ac	AMDGPU: Add missing tests for copyPhysReg AGPR tuples	2020-07-16 15:27:57 -04:00
Matt Arsenault	79f67cae91	AMDGPU: Rename add/sub with carry out instructions The hardware has created a real mess in the naming for add/sub, which have been renamed basically every generation. Switch the carry out pseudos to have the gfx9/gfx10 names. We were using the original SI/CI v_add_i32/v_sub_i32 names. Later targets reintroduced these names as carryless instructions with a saturating clamp bit, which we do not define. Do this rename so we can unambiguously add these missing instructions. The carry-in versions should also be renamed, but at least those had a consistent _u32 name to begin with. The 16-bit instructions were also renamed, but aren't ambiguous. This does regress assembler error message quality in some cases. In mismatched wave32/wave64 situations, this will switch from "unsupported instruction" to "invalid operand", with the error pointing at the wrong position. I couldn't quite follow how the assembler selects these, but the previous behavior seemed accidental to me. It looked like there was a partial attempt to handle this which was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it isn't used for anything).	2020-07-16 13:16:30 -04:00
Petar Avramovic	6850033ca6	AMDGPU/GlobalISel: Legalize s64->s16 G_SITOFP/G_UITOFP Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP. Legalize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP followed by s32->s16 G_FPTRUNC. Differential Revision: https://reviews.llvm.org/D83880	2020-07-16 16:31:57 +02:00
Petar Avramovic	5658002b80	AMDGPU/GlobalISel: Select G_FREEZE Select G_FREEZE in the same way that COPY is selected. Differential Revision: https://reviews.llvm.org/D83031	2020-07-16 11:10:48 +02:00
Carl Ritson	5bf2a9dd40	[AMDGPU] Update VMEM scalar write hazard mitigation sequence Using s_waitcnt_depctr 0xffe3 is potentially faster than v_nop. Reviewed By: rampitec, foad Differential Revision: https://reviews.llvm.org/D83872	2020-07-16 11:37:45 +09:00
dfukalov	7520393842	[NFC] Fixed typo in tests parameters Summary: llc reports `fp32-denormals` is not recognized. I guess it was intended to be `-denormal-fp-math-f32={preserve-sign\|ieee} -mattr=+mad-mac-f32-insts` Reviewers: rampitec Reviewed By: rampitec Subscribers: jvesely, nhaehnle, llvm-commits, kerbowa Tags: #llvm Differential Revision: https://reviews.llvm.org/D83883	2020-07-15 22:09:01 +03:00


3740 Commits