Explicitly set the exec mask for SGPR spills and reloads.
This fixes a bug where SGPR spills to memory could be incorrect
if the exec mask was 0 (or differed between spill and reload).
Additionally, pack scalar subregisters (up to 16/32 per VGPR),
so that the majority of scalar types can be spilled or reloaded
with a simple memory access. This should amortize some of the
additional overhead of manipulating the exec mask.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D80282
These are scalar instructions that change the behavior of vector
instructions, so they should not be executed when there are no active lanes.
The implementation of -amdgpu-skip-threshold also seems to be backwards
from what is expected, since decreasing it prevents removal.
The pass which infers when it is legal to load from the global address space
with SMRD was only considering amdgpu_kernel, and was ignoring the shader
entry point calling conventions.
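As an illustration (a hypothetical example, not taken from the patch), a
uniform load in a shader entry point such as the following can now be
considered for scalar (SMRD) selection, not just loads in amdgpu_kernel
functions:

  define amdgpu_ps float @uniform_load(float addrspace(1)* inreg %p) {
    %v = load float, float addrspace(1)* %p
    ret float %v
  }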
Summary:
Combine unmerge(trunc) to enable other merge combines.
Without this combine, the scalar unmerge(trunc(merge))
pattern cannot be combined and easily leads to
hard-to-legalize merge/unmerge artifacts.
Reviewed By: arsenm
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79567
This is a custom inserter because it was less work than teaching
tablegen a way to indicate that it is sometimes OK to have a no side
effect instruction in the output of a side effecting pattern.
The asm is needed to look like a read of the mode register to prevent
it from being deleted. However, there seems to be a bug where the mode
register def instructions are moved across the asm sideeffect by the
post-RA scheduler.
Another oddity is that the immediate is formatted differently between
s_denorm_mode and s_round_mode.
Summary:
A PHI's result register class is set to VGPR or SGPR depending on the cross-block value divergence.
In some cases a uniform PHI needs to be converted to return a VGPR to avoid moving values from VGPR to SGPR and back.
A PHI should certainly return a VGPR if it has at least one VGPR input. This change adds an exception:
we don't want to convert a uniform PHI to VGPRs when its only VGPR input is a VGPR-to-SGPR COPY and the
definition of the source VGPR in that COPY is a move of an immediate, as in:
bb.0:
  %0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
  %2:sreg_32 = .....
bb.1:
  %3:sreg_32 = PHI %1, %bb.3, %2, %bb.1
  S_BRANCH %bb.3
bb.3:
  %1:sreg_32 = COPY %0
  S_BRANCH %bb.2
Reviewers: rampitec
Reviewed By: rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80434
OpenMP emits these for some reason, so handle them. Assume these use
4096 bytes by default, with a flag to override this. Also change the
related stack assumption for calls to have a flag.
We do not have register classes for all possible vector
sizes, so round it up for extract vector element.
Also fixes selection of G_MERGE_VALUES when vectors are
not a power of two.
This required refactoring getRegSplitParts() so that
it can handle vectors whose size is not a power of two.
Ideally we would like RegSplitParts to be generated by
tablegen.
Differential Revision: https://reviews.llvm.org/D80457
- This allows us to specify the (minimal) alignment on an intrinsic's
arguments and, more importantly, the return value.
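For example (the intrinsic and alignment value below are illustrative
assumptions, not part of the change), the effect is equivalent to an align
return attribute on the intrinsic declaration:

  declare align 4 i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr()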
Differential Revision: https://reviews.llvm.org/D80422
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructions (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.
Should add it for all the related FP uses (possibly with some
extras). I did not add it to the gpr index mode instructions
(or to every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.
Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
All 3 passes that change instruction encodings were dropping MI
flags. This avoids scheduling regressions caused by setting
mayRaiseFPExceptions on FP instructions for non-strictfp functions.
The vector equivalent has backwards operands, but the scalar version
does not. The passes that use these hooks aren't enabled by default,
so this doesn't really change anything.
I'm guessing this was a holdover from when 0 was an invalid stack
pointer, but surprised nobody has discovered this before.
Also don't allow offset folding for -1 pointers, since it looks weird
to partially fold this.
I consider this to be a hack, since we probably should not mark any
16-bit extract as legal, and require all extracts to be done on
multiples of 32. There are quite a few more battles to fight in the
legalizer for sub-dword vectors, so just select this for now so we can
pass OpenCL conformance without crashing.
Also fix the same assert for G_INSERTs. Unlike G_EXTRACT there's not a
trivial way to select this so just fail on it.
Confusingly, these were unrelated and had different semantics. The
G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a
different format. G_PTR_MASK only allows clearing the low bits of a
pointer, and only a constant number of bits. The ptrmask intrinsic
allows an arbitrary mask. Replace G_PTR_MASK to match the intrinsic.
Only selects the cases that look like the old instruction. More work
is needed to select the general case. Also new legalization code is
still needed to deal with the case where the incoming mask size does
not match the pointer size, which has a specified behavior in the
langref.
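For reference, a hypothetical IR sketch: the first call clears a constant
number of low bits, which is what the old G_PTR_MASK could express, while the
second uses an arbitrary mask, which only the intrinsic allows:

  declare i8* @llvm.ptrmask.p0i8.i64(i8*, i64)

  %lo  = call i8* @llvm.ptrmask.p0i8.i64(i8* %p, i64 -16)   ; clear the low 4 bits
  %any = call i8* @llvm.ptrmask.p0i8.i64(i8* %p, i64 %mask) ; arbitrary, possibly non-constant mask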
This is currently missing most of the hard parts to lower correctly,
so disable it for now. This fixes at least one OpenCL conformance test
and allows it to pass with fallback. Hide this behind an option for
now.
Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding.
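A hypothetical use (the instruction and value here are made up for
illustration); the operand must be encodable as an inline constant:

  %v = call i32 asm "s_mov_b32 $0, $1", "=s,A"(i32 64)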
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D78494
EarlyCSE was added with D75145, but the motivating test is
not regressed by removing the extra pass now. That might be
because VectorCombine altered the way it processes instructions,
or it might be from (re)moving VectorCombine in the pipeline.
The extra round of EarlyCSE appears to cost approximately
0.26% in compile-time as discussed in D80236, so we need some
evidence to justify its inclusion here, but we do not have
that (yet).
I suspect that between SLP and VectorCombine, we are creating
patterns that InstCombine and/or codegen are not prepared for,
but we will need to reduce those examples and include them as
PhaseOrdering and/or test-suite benchmarks.
As noted in D80236, moving the pass in the pipeline exposed this
shortcoming. Extra work to recalculate the alias results showed
up as a compile-time slowdown.
There are 2 known problem patterns shown in the test diffs here:
vector horizontal ops (an x86 specialization) and vector reductions.
SLP has greater ability to match and fold those than vector-combine,
so let SLP have first chance at that.
This is a quick fix while we continue to improve vector-combine and
possibly canonicalize to reduction intrinsics.
In the longer term, we should improve matching of these patterns
because if they were created in the "bad" forms shown here, then we
would miss optimizing them.
I'm not sure what is happening with alias analysis on the addsub test.
The old pass manager now shows an extra line for that, and we see an
improvement that comes from SLP vectorizing a store. I don't know
what's missing with the new pass manager to make that happen.
Strangely, I can't reproduce the behavior if I compile from C++ with
clang and invoke the new PM with "-fexperimental-new-pass-manager".
Differential Revision: https://reviews.llvm.org/D80236
Unlike SelectionDAGBuilder, IRTranslator omits the unconditional
branch in fallthrough cases. Confusingly, the control flow pseudos
function in the opposite way the intrinsics are used, and the branch
targets always need to be swapped. We're inverting the target blocks,
so we need to figure out the old fallthrough block and insert a branch
to the original unconditional branch target.
This only affects assembly and -filetype=asm codegen of PAL metadata.
Differential Revision: https://reviews.llvm.org/D78860
Change-Id: I7b822e1917bf7b403486820d31afc483be207652
Promote alloca to vector before SROA and loop unroll. If we manage
to eliminate allocas before unroll we may choose to unroll less.
Differential Revision: https://reviews.llvm.org/D80386
The threshold was set in terms of total vector size while the idea was to
limit the number of instructions. Now that it has started to work with
doubles, the thresholds need to be updated.
Differential Revision: https://reviews.llvm.org/D80322
Even though a series of cmp/cndmask instructions can produce quite a lot
of code, that is still better than a loop. In the case of doubles we
would even produce two loops.
Differential Revision: https://reviews.llvm.org/D80032
Relying on any MachineFunction state in the MachineFunctionInfo
constructor is hazardous, because the construction time is unclear and
determined by the first use. The function may be only partially
constructed, which is part of why we have many of these hacky string
attributes to track what we need for ABI lowering.
For SelectionDAG, all stack objects are created up-front before
calling convention lowering so stack objects are visible at
construction time. For GlobalISel, none of the IR function has been
visited yet and the allocas haven't been added to the MachineFrameInfo
yet. This should fix failing to set flat_scratch_init in GlobalISel
when needed.
This pass really needs to be turned into some kind of analysis, but I
haven't found a nice way to use one here.
This should be directly implied from the register class, and there's
no need to special case live ins here. This was getting the wrong
answer for the queue ptr argument in callable functions, since it's
not an explicit IR argument and is always uniform.
Fixes not using scalar loads for the aperture in addrspacecast
lowering, and any other places that use implicit SGPR arguments.
This was assumed to be a simple move, interpreting the immediate
modifier operand as a materialized immediate. Apparently the SDWA pass
never produces these, but GlobalISel does emit these for some vector
shuffles.
When the callee requires a dynamic stack realignment,
it is not possible to correctly access the incoming
stack arguments using the stack pointer. We reserve a
base pointer in such cases to access the function arguments
inside the callee. The base pointer will hold the incoming
stack pointer value before any kind of delta is added to it.
Reviewed By: arsenm, scott.linder
Differential Revision: https://reviews.llvm.org/D78811
Along the lines of D77454 and D79968. Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.
Differential Revision: https://reviews.llvm.org/D80044
Summary:
When spilling in the entry function we should be able to borrow
StackPtrOffsetReg as a last resort. This restores behaviour
removed in D75138, and fixes failures when shaders use all
SGPRs, VGPRs and spill in the entry function.
Reviewers: scott.linder, arsenm, tpr
Reviewed By: scott.linder, arsenm
Subscribers: qcolombet, foad, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79776
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.
The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.
Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.
This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.
Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.
Differential Revision: https://reviews.llvm.org/D78403
Enable clausing of memory loads on gfx10 by adding a new pass to insert
the s_clause instructions that mark the start of each hard clause.
Differential Revision: https://reviews.llvm.org/D79792
SelectMOVRELOffset prevents peeling of a constant from an index
if the final base could be negative. isBaseWithConstantOffset() succeeds
if a value is an "add" or "or" operator. In the case of "or" it must be
an add-like "or", which never changes the sign of the sum given a
non-negative offset. I.e. we can safely allow peeling if the operator is
an "or".
Differential Revision: https://reviews.llvm.org/D79898
We can produce such vectors in the Promote Alloca pass,
but we are unable to use movrel to operate on them and instead
lower them via scratch. Making these types legal makes SI_INDIRECT
patterns work.
There is more work to do in subsequent changes:
1. We initialize m0 twice to access each dword. It shall
be possible to only do it once and increment base register
number instead.
2. We also need v16i64/v16f64 but these first need to be
added to tablegen.
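The kind of IR this targets (a hypothetical sketch) is a dynamic index into a
wide vector produced by Promote Alloca:

  %vec = load <32 x i32>, <32 x i32> addrspace(5)* %ptr
  %elt = extractelement <32 x i32> %vec, i32 %idx   ; dynamic index -> movrel/SI_INDIRECT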
Differential Revision: https://reviews.llvm.org/D79808
Summary:
ConstantExprs involving operations on <1 x Ty> could translate into MIR
that failed to verify with:
*** Bad machine code: Reading virtual register without a def ***
The problem was that translate(const Constant &C, Register Reg) had
recursive calls that passed the same Reg in for the translation of a
subexpression, but without updating VMap for the subexpression first as
translate(const Constant &C, Register Reg) expects.
Fix this by using the same translateCopy helper function that we use for
translating Instructions. In some cases this causes extra G_COPY
MIR instructions to be generated.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45576
Reviewers: arsenm, volkan, t.p.northover, aditya_nandakumar
Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78378
When splitting loads in RegBankSelect, G_EXTRACT_VECTOR_ELT instructions were
being added which could not be selected. Since invoking the legalizer will generate
instructions that split and combine wide loads, we can remove the redundant
repair instructions which are added by RegBankSelect.
Differential Revision: https://reviews.llvm.org/D75547
allocas in LLVM IR have a specified alignment. When that alignment is
specified, the alloca has at least that alignment at runtime.
If the specified type of the alloca has a higher preferred alignment,
SelectionDAG currently ignores that specified alignment, and increases
the alignment. It does this even if it would trigger stack realignment.
I don't think this makes sense, so this patch changes that.
I was looking into this for SVE in particular: for SVE, overaligning
vscale'ed types is extra expensive because it requires realigning the
stack multiple times, or using dynamic allocation. (This currently isn't
implemented.)
I updated the expected assembly for a couple tests; in particular, for
arg-copy-elide.ll, the optimization in question does not increase the
alignment the way SelectionDAG normally would. For the rest, I just
increased the specified alignment on the allocas to match what
SelectionDAG was inferring.
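A minimal sketch of the behavior change (hypothetical values):

  ; the specified align 4 is now kept; previously SelectionDAG could raise it to
  ; the preferred alignment of <4 x i32> (typically 16), possibly forcing realignment
  %buf = alloca <4 x i32>, align 4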
Differential Revision: https://reviews.llvm.org/D79532
If there are no available lanes in a reserved VGPR, no free SGPR, and no unused CSR
VGPR when trying to save the FP, it needs to be spilled to memory as a last
resort. This can be done in the prolog/epilog if we manually add the spill
and manage exec.
Differential Revision: https://reviews.llvm.org/D79610
Just do not touch loads and stores which are already vector.
Previously the pass was unable to see these loads and stores
because they were hidden by bitcasts.
Differential Revision: https://reviews.llvm.org/D79738
Currently this code exists in widenScalar for G_MERGE_VALUES
sources. I'm not sure if the existing expansion in widenScalar should
be removed or not. The widenScalar variant tries to extend to the
requested size, but this just uses the original bitwidth.
G_BITCAST can be lowered with a pair of G_UNMERGE_VALUES and
G_MERGE_VALUES with different types, but G_UNMERGE_VALUES of a vector
can also be implemented with a bitcast to a scalar, which introduces
the possibility of infinite loops. Try to eliminate an illegal source
register type in the artifact combiner to prevent this from happening.
Avoids infinite looping in the legalizer in a future patch which
allows lowering G_UNMERGE_VALUES of a vector source with a G_BITCAST.
Check the address space first before searching for the object
definition to save compile time. As an added bonus, this will now
treat casts to constant addrspace as constant.
We also seemed to be missing targeted tests for this, so add a few
missing other cases too.
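A hypothetical example of the newly handled case, a cast into the constant
address space (4 on AMDGPU):

  %c = addrspacecast i32 addrspace(1)* %p to i32 addrspace(4)*
  %v = load i32, i32 addrspace(4)* %c   ; now treated as a load from constant memory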
This is mostly useful if the alloca element type is not an integer
and is then cast to an integer for a load or store. We now can
vectorize an [i32] alloca but cannot do so for [float].
There is also a separate patch needed to properly lower 64-bit
types after they are vectorized. At the moment these are lowered
via scratch anyway.
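The kind of pattern in question (a hypothetical sketch): a non-integer alloca
whose accesses go through a bitcast to an integer pointer:

  %stack = alloca [4 x float], align 4, addrspace(5)
  %p = bitcast [4 x float] addrspace(5)* %stack to i32 addrspace(5)*
  %q = getelementptr i32, i32 addrspace(5)* %p, i32 %idx
  store i32 %val, i32 addrspace(5)* %q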
Differential Revision: https://reviews.llvm.org/D79641
If the SimplifyMultipleUseDemandedBits calls peek through BITCASTs back to the original type then we can remove the BITCASTs entirely.
Differential Revision: https://reviews.llvm.org/D79572
These were testing byval private kernel arguments, which doesn't make
any sense and has never been used. There didn't seem to be any tests
for real by-value struct arguments, which are used.
We do not want to break asm syntax. These suffixes are
quite useful for debugging, so add an option to print
them. Right now it is NFC.
Differential Revision: https://reviews.llvm.org/D79435
Since SRSRC has alignment requirements, first find non GIT pointer clobbered
registers for SRSRC and then if those registers clobber preloaded Scratch Wave
Offset register, copy the Scratch Wave Offset register to a free SGPR.
The AMDGPU target has a convention that defines all VGPRs
(except the initial 32 argument registers) as callee-saved.
This convention is not always efficient: when the callee
requires more registers, it ends up emitting a large number of
spills, even though its caller requires only a few.
This patch revises the ABI by introducing more scratch registers
that a callee can freely use.
The 256 vgpr registers now become:
32 argument registers
112 scratch registers and
112 callee saved registers.
The scratch registers and the CSRs are intermixed at regular
intervals (a split boundary of 8) to obtain a better occupancy.
Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr
Reviewed By: arsenm, t-tye
Differential Revision: https://reviews.llvm.org/D76356
VMEM soft clauses only contain VMEM and FLAT instructions. Teaching
GCNHazardRecognizer::checkSoftClauseHazards that other kinds of
instructions will naturally break the clause means there are far fewer
cases where it has to insert an s_nop instruction to forcibly break the
clause.
Differential Revision: https://reviews.llvm.org/D79353
Marking a section as ALLOC tells the ELF loader to load the section into memory.
As we do not want to load the notes into VRAM, the flag should not be there.
On AMDHSA, .note is still marked as ALLOC, apparently this is currently
needed for OpenCL (see https://reviews.llvm.org/D74995).
Differential Revision: https://reviews.llvm.org/D76278
This is a hack to fix illegal 32-bit to 16-bit copies.
The problem is that when we make 16-bit subregs legal it creates
a huge number of failures which could only be resolved all at once
without a temporary hack like this.
The next step is to change operands, instruction definitions
and patterns until this hack is not needed.
Differential Revision: https://reviews.llvm.org/D79119
Summary: This change enables all kinds of carry-out ISD opcodes to be selected according to the node divergence.
Reviewers: rampitec, arsenm, vpykhtin
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78091
The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.
The technique employed by `ExpandLoad` is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.
Differential Revision: https://reviews.llvm.org/D79096
VMEM loads of the same type (sampler vs no sampler) are guaranteed to
write their result registers in order, so there is no need for an
s_waitcnt even if they write to overlapping vgprs.
Differential Revision: https://reviews.llvm.org/D79176
This allows the pass not to crash and to analyze 16-bit subregs if those
appear in the instructions. At the same time it does not attempt
to reassign them. It can still correctly identify register
banks to let larger registers be reassigned.
More work will be needed here when real instructions use
these registers, along with more tests.
Differential Revision: https://reviews.llvm.org/D78772
These are used in SReg_32, and when we start to use SGPR_LO16
there will be complaints that not all registers in the RC support
all subreg indexes. For now it is NFC.
Unused regunits are reserved so that verifier does not complain
about missing phys reg live-ins.
Differential Revision: https://reviews.llvm.org/D78591
This is to fix performance regressions introduced by
86c944d790.
The old search would collect all potentially mergeable instructions in
the entire block. In this case, the same address is written in
multiple places in the block on the other side of a fence. When sorted
by offset, the two unmergeable, identical addresses would be next to
each other and the merge would give up.
Break the search space when we encounter an instruction we won't be
able to merge across. This will keep the identical addresses in
different merge attempts.
This may also improve compile time by reducing the merge list size.
Summary:
Frontend guarantees that coherent accesses have
corresponding cache policy bits set (glc, dlc).
Therefore there is no need for extra instructions
that invalidate cache.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78800
If f32 denormals were enabled pre-gfx9, we would still try to
implement this with v_max_f32. Pre-gfx9, these instructions ignored
the denormal mode and did not flush. Switch to the multiply form for
f32 as a workaround which should always work in any case.
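For reference (a sketch assuming the standard canonicalize intrinsic is the
input being lowered):

  %c = call float @llvm.canonicalize.f32(float %x)
  ; lowered as a multiply by 1.0 rather than v_max_f32 %x, %x, since the max
  ; form ignores the denormal mode pre-gfx9 and does not flush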
This fixes conformance failures when the library implementation of
fmin/fmax were accidentally not inlined, forcing the assumption of no
flushing on targets where denormals are not enabled by default. This
is a workaround, since really we should not be mixing code with
different FP mode expectations, but prefer the lowering that will work
in any mode.
Now this will always use max to implement canonicalize on gfx9+. This
is only really beneficial for f64. For f32/f16 it's a neutral choice
(and worse in terms of code size in 1 case), but possibly worse for
the compiler since it does add an extra register use operand. Leave
this change for later.
12994a70cf did this for 128-bit classes:
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.
This patch extends it to all classes > 64 bits, for consistency.
Differential Revision: https://reviews.llvm.org/D78622
These are needed as a counterpart for VGPR subregs even though
there are no scalar instructions which can operate on 16-bit values.
When we materialize a constant it is done into an SGPR,
and that SGPR may/will be copied into a 16-bit VGPR subreg. Such
a copy is illegal. There are also similar problems if a source
operand of a 16-bit VALU instruction is an SGPR. In addition
we need to get a register with a lo16 subregister of an SGPR
RC during selection, and this fails as well.
All of that makes me believe we need these subregisters as
syntactic glue.
Differential Revision: https://reviews.llvm.org/D78250
Summary:
The INLINEASM MIR instructions use immediate operands to encode the values of some operands.
The MachineInstr pretty printer function already handles those operands and prints human-readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however it uses the new MIROperandComment feature.
Reviewers: SjoerdMeijer, arsenm, efriedma
Reviewed By: arsenm
Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78088
We generally only combine starting from users to defs in the artifact combiner,
but this doesn't catch cases where, at the point of combining a G_UNMERGE, we don't
yet have the opposite G_MERGE on the input since we haven't legalized that far.
This change adds the users of a G_MERGE to the artifact combiner worklist if one
of the uses is a G_UNMERGE or G_TRUNC.
Differential Revision: https://reviews.llvm.org/D77931
Summary:
The combine for unmerge(cast(merge)) is only valid for vectors, but was
missing a corresponding check. Add a check that the operands are vectors
to avoid an invalid combine.
Without this check, the combiner would emit incorrect code for scalars
and pointers because the artifact cast (trunc/ext) only affects bits at
the end of the type, while this combine assumes that the casted bits
appear between meaningful bits.
This also uncovered a segmentation fault in the AMDGPU
InstructionSelector. The tests triggering this bug have been moved to
their own file and a check for the segmentation fault has been added.
Reviewers: arsenm, dsanders, aemerson, paquette, aditya_nandakumar
Reviewed By: arsenm
Subscribers: tpr, jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78191
This was hitting the default instruction constraint code which uses
the register classes in the instruction def, which REG_SEQUENCE does
not have.
Fixes not constraining the register class for AMDGPU fneg/fabs
patterns, which would fail when the use was another generic,
unconstrained instruction.
Another oddity I noticed is that the temporary registers are created
with an unnecessary (and incorrect) 16-bit LLT, but this shouldn't
matter.
I'm also still unclear why root and sub-instructions have to be
handled differently.
Previously, getWithOffset() would drop the offset if the base was null.
Because of this, MachineMemOperand would return the wrong result from
getAlign() in these cases. MachineMemOperand stores the alignment of
the pointer without the offset.
A bunch of MIR tests changed because we print the offset now.
Split off from D77687.
Differential Revision: https://reviews.llvm.org/D78049
Ports the existing DAG combines, minus the simplify demanded bits
which seems to have no equivalent now. Without these, this isn't
particularly helpful in most of the IR sample cases.
This is equivalent in terms of LLVM IR semantics, but we want to
transition away from using MaybeAlign to represent the alignment of
these instructions.
Differential Revision: https://reviews.llvm.org/D77984
Since 1725f28841, this should check
isFMADLegalForFAddFSub rather than the plain isOperationLegal.
This would assert in a subset of cases due to an oddity in how FMAD is
selected. We will allow FMA formation pre-legalize, but not FMAD even
in cases where it would be valid.
The current hook requires passing in the root fadd/fsub. However, in
this distributed case, this would be far more complicated to pass in
the relevant operand. AMDGPU doesn't get any value from the node,
only needs the type, and is the only implementor, so I'm not sure why
we have this complexity. Just rename and expand the assert to avoid
the more complicated checks spread through the distribution logic.