llvm-project

Commit Graph

Author	SHA1	Message	Date
Nicolai Hähnle	a80291ce10	Revert "[AMDGPU] Invert the handling of skip insertion." This reverts commit `0dc6c249bf`. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa.	2020-01-21 09:17:25 +01:00
Matt Arsenault	c72aa27f91	AMDDGPU/GlobalISel: Fix RegBankSelect for llvm.amdgcn.ps.live	2020-01-20 23:21:53 -05:00
Matt Arsenault	385fb337de	AMDGPU: Generate test checks These weren't much different than copied output anyway.	2020-01-20 20:03:45 -05:00
Matt Arsenault	e5823bf806	AMDGPU: Don't create weird sized integers There's no reason to introduce a new, unnaturally sized value here. This has a chance to produce worse code with legalization. Avoids regression in a future patch.	2020-01-20 20:02:54 -05:00
Matt Arsenault	317fdcd09a	AMDGPU: Cleanup and generate 64-bit div tests Split out r600 tests, and try to be more consistent with coverage. Cover a few more cases for 24-bit optimization and constants.	2020-01-20 17:19:39 -05:00
Matt Arsenault	9b13b4a0e3	AMDGPU: Prepare to use scalar register indexing Define pseudos mirroring the the VGPR indexing ones, and adjust the operands in the s_movrel* instructions to avoid the result def.	2020-01-20 17:19:16 -05:00
Fangrui Song	886d2c2ca7	[BranchRelaxation] Simplify offset computation and fix a bug in adjustBlockOffsets() If Start!=0, adjustBlockOffsets() may unnecessarily adjust the offset of Start. There is no correctness issue, but it can create more block splits.	2020-01-19 16:02:16 -08:00
Matt Arsenault	592de0009f	AMDGPU/GlobalISel: Select llvm.amdgcn.update.dpp The existing test is overly reliant on -mattr=-flat-for-global, and some missing optimizations to re-use.	2020-01-17 20:09:53 -05:00
Matt Arsenault	ec9628318d	AMDGPU/GlobalISel: Select DS append/consume	2020-01-17 20:09:53 -05:00
Stanislav Mekhanoshin	eebdd85e7d	[AMDGPU] allow multi-dword flat scratch access since GFX9 This is supported starting with GFX9. Differential Revision: https://reviews.llvm.org/D72865	2020-01-17 10:47:03 -08:00
Matt Arsenault	886f9071c6	AMDGPU: Don't assert on a16 images on targets without FeatureR128A16 Currently the lowering for i16 image coordinates asserts on gfx10. I'm somewhat confused by this though. The feature is missing from the gfx10 feature lists, but the a16 bit appears to be present in the manual for MIMG instructions.	2020-01-17 11:07:00 -05:00
Matt Arsenault	8945b23af5	AMDGPU: Update more tests to use modern buffer intrinsics	2020-01-16 14:29:38 -05:00
Matt Arsenault	e12b840abf	AMDGPU/GlobalISel: Improve lowering of G_SEXT_INREG Clamping the scalar is much better than lowering with superwide shifts for types > s64.	2020-01-16 14:29:37 -05:00
Matt Arsenault	a66d2817ca	GlobalISel: Don't ignore requested ext narrowing type This was assuming the narrow target was the source type. Respect the requested type when these don't match by using intermediate merges. This avoids producing very wide, illegal shift expansions.	2020-01-16 14:29:37 -05:00
Matt Arsenault	de4f88df97	AMDGPU: Remove IR section from MIR test Also generate check lines so this isn't just testing the meaningless block name.	2020-01-16 13:49:44 -05:00
Matt Arsenault	0d0fce42b0	GlobalISel: Preserve load/store metadata in IRTranslator This was dropping the invariant metadata on dead argument loads, so they weren't deleted. Atomics still need to be fixed the same way. Also, apparently store was never preserving dereferencable which should also be fixed.	2020-01-16 13:49:43 -05:00
Matt Arsenault	20ca49b646	AMDGPU: Update tests to use modern buffer intrinsics	2020-01-16 13:49:43 -05:00
Matt Arsenault	4ca1ad85b7	AMDGPU/GlobalISel: Don't handle legacy buffer intrinsic	2020-01-16 11:31:12 -05:00
Matt Arsenault	9b2f3532c7	AMDGPU/GlobalISel: Select DS GWS intrinsics	2020-01-16 11:25:10 -05:00
Yuanfang Chen	6e24c6037f	Revert "[Support] make report_fatal_error `abort` instead of `exit`" This reverts commit `647c3f4e47`. Got bots failure from sanitizer-windows and maybe others.	2020-01-15 17:52:25 -08:00
Yuanfang Chen	647c3f4e47	[Support] make report_fatal_error `abort` instead of `exit` Summary: This patch could be treated as a rebase of D33960. It also fixes PR35547. A fix for `llvm/test/Other/close-stderr.ll` is proposed in D68164. Seems the consensus is that the test is passing by chance and I'm not sure how important it is for us. So it is removed like in D33960 for now. The rest of the test fixes are just adding `--crash` flag to `not` tool. ** The reason it fixes PR35547 is `exit` does cleanup including calling class destructor whereas `abort` does not do any cleanup. In multithreading environment such as ThinLTO or JIT, threads may share states which mostly are ManagedStatic<>. If faulting thread tearing down a class when another thread is using it, there are chances of memory corruption. This is bad 1. It will stop error reporting like pretty stack printer; 2. The memory corruption is distracting and nondeterministic in terms of error message, and corruption type (depending one the timing, it could be double free, heap free after use, etc.). Reviewers: rnk, chandlerc, zturner, sepavloff, MaskRay, espindola Reviewed By: rnk, MaskRay Subscribers: wuzish, jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, arichardson, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, lenary, s.egerton, pzheng, cfe-commits, MaskRay, filcab, davide, MatzeB, mehdi_amini, hiraditya, steven_wu, dexonsmith, rupprecht, seiya, llvm-commits Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D67847	2020-01-15 17:05:13 -08:00
Stanislav Mekhanoshin	8b417dd3d6	Process BUNDLE in tail duplication When tail duplication estimates a size of tail it uses instruction count. Account for a number of instrictions in a bundle too. Differential Revision: https://reviews.llvm.org/D72783	2020-01-15 15:46:57 -08:00
Matt Arsenault	711a17afaf	AMDGPU/GlobalISel: Select exp with patterns This does produce slightly different code. Now a unique IMPLICIT_DEF is emitted for each of the implicit_def operands, rather than reusing the same one.	2020-01-15 18:33:15 -05:00
Matt Arsenault	25e9938a45	GlobalISel: Handle more cases of G_SEXT narrowing This now develops the same problem G_ZEXT/G_ANYEXT have where the requested type is assumed to be the source type. This will be fixed separately by creating intermediate merges.	2020-01-15 18:33:15 -05:00
Matt Arsenault	936483fb7d	GlobalISel: Implement lower for G_BITCAST Bitcast only really applies between scalars and vectors. Implement as an unmerge and remerge. The test needs to tolerate failure since one of the unmerges currently fails to legalize.	2020-01-15 08:58:58 -05:00
Matt Arsenault	91715617ad	GlobalISel: Fix narrowScalar for G_ANYEXT results This is nearly the same as G_ZEXT.	2020-01-15 08:58:57 -05:00
cdevadas	0dc6c249bf	[AMDGPU] Invert the handling of skip insertion. The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092	2020-01-15 15:18:16 +05:30
Michael Liao	65c8abb14e	[amdgpu] Fix typos in a test case. - There are typos introduced due to merge.	2020-01-14 20:08:39 -05:00
Michael Liao	01a4b83154	[codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU. Summary: - `dead-mi-elimination` assumes MIR in the SSA form and cannot be arranged after phi elimination or DeSSA. It's enhanced to handle the dead register definition by skipping use check on it. Once a register def is `dead`, all its uses, if any, should be `undef`. - Re-arrange the DIE in RA phase for AMDGPU by placing it directly after `detect-dead-lanes`. - Many relevant tests are refined due to different register assignment. Reviewers: rampitec, qcolombet, sunfish Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72709	2020-01-14 19:26:15 -05:00
Michael Liao	8d07f8d98c	[DAGCombine] Replace `getIntPtrConstant()` with `getVectorIdxTy()`. - Prefer `getVectorIdxTy()` as the index operand type for `EXTRACT_SUBVECTOR` as targets expect different types by overloading `getVectorIdxTy()`.	2020-01-14 17:03:05 -05:00
Jay Foad	b777e551f0	[MachineScheduler] Reduce reordering due to mem op clustering Summary: Mem op clustering adds a weak edge in the DAG between two loads or stores that should be clustered, but the direction of this edge is pretty arbitrary (it depends on the sort order of MemOpInfo, which represents the operands of a load or store). This often means that two loads or stores will get reordered even if they would naturally have been scheduled together anyway, which leads to test case churn and goes against the scheduler's "do no harm" philosophy. The fix makes sure that the direction of the edge always matches the original code order of the instructions. Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72706	2020-01-14 19:19:02 +00:00
Stanislav Mekhanoshin	ad741853c3	[AMDGPU] Model distance to instruction in bundle This change allows to model the height of the instruction within a bundle for latency adjustment purposes. Differential Revision: https://reviews.llvm.org/D72669	2020-01-14 01:18:59 -08:00
Stanislav Mekhanoshin	eca4474587	[AMDGPU] Fix getInstrLatency() always returning 1 We do not have InstrItinerary so generic getInstLatency() was always defaulting to return 1 cycle. We need to use TargetSchedModel instead to compute an instruction's latency. Differential Revision: https://reviews.llvm.org/D72655	2020-01-14 01:08:30 -08:00
Matt Arsenault	203801425d	AMDGPU/GlobalISel: Select llvm.amdgcn.ds.ordered.{add\|swap}	2020-01-13 13:09:38 -05:00
Matt Arsenault	2f090cc8f1	AMDGPU/GlobalISel: Add some baseline tests for vector extract A future change will try to fold constant offsets into the loop which these will stress.	2020-01-13 12:51:05 -05:00
Matt Arsenault	ca19d7a399	AMDGPU/GlobalISel: Fix branch targets when emitting SI_IF The branch target needs to be changed depending on whether there is an unconditional branch or not. Loops also need to be similarly fixed, but compiling a simple testcase end to end requires another set of patches that aren't upstream yet.	2020-01-13 12:51:05 -05:00
Matt Arsenault	d7d88b9d8b	GlobalISel: Fix assertion on wide G_ZEXT sources It's possible to have a type that needs a mask greater than 64-bits.	2020-01-13 08:29:45 -05:00
Simon Pilgrim	8f49204f26	[SelectionDAG] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526) As detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value. This patch adds support to SelectionDAG::ComputeKnownBits to use KnownBits::countMinTrailingZeros and countMinLeadingZeros to set the minimum guaranteed leading/trailing known zero bits. Differential Revision: https://reviews.llvm.org/D72573	2020-01-13 11:08:12 +00:00
Matt Arsenault	3c868cbbda	AMDGPU: Split test function This avoids slightly different scheduling/regalloc behavior, and avoids a test diff between GlobalISel and SelectionDAG.	2020-01-12 22:44:51 -05:00
Matt Arsenault	555e7ee04c	AMDGPU/GlobalISel: Don't use XEXEC class for SGPRs We don't use the xexec register classes for arbitrary values anymore. Avoids a test variance beween GlobalISel and SelectionDAG>	2020-01-12 22:44:51 -05:00
Matt Arsenault	a10527cd37	AMDGPU/GlobalISel: Copy type when inserting readfirstlane getDefIgnoringCopies will fail to find any def if no type is set if we try to use it on the use's operand, so propagate the type.	2020-01-12 22:44:51 -05:00
Simon Pilgrim	065eefcfe9	[AMDGPU] Regenerate shl shift tests	2020-01-12 14:34:36 +00:00
Michael Bedy	4a32cd11ac	[AMDGPU] Remove unnecessary v_mov from a register to itself in WQM lowering. Summary: - SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions. While this is necessary for the special handling of moving results out of WWM live ranges, it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every WQM operation. This change uses a COPY psuedo in these cases, which allows the register allocator to coalesce the moves away. Reviewers: tpr, dstuttard, foad, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71386	2020-01-10 23:01:19 -05:00
Matt Arsenault	bac995d978	AMDGPU/GlobalISel: Clamp G_ZEXT source sizes Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so fewer cases actually work.	2020-01-10 09:42:49 -05:00
Matt Arsenault	35c3d101ae	AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELT Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.	2020-01-09 19:52:24 -05:00
Matt Arsenault	5cabb8357a	AMDGPU/GlobalISel: Fix G_EXTRACT_VECTOR_ELT mapping for s-v case If an SGPR vector is indexed with a VGPR, the actual indexing will be done on the SGPR and produce an SGPR. A copy needs to be inserted inside the waterwall loop to the VGPR result.	2020-01-09 19:46:54 -05:00
Stanislav Mekhanoshin	cd69e4c74c	[AMDGPU] Fix bundle scheduling Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487	2020-01-09 15:56:36 -08:00
Matt Arsenault	b4a647449f	TableGen/GlobalISel: Add way for SDNodeXForm to work on timm The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.	2020-01-09 17:37:52 -05:00
Matt Arsenault	0ea3c7291f	GlobalISel: Handle llvm.read_register Compared to the attempt in `bdcc6d3d26`, this uses intermediate generic instructions.	2020-01-09 17:37:52 -05:00
Matt Arsenault	767aa507a4	AMDGPU/GlobalISel: Fix argument lowering for vectors of pointers When these arguments are broken down by the EVT based callbacks, the pointer information is lost. Hack around this by coercing the register types to be the expected pointer element type when building the remerge operations.	2020-01-09 16:29:44 -05:00

1 2 3 4 5 ...

3001 Commits