llvm-project

Commit Graph

Author	SHA1	Message	Date
Tim Renouf	ee3e642627	[AMDGPU] Add gfx90c target This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were previously included in gfx909. Differential Revision: https://reviews.llvm.org/D90419 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-11-03 16:27:43 +00:00
Jay Foad	040c50278c	[AMDGPU] Fix ds_read2/write2 with unaligned offsets These instructions use a scaled offset. We were wrongly selecting them even when the required offset was not a multiple of the scale factor. Differential Revision: https://reviews.llvm.org/D90607	2020-11-03 15:16:10 +00:00
Jay Foad	6e008cb554	[AMDGPU] Precommit globalisel tests for ds_read2_b64 with large offset	2020-11-03 14:38:56 +00:00
Jay Foad	32897c05ab	[AMDGPU] Specify a triple to avoid codegen changes depending on host OS	2020-11-03 13:33:44 +00:00
Jay Foad	0892d2a311	Revert "Fix ds_read2/write2 unaligned offsets" This reverts commit `2e7e898c8f`. It was committed by mistake.	2020-11-02 14:01:33 +00:00
Jay Foad	2e7e898c8f	Fix ds_read2/write2 unaligned offsets	2020-11-02 13:57:13 +00:00
Jay Foad	c8cbaa153c	[AMDGPU] Precommit ds_read2/write2 with unaligned offset tests. NFC.	2020-11-02 13:57:08 +00:00
Jay Foad	f3881d6517	[AMDGPU] Generate test checks. NFC.	2020-11-02 13:56:46 +00:00
Jay Foad	d3f13f3edf	[AMDGPU] Remove a comment. NFC. This was obsoleted by `f78687df9b` which added gfx9 aligned/unaligned tests.	2020-11-02 13:56:46 +00:00
Christudasan Devadasan	9bb2b4f0aa	[AMDGPU] Add alignment check for v3 to v4 load type promotion It should be enabled only when the load alignment is at least 8-byte. Fixes: SWDEV-256824 Reviewed By: foad Differential Revision: https://reviews.llvm.org/D90404	2020-11-01 12:05:34 +05:30
Scott Linder	13a56ca5a9	[AMDGPU] Refactor and extend elf-header-flags-mach tests * Factor out common elements of the input YAML document and use sed to macro replace the run line specific elements. * Add checks for the common elements which depend on the ELF class. * Use non-numeric suffix for temporary files to avoid merge conflicts. * Sort tests by GFX# ascending. * Group ELF and YAML tests by GFX#. Reviewed By: t-tye Differential Revision: https://reviews.llvm.org/D90245	2020-10-30 18:57:04 +00:00
Matt Arsenault	790f5771fd	AMDGPU: Fix missing writelane cases to skip with exec=0	2020-10-30 11:15:11 -04:00
Jay Foad	58de4b2053	[AMDGPU] Use pseudo instructions for readlane/writelane This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2". All the codegen changes are caused by the post-RA scheduler no longer treating readlane/writelane as scheduling barriers due to having unmodelled side effects. (The pseudos are hasSideEffects = 0, but the real instructions are hasSideEffects = ? which TableGen conservatively treats as 1.) Differential Revision: https://reviews.llvm.org/D90401	2020-10-29 16:00:53 +00:00
Jay Foad	7a79921edd	[AMDGPU] Remove gds operand from ds_gws_* MachineInstrs The operand value was always 1 (except in some bad MIR tests) so it was redundant. Differential Revision: https://reviews.llvm.org/D90378	2020-10-29 15:04:23 +00:00
Austin Kerbow	de51867343	[AMDGPU] Add Reset function to GCNHazardRecognizer Reset the tracked emitted instructions when starting scheduling on a new region. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90347	2020-10-28 16:32:32 -07:00
Jay Foad	5b91a6a88b	[AMDGPU] Allow some modifiers on VOP3B instructions V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src modifier, but they can still use NEG and the usual output modifiers. This partially reverts `3b99f12a4e` "AMDGPU: Remove modifiers from v_div_scale_*". Differential Revision: https://reviews.llvm.org/D90296	2020-10-28 21:54:14 +00:00
Austin Kerbow	8b127a8661	[AMDGPU] Fix inserting combined s_nop in bundles Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90334	2020-10-28 14:34:04 -07:00
Aditya Nandakumar	bed8394047	[GISel]: Few InsertVecElt combines https://reviews.llvm.org/D88060 This adds the following combines 1) build_vector formation from insert_vec_elts 2) insert_vec_elts (build_vector) -> build_vector	2020-10-28 12:27:07 -07:00
Matt Arsenault	b9c21d43bb	RegAlloc: Clear isSSA The MIR parser may infer SSA, so -run-pass=regallocgreedy would hit a verifier error after multiple vreg defs are added.	2020-10-28 12:02:16 -04:00
Sebastian Neubauer	09c7345683	[AMDGPU] Precommit tests for D89388 and D89399, NFC	2020-10-28 16:58:55 +01:00
Carl Ritson	057934a6d7	[AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc SIPreAllocateWWMRegs was being inserted after RegisterCoalescer but this pass does not exist during FastAlloc so pre-allocation pass was never being run. Insert pre-allocation after TwoAddressInstructionPass instead. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D90236	2020-10-28 12:15:15 +09:00
Michael Liao	46c3d5cb05	[amdgpu] Add the late codegen preparation pass. Summary: - Teach that pass to widen naturally aligned but not DWORD aligned sub-DWORD loads. Reviewers: rampitec, arsenm Subscribers: Tags: #llvm Differential Revision: https://reviews.llvm.org/D80364	2020-10-27 14:07:59 -04:00
Jay Foad	d028d2b376	[AMDGPU] Add llvm.amdgcn.div.scale with fneg tests	2020-10-27 16:05:51 +00:00
Michael Liao	0d092303b4	[amdgpu] Enable use of AA during codegen. - Add an internal option `-amdgpu-use-aa-in-codegen` to enable or disable this feature. By default, it's enabled. Differential Revision: https://reviews.llvm.org/D89320	2020-10-27 09:46:23 -04:00
Carl Ritson	7a880ab388	[AMDGPU] Move WQM Pass after MI Scheduler Exec mask manipulation inserted by SIWholeQuadMode acts as a barrier to instruction scheduling. Move the entire pass after the machine instruction scheduler and make changes so the pass is correct for non-SSA operation. These changes should leave the pass still usable pre-scheduler, although tests have been updated to reflect post-scheduler results. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D88081	2020-10-27 10:25:53 +09:00
Stanislav Mekhanoshin	038d884a50	[AMDGPU] Use flat scratch instructions where available The support is disabled by default. So far there is instruction selection, spilling, and frame elimination. It also changes SP from unswizzled to swizzled as used by flat scratch instructions, so it cannot be mixed with MUBUF stack access. At the very least missing: - GlobalISel; - Some optimizations in frame elimination in between vector and scalar ALU; - It shall finally allow to always materialize frame index as an SGPR, but that is not implemented and frame elimination cannot handle it yet; - Unaligned and/or multidword flat scratch shall work, but it is legalized now for MUBUF; - Operand folding cannot optimize FI like with MUBUF yet; - It will need scaling the value of the SP/FP in the DWARF expression to recover the unswizzled scratch address; Differential Revision: https://reviews.llvm.org/D89170	2020-10-26 14:40:42 -07:00
Fraser Cormack	ffa6d2afa4	[DAGCombine] Add test case showing incorrect DAGCombine optimization This optimization produces incorrect results when the vector element type is not byte-sized. Related to D78568.	2020-10-26 12:37:31 +00:00
Sebastian Neubauer	a094b4fa4b	[AMDGPU] Emit new pal metadata by default If no pal metadata is given, default to the msgpack format instead of the legacy metadata. This makes tests more readable. Differential Revision: https://reviews.llvm.org/D90035	2020-10-26 10:16:17 +01:00
Christudasan Devadasan	5a061041ec	[AMDGPU] Avoid offset register in MUBUF for direct stack object accesses We use an absolute address for stack objects and it would be necessary to have a constant 0 for soffset field. Fixes: SWDEV-228562 Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D89234	2020-10-26 11:08:37 +05:30
Matt Arsenault	d61996473d	AMDGPU: Increase branch size estimate with offset bug This will be relaxed to insert a nop if the offset hits the bad value, so overestimate branch instruction sizes.	2020-10-23 10:34:24 -04:00
Matt Arsenault	549f326d32	AMDGPU: Cleanup MIR test Remove registers section and compact block/register numbers	2020-10-22 12:54:35 -04:00
Piotr Sobczak	7ae0033ca8	[AMDGPU] Fix expansion of i16 MULH This commit marks i16 MULH as expand in AMDGPU backend, which is necessary after the refactoring in D80485. Differential Revision: https://reviews.llvm.org/D89965	2020-10-22 17:05:06 +02:00
Matt Arsenault	d3bcfe2a36	AMDGPU: Implement getNoPreservedMask We don't support funclets for exception handling and I hit this when manually reducing MIR.	2020-10-22 10:17:31 -04:00
Matt Arsenault	188df17420	ScheduleDAGInstrs: Skip debug instructions at end of scheduling region If the end instruction of the scheduling region was a DBG_VALUE, the uses of the debug instruction were tracked as if they were real uses. This would then hit the deadDefHasNoUse assertion in addVRegDefDeps if the only use was the debug instruction.	2020-10-22 10:16:45 -04:00
Stanislav Mekhanoshin	611959f004	[AMDGPU] Fixed v_swap_b32 match 1. Fixed liveness issue with implicit kills. 2. Fixed potential problem with an indirect mov. Fixes: SWDEV-256848 Differential Revision: https://reviews.llvm.org/D89599	2020-10-21 10:14:24 -07:00
Matt Arsenault	1ed4caff1d	AMDGPU: Lower the threshold reported for maximum stack size exceeded Check the actual maximum supported stack size for a kernel.	2020-10-21 12:06:27 -04:00
Matt Arsenault	53c43431bc	AMDGPU: Propagate amdgpu-flat-work-group-size attributes Fixes being overly conservative with the register counts in called functions. This should try to do a conservative range merge, but for now just clone. Also fix not being able to functionally run the pass standalone.	2020-10-21 12:06:24 -04:00
Florian Hahn	88241ffb56	[Passes] Move ADCE before DSE & LICM. The adjustment seems to have very little impact on optimizations. The only binary change with -O3 MultiSource/SPEC2000/SPEC2006 on X86 is in consumer-typeset and the size there actually decreases by -0.1%, with no significant changes in the stats. On its own, it is mildly positive in terms of compile-time, most likely due to LICM & DSE having to process slightly fewer instructions. It should also be unlikely that DSE/LICM make much new code dead. http://llvm-compile-time-tracker.com/compare.php?from=df63eedef64d715ce1f31843f7de9c11fe1e597f&to=e3bdfcf94a9eeae6e006d010464f0c1b3550577d&stat=instructions With DSE & MemorySSA, it gives some nice compile-time improvements, due to the fact that DSE can re-use the PDT from ADCE, if it does not make any changes: http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D87322	2020-10-21 10:30:56 +01:00
Austin Kerbow	ebdcef20ce	[AMDGPU] Avoid inserting noops during scheduling Passes that are run after the post-RA scheduler may insert instructions like waitcnt which eliminate the need for certain noops. After this patch the scheduler is still aware of possible latency from hazards but noops will not be inserted until the dedicated hazard recognizer pass is run. Depends on D89753. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D89754	2020-10-20 17:11:36 -07:00
Austin Kerbow	37d907899f	[HazardRec] Allow inserting multiple wait-states simultaneously If a target can encode multiple wait-states into a noop allow emitting such instructions directly. Reviewed By: rampitec, dmgreen Differential Revision: https://reviews.llvm.org/D89753	2020-10-20 17:03:47 -07:00
Tony	1bc7bfffdb	[AMDGPU] Optimize waitcnt insertion for flat memory operations Change waitcnt insertion to check the memory operand tokens to see if flat memory operations access VMEM in the same way it does to check if accessing LDS. This avoids adding waitcnt for counters for address spaces that are not accessed. In addition, only generate the pessimistic waitcnt 0 if a flat memory operation appears to access both VMEM and LDS. This benefits flat memory operations that explicitly specify the address space as GLOBAL or LOCAL. Differential Revision: https://reviews.llvm.org/D89618	2020-10-20 22:55:12 +00:00
Michael Liao	2a0e4d1c01	[amdgpu] Enhance AMDGPU AA. - In general, a generic pointer may alias to pointers in all other address spaces. However, for certain cases enforced by the programming model, we may find a generic pointer won't alias to pointers to local objects. * When a generic pointer is loaded from the constant address space, it could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it won't alias to pointers to the PRIVATE or LOCAL address space. * When a generic pointer is passed as a kernel argument, it also could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it also won't alias to pointers to the PRIVATE or LOCAL address space. Differential Revision: https://reviews.llvm.org/D89525	2020-10-20 09:54:12 -04:00
Carl Ritson	be2afbd019	[AMDGPU] Remove fix up operand from SI_ELSE Remove immediate operand from SI_ELSE which indicates if EXEC has been modified. Instead always emit code that handles EXEC and remove unnecessary instructions during pre-RA optimisation. This facilitates passes (i.e. SIWholeQuadMode) adding exec mask manipulation post control flow lowering, and pre control flow lower passes do not need to be aware of SI_ELSE handling. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D89644	2020-10-20 19:15:21 +09:00
sstefan1	fbfb1c7909	[IR] Make nosync, nofree and willreturn default for intrinsics. D70365 allows us to make attributes default. This is a follow up to actually make nosync, nofree and willreturn default. The approach we chose, for now, is to opt-in to default attributes to avoid introducing problems to target specific intrinsics. Intrinsics with default attributes can be created using `DefaultAttrsIntrinsic` class.	2020-10-20 11:57:19 +02:00
Piotr Sobczak	c872faf6e0	[AMDGPU] Do not generate S_CMP_LG_U64 on gfx7 S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64(). Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that do not support 64-bit scalar compare. Differential Revision: https://reviews.llvm.org/D89536	2020-10-19 14:44:31 +02:00
Hans Wennborg	0628bea513	Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting" This broke Chromium's PGO build, it seems because hot-cold-splitting got turned on unintentionally. See comment on the code review for repro etc. > This patch adds -f[no-]split-cold-code CC1 options to clang. This allows > the splitting pass to be toggled on/off. The current method of passing > `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose > correctly (say, with `-O0` or `-Oz`). > > To implement the -fsplit-cold-code option, an attribute is applied to > functions to indicate that they may be considered for splitting. This > removes some complexity from the old/new PM pipeline builders, and > behaves as expected when LTO is enabled. > > Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org> > Differential Revision: https://reviews.llvm.org/D57265 > Reviewed By: Aditya Kumar, Vedant Kumar > Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar This reverts commit `273c299d5d`.	2020-10-19 12:31:14 +02:00
Austin Kerbow	978fbd8268	[AMDGPU] Run hazard recognizer pass later If instructions were removed in peephole passes after the hazard recognizer was run it is possible that new hazards could be introduced. Fixes: SWDEV-253090 Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D89077	2020-10-16 12:15:51 -07:00
Jay Foad	1417abe54c	[AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic Differential Revision: https://reviews.llvm.org/D89558	2020-10-16 17:10:21 +01:00
Matt Arsenault	ce16b6835b	AMDGPU: Don't kill super-register with overlapping copy This would end up killing part of the result super-register, resulting in a verifier error on a later use of the overlapping registers. We could add kills of any non-aliasing registers, but we should be moving away from relying on kill flags.	2020-10-16 09:34:35 -04:00
Florian Hahn	51ff04567b	Recommit "[DSE] Switch to MemorySSA-backed DSE by default." After investigation by @asbirlea, the issue that caused the revert appears to be an issue in the original source, rather than a problem with the compiler. This patch enables MemorySSA DSE again. This reverts commit `915310bf14`.	2020-10-16 09:02:53 +01:00
Vedant Kumar	273c299d5d	[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting This patch adds -f[no-]split-cold-code CC1 options to clang. This allows the splitting pass to be toggled on/off. The current method of passing `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose correctly (say, with `-O0` or `-Oz`). To implement the -fsplit-cold-code option, an attribute is applied to functions to indicate that they may be considered for splitting. This removes some complexity from the old/new PM pipeline builders, and behaves as expected when LTO is enabled. Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org> Differential Revision: https://reviews.llvm.org/D57265 Reviewed By: Aditya Kumar, Vedant Kumar Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar	2020-10-15 23:13:33 +00:00
alex-t	42ed388120	[AMDGPU] SILowerControlFlow::removeMBBifRedundant should not try to change MBB layout if it can fallthrough removeMBBifRedundant normally tries to keep predecessors' fallthrough when removing a redundant MBB. It has to change the MBB layout so that the new successor immediately follows the predecessor of the removed MBB. This may only be allowed if the new successor itself has no successors to which it falls through. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89397	2020-10-15 23:20:54 +03:00
Stanislav Mekhanoshin	d1beb95d12	[AMDGPU] gfx1032 target Differential Revision: https://reviews.llvm.org/D89487	2020-10-15 12:41:18 -07:00
Matt Arsenault	663f16684d	AMDGPU: Fix verifier error on killed spill of partially undef register This does unfortunately end up with extra waitcnts getting inserted that were avoided before. Ideally we would avoid the spills of these undef components in the first place.	2020-10-15 09:45:44 -04:00
Carl Ritson	b70cb50204	[AMDGPU] Minimize number of s_mov generated by copyPhysReg Generate the minimal set of s_mov instructions required when expanding a SGPR copy operation in copyPhysReg. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D89187	2020-10-15 22:35:02 +09:00
Carl Ritson	75357ebc50	[AMDGPU] Pre-commit test for D89187	2020-10-15 15:29:07 +09:00
Konstantin Zhuravlyov	3fdf3b1539	AMDGPU: Update AMDHSA code object version handling Differential Revision: https://reviews.llvm.org/D89076	2020-10-14 13:04:27 -04:00
Jay Foad	b59d8d7c72	[AMDGPU][GlobalISel] Compute known bits for zero-extending loads Implement computeKnownBitsForTargetInstr for G_AMDGPU_BUFFER_LOAD_UBYTE and G_AMDGPU_BUFFER_LOAD_USHORT. This allows generic combines to remove some unnecessary G_ANDs. Differential Revision: https://reviews.llvm.org/D89316	2020-10-13 16:22:00 +01:00
Mirko Brkusanin	52ba4fa6aa	[GlobalISel] Avoid making G_PTR_ADD with nullptr When the first operand is a null pointer we can avoid making a G_PTR_ADD and make a G_INTTOPTR with the offset operand. This helps us avoid making add with 0 later on for targets such as AMDGPU. Differential Revision: https://reviews.llvm.org/D87140	2020-10-13 13:02:55 +02:00
Jay Foad	acd0dd3a62	[AMDGPU] Use lowercase for subtarget feature names in RUN lines	2020-10-13 09:02:09 +01:00
Ruiling Song	b215a26628	[AMDGPU] Update LiveVariables in convertToThreeAddress() This can fix an asan failure like below. ==15856==ERROR: AddressSanitizer: use-after-poison on address ... READ of size 8 at 0x6210001a3cb0 thread T0 #0 llvm::MachineInstr::getParent() #1 llvm::LiveVariables::VarInfo::findKill() #2 TwoAddressInstructionPass::rescheduleMIBelowKill() #3 TwoAddressInstructionPass::tryInstructionTransform() #4 TwoAddressInstructionPass::runOnMachineFunction() We need to update the Kills if we replace instructions. The Kills may be later accessed within TwoAddressInstruction pass. Differential Revision: https://reviews.llvm.org/D89092	2020-10-13 08:12:20 +08:00
Sebastian Neubauer	7f2a641aad	[AMDGPU] Insert waterfall loops for divergent calls Extend loadSRsrcFromVGPR to allow moving a range of instructions into the loop. The call instruction is surrounded by copies into physical registers which should be part of the waterfall loop. Differential Revision: https://reviews.llvm.org/D88291	2020-10-12 17:16:11 +02:00
Tim Renouf	666ef0db20	[AMDGPU] Add gfx602, gfx705, gfx805 targets At AMD, in an internal audit of our code, we found some corner cases where we were not quite differentiating targets enough for some old hardware. This commit is part of fixing that by adding three new targets: * The "Oland" and "Hainan" variants of gfx601 are now split out into gfx602. LLPC (in the GPUOpen driver) and other front-ends could use that to avoid using the shaderZExport workaround on gfx602. * One variant of gfx703 is now split out into gfx705. LLPC and other front-ends could use that to avoid using the shaderSpiCsRegAllocFragmentation workaround on gfx705. * The "TongaPro" variant of gfx802 is now split out into gfx805. TongaPro has a faster 64-bit shift than its former friends in gfx802, and a subtarget feature could be set up for that to take advantage of it. This commit does not make that change; it just adds the target. V2: Add clang changes. Put TargetParser list in order. V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order, so fix the GPUKind order. Differential Revision: https://reviews.llvm.org/D88916 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-10-10 17:22:22 +01:00
Jay Foad	1dfbc2ea14	[AMDGPU] Only enable mad/mac legacy f32 patterns if denormals may be flushed Following on from D88890, this makes the newly added patterns conditional on NoFP32Denormals. mad/mac f32 instructions always flush denormals regardless of the MODE register setting, and I believe the legacy variants do the same. Differential Revision: https://reviews.llvm.org/D89123	2020-10-09 17:08:38 +01:00
Austin Kerbow	a4f35ab232	[AMDGPU] Fix mai hazard VALU to LD/ST Fixes: SWDEV-251863 Differential Revision: https://reviews.llvm.org/D89079	2020-10-08 17:13:02 -07:00
Jay Foad	7238faa4ae	[AMDGPU] Add patterns for mad/mac legacy f32 instructions Note that all subtargets up to GFX10.1 have v_mad_legacy_f32, but GFX8/9 lack v_mac_legacy_f32. GFX10.3 has no mad/mac f32 instructions at all. Differential Revision: https://reviews.llvm.org/D88890	2020-10-08 15:20:06 +01:00
Mirko Brkusanin	7c88d13fd1	[AMDGPU] Prefer SplitVectorLoad/Store over expandUnalignedLoad/Store ExpandUnalignedLoad/Store can sometimes produce unnecessary copies to temporary stack slot. We should prefer splitting vectors if possible. Differential Revision: https://reviews.llvm.org/D88882	2020-10-08 10:17:15 +02:00
Mirko Brkusanin	380087e6c9	[AMDGPU] Add test with redundant copies to temporary stack slot produced by expandUnalignedLoad Differential Revision: https://reviews.llvm.org/D88895	2020-10-08 10:17:15 +02:00
Ronak Chauhan	528057c197	[AMDGPU] Support disassembly for AMDGPU kernel descriptors Decode AMDGPU Kernel descriptors as assembler directives. Reviewed By: scott.linder, jhenderson, kzhuravl Differential Revision: https://reviews.llvm.org/D80713	2020-10-07 20:39:43 +05:30
Rodrigo Dominguez	f71f5f39f6	[AMDGPU] Implement hardware bug workaround for image instructions Summary: This implements a workaround for a hardware bug in gfx8 and gfx9, where register usage is not estimated correctly for image_store and image_gather4 instructions when D16 is used. Change-Id: I4e30744da6796acac53a9b5ad37ac1c2035c8899 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81172	2020-10-07 07:39:52 -04:00
Carl Ritson	ea9d6392f4	Fix reordering of instructions during VirtRegRewriter unbundling When unbundling COPY bundles in VirtRegRewriter the start of the bundle is not correctly referenced in the unbundling loop. The effect of this is that unbundled instructions are sometimes inserted out-of-order, particularly in cases where multiple reorderings have been applied to avoid clobbering dependencies. The resulting instruction sequence clobbers dependencies. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D88821	2020-10-06 09:43:02 +09:00
Amara Emerson	c2bce848ec	[GlobalISel] Fix CSEMIRBuilder silently allowing use-before-def. If a CSEMIRBuilder query hits the instruction at the current insert point, move insert point ahead one so that subsequent uses of the builder don't end up with uses before defs. This fix also shows that AMDGPU was also affected by this bug often, but got away with it because it was using a G_IMPLICIT_DEF before the use. Differential Revision: https://reviews.llvm.org/D88605	2020-10-05 11:00:00 -07:00
Carl Ritson	707c3d4d42	[AMDGPU][RegAlloc][SplitKit] Pre-commit test for D88821	2020-10-05 20:35:42 +09:00
Jay Foad	16778b19f2	[AMDGPU] Make bfe patterns divergence-aware This tends to increase code size but more importantly it reduces vgpr usage, and could avoid costly readfirstlanes if the result needs to be in an sgpr. Differential Revision: https://reviews.llvm.org/D88580	2020-10-05 09:55:10 +01:00
Carl Ritson	5136f4748a	CodeGen: Fix livein calculation in MachineBasicBlock splitAt Fix and simplify computation of liveins for new block. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D88535	2020-10-02 10:45:04 +09:00
Matt Arsenault	89baeaef2f	Reapply "RegAllocFast: Rewrite and improve" This reverts commit `73a6a164b8`.	2020-09-30 10:35:25 -04:00
Jay Foad	cdac4492b4	[SplitKit] Cope with no live subranges in defFromParent Following on from D87757 "[SplitKit] Only copy live lanes", it is possible to split a live range at a point when none of its subranges are live. This patch handles that case by inserting an implicit def of the superreg. Patch by Quentin Colombet! Differential Revision: https://reviews.llvm.org/D88397	2020-09-30 10:16:25 +01:00
Mirko Brkusanin	0249df33fe	[AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer Check if operand of mul is constant value of one for certain atomic instructions in order to avoid making unnecessary instructions when -amdgpu-atomic-optimizer is present. Differential Revision: https://reviews.llvm.org/D88315	2020-09-30 11:09:18 +02:00
Mirko Brkusanin	8b08fa0103	Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access" This reverts commit `f5cd7ec9f3`. Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.	2020-09-29 15:33:34 +02:00
Ruiling Song	73805329ba	[RegisterCoalescer] Pass Undefs to extendToIndices() When extending the subranges, the reaching def may be an undef. When extending such a subrange, it will try to search for the reaching def first. If the reaching def is an undef and we did not provide 'Undefs', findReachingDefs() will fail with the message: "Use of $noreg does not have a corresponding definition on every path: LLVM ERROR: Use not jointly dominated by defs." So we call computeSubRangeUndefs() and pass the result to extendToIndices(). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87744	2020-09-29 08:14:24 +08:00
Jay Foad	bab1a17ad7	[AMDGPU] Add bfi immediate pattern Differential Revision: https://reviews.llvm.org/D88246	2020-09-28 10:16:51 +01:00
Jay Foad	2806f586dc	[AMDGPU] Make bfi patterns divergence-aware This tends to increase code size but more importantly it reduces vgpr usage, and could avoid costly readfirstlanes if the result needs to be in an sgpr. Differential Revision: https://reviews.llvm.org/D88245	2020-09-28 10:16:51 +01:00
Florian Hahn	915310bf14	Revert "[DSE] Switch to MemorySSA-backed DSE by default." There appears to be a mis-compile with MemorySSA-backed DSE in combination with llvm.lifetime.end. It currently appears like DSE is doing the right thing and the llvm.lifetime.end markers are incorrect. The reverted patch uncovers the mis-compile. This patch temporarily switches back to the legacy DSE implementation, while we investigate. This reverts commit `9d172c8e9c`.	2020-09-26 18:35:27 +01:00
Jay Foad	b34ddfcc76	[SplitKit] In addDeadDef tolerate parent range that defines more lanes Following on from D87757 "[SplitKit] Only copy live lanes", in SplitEditor::addDeadDef, when we're checking whether the parent live interval has a subrange defining the same lanes, tolerate the case where the parent subrange defines a superset of the lanes. This can happen when the child subrange comes from SplitEditor::buildCopy decomposing a partial copy into a sequence of subreg copies that cover the required lanes. Differential Revision: https://reviews.llvm.org/D88020	2020-09-25 11:31:56 +01:00
Stanislav Mekhanoshin	43804364e2	[AMDGPU] Fixes typo in the test. NFC. denormal-fp-math-fp32 -> denormal-fp-math-f32	2020-09-24 16:07:15 -07:00
Matt Arsenault	e75afc9acf	GlobalISel: Use unmerge when copying wide vectors to result registers Avoid using G_EXTRACT and move towards a more consistent vector legalization strategy.	2020-09-24 15:19:51 -04:00
Stanislav Mekhanoshin	27a62f6317	[AMDGPU] global-isel support for RT Differential Revision: https://reviews.llvm.org/D87847	2020-09-24 10:29:45 -07:00
vpykhtin	d9beff04a3	[RegisterCoalescer] Fix IMPLICIT_DEF init removal for a register on joining This patch removes redundant IMPLICIT_DEF for subregs which was leading to incorrect register initialization on joining in some cases. Reviewed by: qcolombet Differential revision: https://reviews.llvm.org/D82258	2020-09-24 17:37:03 +03:00
Sebastian Neubauer	6f7cd16d29	[AMDGPU] Fix v3f16 handling for getresinfo v3f32 should not be expanded to v4f32. getresinfo with a dmask of 7 created an image sample with a v3f32 return value, which was bitcasted to a v4f32 in constructRetValue. Differential Revision: https://reviews.llvm.org/D88206	2020-09-24 16:03:02 +02:00
Matt Arsenault	dc08185ca7	IR: Have byref imply dereferenceable The langref already states it does, but this wasn't implemented. Also covers inalloca and preallocated. Also helps fix a dependence on pointer element types.	2020-09-24 09:57:28 -04:00
Pushpinder Singh	41d6669f1f	[GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH Reviewed By: arsenm, foad Differential Revision: https://reviews.llvm.org/D85653	2020-09-23 22:25:29 -04:00
Carl Ritson	1e0500d4f7	[AMDGPU] Consider all SGPR uses as unique in constant bus verify Fix the verifier so that overlapping SGPR operands are counted independently. We cannot assume that overlapping SGPR accesses only count as a single constant bus use. The exception is implicit uses which do not add to constant bus usage (only) when overlapping. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87748	2020-09-24 10:52:40 +09:00
Stanislav Mekhanoshin	59691dc874	[AMDGPU] Make ds fp atomics overloadable Differential Revision: https://reviews.llvm.org/D87947	2020-09-23 11:39:50 -07:00
Sebastian Neubauer	a343b9b032	Revert "[AMDGPU] Insert waitcnt after returning from call" This reverts commit `ca907bfb57`. According to michel.daenzer, > This completely broke the Mesa radeonsi driver on Navi 14. Xorg + > xterm come up with major corruption & psychedelic colours.	2020-09-23 17:16:39 +02:00
Matt Arsenault	c463fd136e	GlobalISel: Fix truncating shift amount in trunc (shl) combine The shift amount type does not necessarily match the result type. This was inserting a trunc from s32 to s32, which asserted. Just preserve the original shift amount type which can be legalized later.	2020-09-23 09:07:50 -04:00
Matt Arsenault	af0207f2ba	AMDGPU: Check global FP atomics match default FP mode We would always select global FP atomics from atomicrmw fadd, although they have a hardcoded FP mode.	2020-09-23 09:07:50 -04:00
Sebastian Neubauer	ca907bfb57	[AMDGPU] Insert waitcnt after returning from call When memory operations are outstanding on function calls, either the caller or the callee can insert a waitcnt to ensure that all reads are finished. Calls need some time to be executed, so if the callee inserts the waitcnt, filling the instruction buffer and waiting for memory will be interleaved, hiding some latency. This comes at the cost of having a waitcnt inside functions that may not be needed as no memory operations are outstanding. For function calls, this is already implemented. The same principle applies to returns: If the caller inserts a waitcnt after the call, the callee does not have to wait and the return and memory operation can be run in parallel. This commit implements waiting in the caller after returning from a function call. Differential Revision: https://reviews.llvm.org/D87674	2020-09-23 12:17:59 +02:00
Piotr Sobczak	8d7fd73c3a	[AMDGPU] Fix merging m0 inits Fix incorrect merges of m0 inits in loops. It was assumed that if a clobbering instruction appears in the same block as an init and the clobbering instruction does not dominate the init then it does not interfere with init. This does not work in the presence of loops, where in this scenario, the clobbering instruction does interfere with the init in another iteration. To fix this, do not check for block equality and defer the decision to the predecessor check. Differential Revision: https://reviews.llvm.org/D87882	2020-09-23 09:13:43 +02:00
Michael Liao	534f6e1718	[PeepholeOptimizer] Enhance the redundant COPY elimination. - Eliminate redundant COPYs from the same register & subregister pair. Differential Revision: https://reviews.llvm.org/D87939	2020-09-22 10:11:37 -04:00
Jay Foad	892ef2e3c0	[AMDGPU] More codegen patterns for v2i16/v2f16 build_vector It's simpler to do this at codegen time than to do ad-hoc constant folding of machine instructions in SIFoldOperands. Differential Revision: https://reviews.llvm.org/D88028	2020-09-22 10:41:38 +01:00
Muhammad Omair Javaid	73a6a164b8	Revert "Reapply Revert "RegAllocFast: Rewrite and improve"" This reverts commit `55f9f87da2`. Breaks following buildbots: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4306 http://lab.llvm.org:8011/builders/lldb-aarch64-ubuntu/builds/9154	2020-09-22 14:40:06 +05:00
Matt Arsenault	55f9f87da2	Reapply Revert "RegAllocFast: Rewrite and improve" This reverts commit `dbd53a1f0c`. Needed lldb test updates	2020-09-21 15:45:27 -04:00
Eric Christopher	dbd53a1f0c	Temporarily Revert "RegAllocFast: Rewrite and improve" as it's breaking a few tests in the lldb test suite. Bot: http://lab.llvm.org:8011/builders/lldb-arm-ubuntu/builds/4226/steps/test/logs/stdio This reverts commit `c8757ff3aa`.	2020-09-18 18:11:21 -07:00
Matt Arsenault	c8757ff3aa	RegAllocFast: Rewrite and improve This rewrites big parts of the fast register allocator. The basic strategy of doing block-local allocation hasn't changed but I tweaked several details: Track register state on register units instead of physical registers. This simplifies and speeds up handling of register aliases. Process basic blocks in reverse order: Definitions are known to end register livetimes when walking backwards (contrary when walking forward then uses may or may not be a kill so we need heuristics). Check register mask operands (calls) instead of conservatively assuming everything is clobbered. Enhance heuristics to detect killing uses: In case of a small number of defs/uses check if they are all in the same basic block and if so the last one is a killing use. Enhance heuristic for copy-coalescing through hinting: We check the first k defs of a register for COPYs rather than relying on there just being a single definition. When testing this on the full llvm test-suite including SPEC externals I measured: average 5.1% reduction in code size for X86, 4.9% reduction in code on aarch64. (ranging between 0% and 20% depending on the test) 0.5% faster compiletime (some analysis suggests the pass is slightly slower than before, but we more than make up for it because later passes are faster with the reduced instruction count) Also adds a few testcases that were broken without this patch, in particular bug 47278. Patch mostly by Matthias Braun	2020-09-18 14:05:18 -04:00
Matt Arsenault	870fd53e4f	Reapply "RegAllocFast: Record internal state based on register units" The regressions this caused should be fixed when https://reviews.llvm.org/D52010 is applied. This reverts commit `a21387c654`.	2020-09-18 14:05:18 -04:00
Matt Arsenault	0576f436e5	AMDGPU: Don't sometimes allow instructions before lowered si_end_cf Since `6524a7a2b9`, this would sometimes not emit the or to exec at the beginning of the block, where it really has to be. If there is an instruction that defines one of the source operands, split the block and turn the si_end_cf into a terminator. This avoids regressions when regalloc fast is switched to inserting reloads at the beginning of the block, instead of spills at the end of the block. In a future change, this should always split the block.	2020-09-18 13:43:01 -04:00
Matt Arsenault	27df165270	Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel." This reverts commit `c3492a1aa1`. I think this is the wrong strategy and wrong place to do this transform anyway. Also reverts follow up commit `7d593d0d69`.	2020-09-18 09:48:33 -04:00
Mirko Brkusanin	ae36c02ad0	[AMDGPU] Set DS alignment requirements to be more strict Alignment requirements for ds_read/write_b96/b128 for gfx9 and onward are now the same as for other GCN subtargets. This way we can avoid any unintentional use of these instructions on systems that do not support dword alignment and instead require natural alignment. This also makes 'SH_MEM_CONFIG.alignment_mode == STRICT' the default. Differential Revision: https://reviews.llvm.org/D87821	2020-09-18 15:26:24 +02:00
Florian Hahn	9d172c8e9c	Recommit "[DSE] Switch to MemorySSA-backed DSE by default." This switches to using DSE + MemorySSA by default again, after fixing the issues reported after the first commit. Notable fixes `fc82006331`, `a0017c2bc2`. This reverts commit `3a59628f3c`.	2020-09-18 11:05:00 +01:00
Michael Liao	c3492a1aa1	[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel. - Need to lower COPY from SGPR to VGPR to a real instruction as the standard COPY is used where the source and destination are from the same register bank so that we potentially coalesce them together and save one COPY. Considering that, backend optimizations, such as CSE, won't handle them. However, the copy from SGPR to VGPR always needs materializing to a native instruction, it should be lowered into a real one before other backend optimizations. Differential Revision: https://reviews.llvm.org/D87556	2020-09-17 11:04:17 -04:00
alex-t	0efbb70b71	[AMDGPU] should expand ROTL i16 to shifts. Instruction combining pass turns library rotl implementation to llvm.fshl.i16. In the selection dag the intrinsic is turned to ISD::ROTL node that cannot be selected. Need to expand it to shifts again. Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D87618	2020-09-17 17:34:33 +03:00
Jay Foad	6f6d389da5	[SplitKit] Only copy live lanes When splitting a live interval with subranges, only insert copies for the lanes that are live at the point of the split. This avoids some unnecessary copies and fixes a problem where copying dead lanes was generating MIR that failed verification. The test case for this is test/CodeGen/AMDGPU/splitkit-copy-live-lanes.mir. Without this fix, some earlier live range splitting would create %430: %430 [256r,848r:0)[848r,2584r:1) 0@256r 1@848r L0000000000000003 [848r,2584r:0) 0@848r L0000000000000030 [256r,2584r:0) 0@256r weight:1.480938e-03 ... 256B undef %430.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec ... 848B %430.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec ... 2584B %431:vreg_128 = COPY %430:vreg_128 Then RAGreedy::tryLocalSplit would split %430 into %432 and %433 just before 848B giving: %432 [256r,844r:0) 0@256r L0000000000000030 [256r,844r:0) 0@256r weight:3.066802e-03 %433 [844r,848r:0)[848r,2584r:1) 0@844r 1@848r L0000000000000030 [844r,2584r:0) 0@844r L0000000000000003 [844r,844d:0)[848r,2584r:1) 0@844r 1@848r weight:2.831776e-03 ... 256B undef %432.sub2:vreg_128 = V_LSHRREV_B32_e32 16, %20.sub1:vreg_128, implicit $exec ... 844B undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 { internal %433.sub2:vreg_128 = COPY %432.sub2:vreg_128 848B } %433.sub0:vreg_128 = V_AND_B32_e32 %92:sreg_32, %20.sub1:vreg_128, implicit $exec ... 
2584B %431:vreg_128 = COPY %433:vreg_128 Note that the copy from %432 to %433 at 844B is a curious bundle-without-a-BUNDLE-instruction that SplitKit creates deliberately, and it includes a copy of .sub0 which is not live at this point, and that causes it to fail verification: * Bad machine code: No live subrange at use * - function: zextload_global_v64i16_to_v64i64 - basic block: %bb.0 (0x7faed48) [0B;2848B) - instruction: 844B undef %433.sub0:vreg_128 = COPY %432.sub0:vreg_128 - operand 1: %432.sub0:vreg_128 - interval: %432 [256r,844r:0) 0@256r L0000000000000030 [256r,844r:0) 0@256r weight:3.066802e-03 - at: 844B Using real bundles with a BUNDLE instruction might also fix this problem, but the current fix is less invasive and also avoids some unnecessary copies. https://bugs.llvm.org/show_bug.cgi?id=47492 Differential Revision: https://reviews.llvm.org/D87757	2020-09-17 09:26:11 +01:00
Jay Foad	d49707cf4b	[AMDGPU] Generate test checks for splitkit-copy-bundle.mir This is a pre-commit for D87757 "[SplitKit] Only copy live lanes".	2020-09-17 09:26:09 +01:00
Stanislav Mekhanoshin	91f503c3af	[AMDGPU] gfx1030 RT support Differential Revision: https://reviews.llvm.org/D87782	2020-09-16 11:40:58 -07:00
Matt Arsenault	88bdcbbf1a	GlobalISel: Lift store value widening restriction This doesn't change the memory size and doesn't need to worry about non-power-of-2 sizes.	2020-09-16 14:25:07 -04:00
Matt Arsenault	738c73a454	RegAllocFast: Make self loop live-out heuristic more aggressive This currently has no impact on code, but prevents sizeable code size regressions after D52010. This prevents spilling and reloading all values inside blocks that loop back. Add a baseline test which would regress without this patch.	2020-09-16 13:12:38 -04:00
Matt Arsenault	367248956e	AMDGPU: Clear offset register when using local stack area eliminateFrameIndex won't fix up the offset register when the direct frame index reference is moved to a separate move instruction. Switch the offset to a base 0 (which it probably should be to begin with).	2020-09-16 12:56:40 -04:00
Matt Arsenault	deae5e567d	AMDGPU: Add baseline test for incorrect SP access	2020-09-16 12:56:40 -04:00
Dmitry Preobrazhensky	06d058afec	[AMDGPU] Corrected directive to use for ELF weak refs WeakRefDirective should specify a directive to declare "a global as being a weak undefined symbol". The directive used by AMDGPU was incorrect - ".weakref" was intended for other purposes. The correct directive is ".weak" and it is already defined as default for ELF. So the redefinition was removed. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D87762	2020-09-16 18:51:26 +03:00
Matt Arsenault	71131db689	AMDGPU: Improve <2 x i24> arguments and return value handling This was asserting for GlobalISel. For SelectionDAG, this was passing this on the stack. Instead, scalarize this as if it were a 32-bit vector.	2020-09-16 11:21:56 -04:00
Sebastian Neubauer	833b3b0d3a	[AMDGPU] Add v3f16/v3i16 support to SDag Fix lowering and instruction selection for v3x16 types and enable InstCombine to emit them. This patch only implements it for the selection dag. GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work. Differential Revision: https://reviews.llvm.org/D84420	2020-09-16 17:20:27 +02:00
Jay Foad	90777e2924	[AMDGPU] Enable scheduling around FP MODE-setting instructions Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is marked as having unmodeled side effects, which makes the machine scheduler treat it as a barrier. Now that we have proper implicit $mode operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead for setregs that only touch the FP MODE bits, to give the scheduler more freedom. Differential Revision: https://reviews.llvm.org/D87446	2020-09-16 16:10:47 +01:00
Jay Foad	54bb9e8649	[AMDGPU] Add -show-mc-encoding to setreg tests This is a pre-commit for D87446 "[AMDGPU] Enable scheduling around FP MODE-setting instructions"	2020-09-16 16:09:47 +01:00
Alina Sbirlea	3b3ca5c989	Fix test after D86156.	2020-09-15 19:13:39 -07:00
Volkan Keles	a4e35cc2ec	GlobalISel: Add combines for G_TRUNC https://reviews.llvm.org/D87050	2020-09-15 15:50:34 -07:00
Stanislav Mekhanoshin	277de43d88	[AMDGPU] Unify intrinsic ret/nortn interface We have a single noret intrinsic and a lot of special handling around it. Declare it just as any other but do not define rtn instructions itself instead. Differential Revision: https://reviews.llvm.org/D87719	2020-09-15 15:26:42 -07:00
Florian Hahn	3a59628f3c	Revert "[DSE] Switch to MemorySSA-backed DSE by default." This reverts commit `fb109c42d9`. Temporarily revert due to a mis-compile pointed out at D87163.	2020-09-15 18:07:56 +01:00
Hans Wennborg	a21387c654	Revert "RegAllocFast: Record internal state based on register units" This seems to have caused incorrect register allocation in some cases, breaking tests in the Zig standard library (PR47278). As discussed on the bug, revert back to green for now. > Record internal state based on register units. This is often more > efficient as there are typically fewer register units to update > compared to iterating over all the aliases of a register. > > Original patch by Matthias Braun, but I've been rebasing and fixing it > for almost 2 years and fixed a few bugs causing intermediate failures > to make this patch independent of the changes in > https://reviews.llvm.org/D52010. This reverts commit `66251f7e1d`, and follow-ups `931a68f26b` and `0671a4c508`. It also adjusts some test expectations.	2020-09-15 13:25:41 +02:00
Petar Avramovic	9b4fa85434	GlobalISel/IRTranslator resetTargetOptions based on function attributes Update TargetMachine.Options with function attributes before we start to generate MIR instructions. This allows access to correct function attributes via TargetMachine.Options (it used to access attributes of the function that was translated first). This affects some existing tests with "no-nans-fp-math" attribute. Follow-up on D87456. Differential Revision: https://reviews.llvm.org/D87511	2020-09-15 10:26:09 +02:00
Quentin Colombet	b3afad0463	[GlobalISel] Add a `X, Y = G_UNMERGE(G_ZEXT Z)` -> X = G_ZEXT Z; Y = 0 combine Add a combiner helper to transform unmerge of zext into one zext and a constant 0 Differential Revision: https://reviews.llvm.org/D87427	2020-09-14 17:27:23 -07:00
Quentin Colombet	d2321129bd	[GlobalISel] Add `X,Y<dead> = G_UNMERGE Z` -> X = G_TRUNC Z Add a combiner helper that replaces G_UNMERGE where all the destination lanes are dead except the first one with a G_TRUNC. Differential Revision: https://reviews.llvm.org/D87174	2020-09-14 17:27:23 -07:00
Quentin Colombet	a36278c2f8	[GlobalISel] Add G_UNMERGE(Cst) -> Cst1, Cst2, ... combine Add a combiner helper that replaces G_UNMERGE of big constants into direct use of smaller constants. Differential Revision: https://reviews.llvm.org/D87166	2020-09-14 16:30:18 -07:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Austin Kerbow	f859c30ecb	[AMDGPU] Add XDL resource to scheduling model Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87621	2020-09-14 13:48:54 -07:00
Jay Foad	c799f873cb	[AMDGPU] Don't cluster stores Clustering loads has caching benefits, but as far as I know there is no advantage to clustering stores on any AMDGPU subtargets. The disadvantage is that it tends to increase register pressure and restricts scheduling freedom. Differential Revision: https://reviews.llvm.org/D85530	2020-09-14 13:40:17 +01:00
Georgii Rymar	e9c314611b	[llvm-readelf/obj] - Refine and generalize the code that is used to dump notes. There is some code that can be shared between GNU/LLVM styles. Also, this fixes 2 inconsistencies related to dumping unknown note types: 1) For GNU style we printed "Unknown note type: (0x00000003)" in some cases, and "Unknown note type (0x00000003)" (no colon) in other cases. GNU readelf always prints `:`. This patch removes the related code duplication and does the same. 2) For LLVM style in some cases we printed "Unknown note type (0x00000003)", but sometimes just "Unknown (0x00000003)". The latter is the right form, which is consistent with other unknowns that are printed in LLVM style. Rebased on top of D87453. Differential revision: https://reviews.llvm.org/D87454	2020-09-14 14:31:50 +03:00
Petar Avramovic	6e2a86ed5a	AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN Check for NoNaNsFPMath function attribute in isKnownNeverSNaN. Function attributes are held in 'TargetMachine.Options'. Among other things, this allows selection of some patterns imported in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN returns true in lowerFMinNumMaxNum. However we notice some incorrect results since function attributes are not correctly written in TargetMachine.Options when next function is processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0, it has "no-nans-fp-math"="false" but TargetMachine.Options still has it set to true since first function in test file had this attribute set to true. This will be fixed in D87511. Differential Revision: https://reviews.llvm.org/D87456	2020-09-14 12:11:00 +02:00
Petar Avramovic	416346d1ca	AMDGPU/GlobalISel/Emitter Recognize additional 'same operand checks' The "name" of a non-leaf complex pattern (MY_PAT $op1, $op2) is "MY_PAT:op1:op2" and the ones with same "name" represent same operand. Add 'same operand check' for this case. Differential Revision: https://reviews.llvm.org/D87351	2020-09-14 12:10:59 +02:00
Petar Avramovic	0c8f4cd657	AMDGPU/GlobalISel Add test for non-leaf complex patterns GlobalISel emitter does not import patterns where complex sub-operand of a non-leaf complex pattern is referenced more than once. Multiple references of complex patterns with same name and same sub-operands represent the same operand. Document this with a test.	2020-09-14 12:10:59 +02:00
Petar Avramovic	09b8871f8d	AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands Predicates with 'let PredicateCodeUsesOperands = 1' want to examine matched operands. When we encounter predicate code that uses operands, analyze its named operand arguments and create a map between argument index and name. Later, when leaf node with name is encountered, emit GIM_RecordNamedOperand that will store that operand at its argument index in operand list. This operand list will be an argument to c++ code of the predicate. Differential Revision: https://reviews.llvm.org/D87285	2020-09-14 10:39:56 +02:00
Matt Arsenault	e21bb31eb6	CodeGen: Require SSA to run PeepholeOptimizer	2020-09-11 18:03:04 -04:00
Jay Foad	06e356c81e	[AMDGPU] Make movreld-bug test case more robust Without this, future optimizer improvements can optimize the entire function to "return 0".	2020-09-11 10:25:29 +01:00
Michael Liao	f787fe15d8	[EarlyCSE] Remove unnecessary operand swap. - As min/max are commutative operators, there is no need to swap operands. That breaks the convention calculating the hash value.	2020-09-11 02:14:04 -04:00
Florian Hahn	fb109c42d9	[DSE] Switch to MemorySSA-backed DSE by default. The tests have been updated and I plan to move them from the MSSA directory up. Some end-to-end tests needed small adjustments. One difference to the legacy DSE is that legacy DSE also deletes trivially dead instructions that are unrelated to memory operations. Because MemorySSA-backed DSE just walks the MemorySSA, we only visit/check memory instructions. But removing unrelated dead instructions is not really DSE's job and other passes will clean up. One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll, but I think this comes down to legacy DSE not handling instructions that may throw correctly in that case. To cover this with MemorySSA-backed DSE, we need an update to llvm.coro.begin to treat its return value to belong to the same underlying object as the passed pointer. There are some minor cases MemorySSA-backed DSE currently misses, e.g. related to atomic operations, but I think those can be implemented after the switch. This has been discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores goes from ~17500 (legacy DSE) to ~26300 (MemorySSA-backed). More numbers and details in the thread on llvm-dev. Impact on CTMark: ``` Legacy Pass Manager exec instrs size-text O3 + 0.60% - 0.27% ReleaseThinLTO + 1.00% - 0.42% ReleaseLTO-g. + 0.77% - 0.33% RelThinLTO (link only) + 0.87% - 0.42% RelLO-g (link only) + 0.78% - 0.33% ``` http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions ``` New Pass Manager exec instrs. size-text O3 + 0.95% - 0.25% ReleaseThinLTO + 1.34% - 0.41% ReleaseLTO-g. 
+ 1.71% - 0.35% RelThinLTO (link only) + 0.96% - 0.41% RelLO-g (link only) + 2.21% - 0.35% ``` http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions Reviewed By: asbirlea, xbolva00, nikic Differential Revision: https://reviews.llvm.org/D87163	2020-09-10 22:24:32 +01:00
Matt Arsenault	85490874b2	AMDGPU: Skip all meta instructions in hazard recognizer This was not adding a necessary nop due to thinking the kill counted.	2020-09-09 19:45:40 -04:00
Matt Arsenault	82cbc9330a	AMDGPU: Fix inserting waitcnts before kill uses	2020-09-09 19:45:40 -04:00
dfukalov	c259d3a061	[AMDGPU] Fix for folding v2.16 literals. It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are incorrectly processed so one of two packed values were lost. Introduced new function to check immediate 32-bit operand can be folded. Converted condition about current op_sel flags value to fall-through. Fixes: SWDEV-247595 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87158	2020-09-10 01:39:25 +03:00
Amara Emerson	e5784ef8f6	[GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator. We weren't using this before, so none of the MachineFunction CFG edges had the branch probability information added. As a result, block placement later in the pipeline was flying blind. This is enabled only with optimizations enabled like SelectionDAG. Differential Revision: https://reviews.llvm.org/D86824	2020-09-09 14:31:12 -07:00
Amara Emerson	cc76da7ada	[GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less. This combine previously tried to take sequences like: %cond = G_ICMP pred, a, b G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and by inverting the compare predicate and swapping branch targets, delete the G_BR and instead have a single conditional branch to the falsebb. Since in an earlier patch we have a combine to fold not(icmp) into just an inverted icmp, we don't need this combine to do as much. This patch instead generalizes the combine by just looking for: G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and then inverting the condition using a not (xor). The xor can be folded away in a separate combine. This change also lets us avoid some optimization code in the IRTranslator. I also think that deleting G_BRs in the combiner is unnecessary. That's something that targets can decide to do at selection time and could simplify generic code in future. Differential Revision: https://reviews.llvm.org/D86664	2020-09-09 13:08:16 -07:00
Ronak Chauhan	f078577f31	Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors" This reverts commit `487a805310`. Tests fail on big endian machines.	2020-09-09 18:01:28 +05:30


4174 Commits