There are various hacks working around limitations in
handleAssignments, and the logical split between different parts isn't
correct. Start separating the type legalization to satisfy going
through the DAG infrastructure from the code required to split into
register types. The type splitting should be moved to generic code.
If the return values can't be lowered to registers
SelectionDAG performs the sret demotion. This patch
contains the basic implementation for the same in
the GlobalISel pipeline.
Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D92953
Summary:
This is to avoid unnecessary analysis since amdgpu.noclobber is only used for globals.
Reviewers:
arsenm
Fixes:
SWDEV-239161
Differential Revision:
https://reviews.llvm.org/D94107
It was removed in GFX10 GPUs, but LLVM could
generate it.
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D94020
Change-Id: Id1c716d71313edcfb768b2b175a6789ef9b01f3c
Convert it to v_fma_legacy_f32 if it is profitable to do so, just like
other mac instructions that are converted to their mad equivalents.
Differential Revision: https://reviews.llvm.org/D94010
An AMDGPUAA class already existed that was supposed to work with the new
PM, but it wasn't tested and was a bit broken.
Fix up the existing classes to have the right keys/parameters.
Wire up AMDGPUAA inside AMDGPUTargetMachine.
Add it to the list of alias analyses for the "default" AAManager since
in adjustPassManager() amdgpu-aa is added into the pipeline at the
beginning.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D93914
The legacy PM doesn't run EP_ModuleOptimizerEarly on -O0, so skip
running it here when given O0.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D93886
There is a number of transforms in SimplifyCFG that take DomTree out of
DomTreeUpdater, and do updates manually. Until they are fixed,
user passes are unable to claim that PDT is preserved.
Note that the default for SimplifyCFG is still not to preserve DomTree,
so this is still effectively NFC.
This is a (last big?) part of the patch series to make SimplifyCFG
preserve DomTree. Currently, it still does not actually preserve it,
even thought it is pretty much fully updated to preserve it.
Once the default is flipped, a valid DomTree must be passed into
simplifyCFG, which means that whatever pass calls simplifyCFG,
should also be smart about DomTree's.
As far as i can see from `check-llvm` with default flipped,
this is the last LLVM test batch (other than bugpoint tests)
that needed fixes to not break with default flipped.
The changes here are boringly identical to the ones i did
over 42+ times/commits recently already,
so while AMDGPU is outside of my normal ecosystem,
i'm going to go for post-commit review here,
like in all the other 42+ changes.
Note that while the pass is taught to preserve {,Post}DomTree,
it still doesn't do that by default, because simplifycfg
still doesn't do that by default, and flipping default
in this pass will implicitly flip the default for simplifycfg.
That will happen, but not right now.
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
And add it to the AMDGPU opt pipeline.
This is a function pass instead of a module pass (like the legacy pass)
because it's getting added to a CGSCCPassManager, and you can't put a
module pass in a CGSCCPassManager.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D93885
And add to AMDGPU opt pipeline.
Don't pin an opt run to the legacy PM when -enable-new-pm=1 if these
passes (or passes introduced in https://reviews.llvm.org/D93863) are in
the list of passes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D93875
And add them to the pipeline via
AMDGPUTargetMachine::registerPassBuilderCallbacks(), which mirrors
AMDGPUTargetMachine::adjustPassManager().
These passes can't be unconditionally added to PassRegistry.def since
they are only present when the AMDGPU backend is enabled. And there are
no target-specific headers in llvm/include, so parsing these pass names
must occur somewhere in the AMDGPU directory. I decided the best place
was inside the TargetMachine, since the PassBuilder invokes
TargetMachine::registerPassBuilderCallbacks() anyway. If we come up with
a cleaner solution for target-specific passes in the future that's fine,
but there aren't too many target-specific IR passes living in
target-specific directories so it shouldn't be too bad to change in the
future.
Reviewed By: ychen, arsenm
Differential Revision: https://reviews.llvm.org/D93863
Basic block containing "if" not necessarily dominates block that is the "false" target for the if.
That "false" target block may have another predecessor besides the "if" block. IR value corresponding to the Exec mask is generated by the
si_if intrinsic and then used by the end_cf intrinsic. In this case IR verifier complains that 'Def does not dominate all uses'.
This change split the edge between the "if" block and "false" target block to make it dominated by the "if" block.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D91435
Currently, the compiler crashes in instruction selection of global
load/stores in gfx600 due to the lack of FLAT instructions. This patch
fix the crash by selecting MUBUF instructions for global load/stores
in gfx600.
Authored-by: Praveen Velliengiri <Praveen.Velliengiri@amd.com>
Reviewed by: t-tye
Differential revision: https://reviews.llvm.org/D92483
If we happen to extract a non-dword subreg that breaks the
logic of the function and it may shrink the dmask because
it does not recognize the use of a lane(s).
This bug is next to impossible to trigger with the current
lowering in the BE, but it breaks in one of my future patches.
Differential Revision: https://reviews.llvm.org/D93782
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.
Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).
One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
It does not seem to fold offsets but this is not specific
to the flat scratch as getPtrBaseWithConstantOffset() does
not return the split for these tests unlike its SDag
counterpart.
Differential Revision: https://reviews.llvm.org/D93670
Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for
unaligned flat scratch support. Mostly needed for global isel.
Differential Revision: https://reviews.llvm.org/D93669
... so just ensure that we pass DomTreeUpdater it into it.
Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
Calling Instruction::copyFastMathFlags() assumes the caller is
FPMathOperator. Avoid calling the function for instructions
that are not instances of FPMathOperator.
I think the global_load/store_dword_addtid instructions support
switching off the scalar address.
Add assembler and disassembler support for this.
Differential Revision: https://reviews.llvm.org/D93288
- Clarify documentation on initializing scratch.
- Rename compute_pgm_rsrc2 field for enabling scratch from
ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET to
ENABLE_PRIVATE_SEGMENT to match hardware definition.
Differential Revision: https://reviews.llvm.org/D93271