It should be enabled only when the load alignment is at least 8-byte.
Fixes: SWDEV-256824
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D90404
* Factor out common elements of the input YAML document and use sed to
macro replace the run line specific elements.
* Add checks for the common elements which depend on the ELF class.
* Use non-numeric suffix for temporary files to avoid merge conflicts.
* Sort tests by GFX# ascending.
* Group ELF and YAML tests by GFX#.
Reviewed By: t-tye
Differential Revision: https://reviews.llvm.org/D90245
This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2".
All the codegen changes are caused by the post-RA scheduler no longer
treating readlane/writelane as scheduling barriers due to having
unmodelled side effects. (The pseudos are hasSideEffects = 0, but the
real instructions are hasSideEffects = ? which TableGen conservatively
treats as 1.)
Differential Revision: https://reviews.llvm.org/D90401
Reset the tracked emitted instructions when starting scheduling on a new
region.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D90347
V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src
modifier, but they can still use NEG and the usual output modifiers.
This partially reverts 3b99f12a4e "AMDGPU: Remove modifiers from v_div_scale_*".
Differential Revision: https://reviews.llvm.org/D90296
https://reviews.llvm.org/D88060
This adds the following combines
1) build_vector formation from insert_vec_elts
2) insert_vec_elts (build_vector) -> build_vector
SIPreAllocateWWMRegs was being inserted after RegisterCoalescer
but this pass does not exist during FastAlloc so pre-allocation
pass was never being run.
Insert pre-allocation after TwoAddressInstructionPass instead.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D90236
- Add an internal option `-amdgpu-use-aa-in-codegen` to enable or
disable this feature. By Default, it's enabled.
Differential Revision: https://reviews.llvm.org/D89320
Exec mask manipulation inserted by SIWholeQuadMode barriers to
instruction scheduling. Move the entire pass after the machine
instruction scheduler and make changes so pass is correct for
non-SSA operation. These changes should leave the pass still
usable pre-scheduler, although tests have be updated to reflect
post-scheduler results.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D88081
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.
At the very least missing:
- GlobalISel;
- Some optimizations in frame elimination in between vector
and scalar ALU;
- It shall finally allow to always materialize frame index
as an SGPR, but that is not implemented and frame elimination
cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
expression to recover the unswizzled scratch address;
Differential Revision: https://reviews.llvm.org/D89170
If no pal metadata is given, default to the msgpack format instead of
the legacy metadata. This makes tests better readable.
Differential Revision: https://reviews.llvm.org/D90035
We use an absolute address for stack objects and
it would be necessary to have a constant 0 for soffset field.
Fixes: SWDEV-228562
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D89234
This commit marks i16 MULH as expand in AMDGPU backend,
which is necessary after the refactoring in D80485.
Differential Revision: https://reviews.llvm.org/D89965
If the end instruction of the scheduling region was a DBG_VALUE, the
uses of the debug instruction were tracked as if they were real
uses. This would then hit the deadDefHasNoUse assertion in
addVRegDefDeps if the only use was the debug instruction.
1. Fixed liveness issue with implicit kills.
2. Fixed potential problem with an indirect mov.
Fixes: SWDEV-256848
Differential Revision: https://reviews.llvm.org/D89599
Fixes being overly conservative with the register counts in called
functions. This should try to do a conservative range merge, but for
now just clone.
Also fix not being able to functionally run the pass standalone.
Passes that are run after the post-RA scheduler may insert instructions like
waitcnt which eliminate the need for certain noops. After this patch the
scheduler is still aware of possible latency from hazards but noops will
not be inserted until the dedicated hazard recognizer pass is run.
Depends on D89753.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D89754
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.
Reviewed By: rampitec, dmgreen
Differential Revision: https://reviews.llvm.org/D89753
Change waitcnt insertion to check the memory operand tokens to see if
flat memory operations access VMEM in the same way it does to check if
accessing LDS. This avoids adding waitcnt for counters for address
spaces that are not accessed.
In addition, only generate the pessimistic waitcnt 0 if a flat memory
operation appears to access both VMEM and LDS.
This benefits flat memory operations that explicitly specify the
address space as GLOBAL or LOCAL.
Differential Revision: https://reviews.llvm.org/D89618
- In general, a generic point may alias to pointers in all other address
spaces. However, for certain cases enforced by the programming model,
we may found a generic point won't alias to pointers to local objects.
* When a generic pointer is loaded from the constant address space, it
could only be a pointer to the GLOBAL or CONSTANT address space.
Thus, it won't alias to pointers to the PRIVATE or LOCAL address
space.
* When a generic pointer is passed as a kernel argument, it also could
only be a pointer to the GLOBAL or CONSTANT address space. Thus, it
also won't alias to pointers to the PRIVATE or LOCAL address space.
Differential Revision: https://reviews.llvm.org/D89525
Remove immediate operand from SI_ELSE which indicates if EXEC has
been modified. Instead always emit code that handles EXEC and
remove unnecessary instructions during pre-RA optimisation.
This facilitates passes (i.e. SIWholeQuadMode) adding exec mask
manipulation post control flow lowering, and pre control flow
lower passes do not need to be aware of SI_ELSE handling.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D89644
D70365 allows us to make attributes default. This is a follow up to
actually make nosync, nofree and willreturn default. The approach we
chose, for now, is to opt-in to default attributes to avoid introducing
problems to target specific intrinsics. Intrinsics with default
attributes can be created using `DefaultAttrsIntrinsic` class.
S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64().
Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that
do not support 64-bit scalar compare.
Differential Revision: https://reviews.llvm.org/D89536
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.
> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
This reverts commit 273c299d5d.
If instructions were removed in peephole passes after the hazard recognizer was
run it is possible that new hazards could be introduced.
Fixes: SWDEV-253090
Reviewed By: rampitec, arsenm
Differential Revision: https://reviews.llvm.org/D89077
This would end up killing part of the result super-register, resulting
in a verifier error on a later use of the overlapping registers. We
could add kills of any non-aliasing registers, but we should be moving
away from relying on kill flags.
After investigation by @asbirlea, the issue that caused the
revert appears to be an issue in the original source, rather
than a problem with the compiler.
This patch enables MemorySSA DSE again.
This reverts commit 915310bf14.
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).
To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.
Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
removeMBBifRedundant normally tries to keep predecessors fallthrough when removing redundant MBB.
It has to change MBBs layout to keep the new successor to immediately follow the predecessor of removed MBB.
It only may be allowed in case the new successor itself has no successors to which it fall through.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D89397
This does unfortunately end up with extra waitcnts getting inserted
that were avoided before. Ideally we would avoid the spills of these
undef components in the first place.
Generate the minimal set of s_mov instructions required when
expanding a SGPR copy operation in copyPhysReg.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D89187
Implement computeKnownBitsForTargetInstr for G_AMDGPU_BUFFER_LOAD_UBYTE
and G_AMDGPU_BUFFER_LOAD_USHORT. This allows generic combines to remove
some unnecessary G_ANDs.
Differential Revision: https://reviews.llvm.org/D89316