Summary:
D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't
make any sense to apply them to f16 operands; they would interpret the
bits of the value as an f32, giving nonsensical results. This patch
restricts them to f32 operands.
Reviewers: arsenm, hakzsam
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64636
llvm-svn: 365904
Summary:
D59191 added support for these modifiers in the assembler and
disassembler. This patch just teaches instruction selection that it can
use them.
Reviewers: arsenm, tstellar
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64497
llvm-svn: 365640
In SelectionDAG AMDGPU treated these as legal, but this was mostly
because the bitcasts required for FP types were painful. Theoretically
the bitpattern should eventually match to bfi, so don't bother trying
to get the patterns to import.
llvm-svn: 365583
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.
This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.
Differential Revision: https://reviews.llvm.org/D64295
llvm-svn: 365549
Function return instruction lowering, currently uses the fixed register pair s[30:31] for holding
the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class
exclusive of the CSRs, and used this regclass while lowering the return instruction.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D63924
llvm-svn: 365512
Make the FP register callee saved.
This is tricky because now the FP needs to be spilled in the prolog
relative to the incoming SP register, rather than the frame register
used throughout the rest of the function. I don't like how this
bypassess the standard mechanism for CSR spills just to get the
correct insert point. I may look for a better solution, since all CSR
VGPRs may also need to have all lanes activated. Another option might
be to make getFrameIndexReference change the base register if the
frame index is a CSR, and then try to figure out the right insertion
point in emitProlog.
If there is a free VGPR lane available for SGPR spilling, try to use
it for the FP. If that would require intrtoducing a new VGPR spill,
try to use a free call clobbered SGPR. Only fallback to introducing a
new VGPR spill as a last resort.
This also doesn't attempt to handle SGPR spilling with scalar stores.
llvm-svn: 365372
Summary of changes:
- simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset;
- provided information about error position for pre-gfx9 targets;
- improved errors handling.
Reviewers: artem.tamazov, arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D64244
llvm-svn: 365321
Add a string attribute instead of directly setting
MachineFunctionInfo. This avoids trying to get the analysis in the
MachineFunctionInfo in a way that doesn't work with the new pass
manager.
This will also avoid re-visiting the call graph for every single
function.
llvm-svn: 365241
Summary:
Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these
sizes has been marked "expand", which made LegalizeDAG lower it to loads
and stores via a stack slot. The code got optimized a bit later, but the
now-unused stack slot was never deleted.
This commit avoids that problem by custom lowering INSERT_SUBVECTOR into
an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the
subvector to insert.
V2: Addressed review comments re test.
Differential Revision: https://reviews.llvm.org/D63160
Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1
llvm-svn: 365148