This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit.
A new tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly).
Differential: https://reviews.llvm.org/D54714
llvm-svn: 347877
See https://reviews.llvm.org/D47106 for details.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D47171
This commit drops that patch's changes to:
llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
llvm/test/CodeGen/NVPTX/param-load-store.ll
For some reason, the dos line endings there prevent me from commiting
via the monorepo. A follow-up commit (not via the monorepo) will
finish the patch.
llvm-svn: 336843
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
Summary:
The SILoadStoreOptimizer can now look ahead more then one instruction when
looking for instructions to merge, which greatly improves the number of
loads/stores that we are able to merge.
Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate. It also gives more consistent results, since it is no longer
affected by minor scheduling changes.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23814
llvm-svn: 279991
Another step for unification llvm assembler/disassembler with sp3.
Besides, CodeGen output is a bit improved, thus changes in CodeGen tests.
Assembler/Disassembler tests updated/added.
Differential Revision: http://reviews.llvm.org/D20796
llvm-svn: 271900
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.
InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.
llvm-svn: 250129