VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.
Differential Revision: https://reviews.llvm.org/D29874
llvm-svn: 296859
Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins.
This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.
We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that its dealing with a nontemporal memory transaction.
There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.
Differential Review: http://reviews.llvm.org/D20965
llvm-svn: 272010
I'm not sure why this was missing for so long.
This also exposed that we were picking floating point 256-bit VMOVNTPS for some integer types in normal isel for AVX1 even though VMOVNTDQ is available. In practice it doesn't matter due to the execution dependency fix pass, but it required extra isel patterns. Fixing that in a follow up commit.
llvm-svn: 271481
autogenerated.
Also update existing test cases which appear to be generated by it and
weren't modified (other than addition of the header) by rerunning it.
llvm-svn: 253917
Add FastISel support for SSE4A scalar float / double non-temporal stores
Follow up to D13698
Differential Revision: http://reviews.llvm.org/D13773
llvm-svn: 250610
This patch teaches x86 fast-isel how to select nontemporal stores.
On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.
Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.
With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.
Differential Revision: http://reviews.llvm.org/D13698
llvm-svn: 250285