The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands.
I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded.
llvm-svn: 288779
We only had partial memory folding support for the intrinsic definitions, and (as noted on PR27481) was causing FR32/FR64/VR128 mismatch errors with the machine verifier.
This patch adds missing memory folding support for both intrinsics and the ffloor/fnearbyint/fceil/frint/ftrunc patterns and in doing so fixes the failing machine verifier stack folding tests from PR27481.
Differential Revision: https://reviews.llvm.org/D23276
llvm-svn: 278106
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.
The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.
This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done.
It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.
Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll
Differential Revision: http://reviews.llvm.org/D13988
llvm-svn: 252074
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the llvm lowerings followed that conventions. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice.
Differential Revision: http://reviews.llvm.org/D12969
llvm-svn: 248266
Minor tweak now that D7042 is complete, we can enable stack folding for (V)MOVDDUP and do proper testing.
Added missing AVX ymm folding patterns and fixed alignment for AVX VMOVSLDUP / VMOVSHDUP.
llvm-svn: 226873
Changed the AVX1 tests register spill tail call to return a xmm like the SSE42 version - makes doing diffs between them a lot easier without affecting the spills themselves.
llvm-svn: 226623
The SSE42 version of the AVX1 float stack folding tests will be added shortly, this renames the AVX1 file so that the files will be near each other in a directory listing to help ensure they are kept in sync.
llvm-svn: 226620