This requires that we communicate to X86InstrInfo::optimizeCompareInstr
that the second operand is neither a register nor an immediate. The way we
do that is by setting CmpMask to zero.
Note that there were already instructions where the second operand was not a
register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero
for those instructions. This seems like a latent bug, but I was unable to
trigger it.
Differential Revision: https://reviews.llvm.org/D28621
llvm-svn: 294634
In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants.
Differential Revision: https://reviews.llvm.org/D29756
llvm-svn: 294588
LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register.
This patch attempts to break the register dependency by either always zeroing the vector before hand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD.
On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.)
Differential Revision: https://reviews.llvm.org/D29720
llvm-svn: 294581
We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line.
llvm-svn: 294562
This patch does the following.
1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1)
3. Adds the clzero feature under znver1 architecture.
4. The custom inserter is added in Lowering.
5. A testcase is added to check the intrinsic.
6. The clzero instruction is added to assembler test.
Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me.
Differential revision: https://reviews.llvm.org/D29385
llvm-svn: 294558
They are currently modelled incorrectly (as calls, which clobber
registers, confusing e.g. Machine Copy Propagation).
Reverting until we figure out the proper solution.
llvm-svn: 294348
Summary:
This change allows usage of store instruction for implicit null check.
Memory Aliasing Analisys is not used and change conservatively supposes
that any store and load may access the same memory. As a result
re-ordering of store-store, store-load and load-store is prohibited.
Patch by Serguei Katkov!
Reviewers: reames, sanjoy
Reviewed By: sanjoy
Subscribers: atrick, llvm-commits
Differential Revision: https://reviews.llvm.org/D29400
llvm-svn: 294338
vXi8/vXi64 vector shifts are often shifted as vYi16/vYi32 types but we weren't always remembering to bitcast the input.
Tested with a new assert as we don't currently manipulate these shifts enough for test cases to catch them.
llvm-svn: 294308
This includes unmasked forms of variable shift and shifting by the lower element of a register.
Still need to do shift by immediate which was not foldable prior to avx512 and all the masked forms.
llvm-svn: 294285
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.
We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.
This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.
Differential Revision: https://reviews.llvm.org/D29399
llvm-svn: 294183
Similar to what we already do for zero elt insertion, we can quickly rematerialize 'allbits' vectors so to avoid a unnecessary gpr value and insertion into a vector
llvm-svn: 294162
Summary:
Make this interface reusable similarly to std::call_once and std::once_flag interface.
This makes porting LLDB to NetBSD easier as there was in the original approach a portable way to specify a non-static once_flag. With this change translating std::once_flag to llvm::once_flag is mechanical.
Sponsored by <The NetBSD Foundation>
Reviewers: mehdi_amini, labath, joerg
Reviewed By: mehdi_amini
Subscribers: emaste, clayborg
Differential Revision: https://reviews.llvm.org/D29566
llvm-svn: 294143
Similar was already done for several other shuffles in this function.
The test changes are because the old code used explicity zeroing for elements that could have been undef.
While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us.
llvm-svn: 294130
On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI.
llvm-svn: 293944
Merging Load-add-store pattern into a increment op previously dropped
the load's chain from the instructions dependence if the store is
chained to a TokenFactor.
llvm-svn: 293892
This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain).
So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc.
Differential Revision: https://reviews.llvm.org/D28810
llvm-svn: 293880