[Intrinsic] Add fixed point division intrinsics.

Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:

  llvm.sdiv.fix.*
  llvm.udiv.fix.*

These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.

Patch by: ebevhan

Reviewers: bjope, leonardchan, efriedma, craig.topper

Reviewed By: craig.topper

Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70007
Bevin Hansson 2020-01-08 15:05:03 +01:00 committed by Mikael Holmen
parent b2c2fe7219
commit 8e2b44f7e0
17 changed files with 1524 additions and 38 deletions


@ -13675,16 +13675,17 @@ Fixed Point Arithmetic Intrinsics
A fixed point number represents a real data type for a number that has a fixed
number of digits after a radix point (equivalent to the decimal point '.').
The number of digits after the radix point is referred to as the ``scale``. These
The number of digits after the radix point is referred to as the `scale`. These
are useful for representing fractional values to a specific precision. The
following intrinsics perform fixed point arithmetic operations on 2 operands
of the same scale, specified as the third argument.
The `llvm.*mul.fix` family of intrinsic functions represents a multiplication
The ``llvm.*mul.fix`` family of intrinsic functions represents a multiplication
of fixed point numbers through scaled integers. Therefore, fixed point
multplication can be represented as
multiplication can be represented as
.. code-block:: llvm
::
%result = call i4 @llvm.smul.fix.i4(i4 %a, i4 %b, i32 %scale)
; Expands to
@ -13695,6 +13696,22 @@ multplication can be represented as
%r = ashr i8 %mul, %scale2 ; this is for a target rounding down towards negative infinity
%result = trunc i8 %r to i4
The ``llvm.*div.fix`` family of intrinsic functions represents a division of
fixed point numbers through scaled integers. Fixed point division can be
represented as:
.. code-block:: llvm
%result = call i4 @llvm.sdiv.fix.i4(i4 %a, i4 %b, i32 %scale)
; Expands to
%a2 = sext i4 %a to i8
%b2 = sext i4 %b to i8
%scale2 = trunc i32 %scale to i8
%a3 = shl i8 %a2, %scale2
%r = sdiv i8 %a3, %b2 ; this is for a target rounding towards zero
%result = trunc i8 %r to i4
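The expansion above can be modelled in ordinary C++ as a sanity check. This is only a sketch under stated assumptions: ``sdiv_fix_i4_model`` is a hypothetical helper (not an LLVM API), it stores i4 values in an ``int8_t`` that is assumed to lie in the range -8..7, and it rounds towards zero like the ``sdiv`` in the expansion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the i4 expansion above (not an LLVM API).
// `a` and `b` hold i4 values, i.e. they are assumed to lie in [-8, 7].
int8_t sdiv_fix_i4_model(int8_t a, int8_t b, unsigned scale) {
  assert(b != 0 && "a zero divisor is undefined behavior for the intrinsic");
  assert(scale < 4 && "a signed scale must be less than the bit width");
  // sext to i8, then shl by the scale. Multiplying by 2^scale avoids
  // left-shifting a negative value; the product always fits in i8 because
  // |a| <= 8 and 2^scale <= 8.
  int a3 = static_cast<int>(a) * (1 << scale);
  // sdiv: C++ integer division also truncates towards zero.
  int r = a3 / static_cast<int>(b);
  // trunc to i4: the quotient is assumed to fit in [-8, 7]; otherwise the
  // intrinsic's result is undefined.
  return static_cast<int8_t>(r);
}
```

With scale 1, ``sdiv_fix_i4_model(3, 4, 1)`` models 1.5 / 2.0 and yields 1 (that is, 0.5), because the exact quotient 0.75 is truncated towards zero.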
For each of these functions, if the result cannot be represented exactly with
the provided scale, the result is rounded. Rounding is unspecified since
preferred rounding may vary for different targets. Rounding is specified
@ -13963,6 +13980,126 @@ Examples
%res = call i4 @llvm.umul.fix.sat.i4(i4 2, i4 4, i32 1) ; %res = 4 (1 x 2 = 2)
'``llvm.sdiv.fix.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax
"""""""
This is an overloaded intrinsic. You can use ``llvm.sdiv.fix``
on any integer bit width or vectors of integers.
::
declare i16 @llvm.sdiv.fix.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.sdiv.fix.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.sdiv.fix.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
Overview
"""""""""
The '``llvm.sdiv.fix``' family of intrinsic functions performs signed
fixed point division on 2 arguments of the same scale.
Arguments
""""""""""
The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element width. ``%a`` and ``%b`` are the
two values that will undergo signed fixed point division. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
Semantics:
""""""""""
This operation performs fixed point division on the 2 arguments of a
specified scale. The result will also be returned in the same scale specified
in the third argument.
If the result value cannot be precisely represented in the given scale, the
value is rounded up or down to the closest representable value. The rounding
direction is unspecified.
It is undefined behavior if the result value does not fit within the range of
the fixed point type, or if the second argument is zero.
Examples
"""""""""
.. code-block:: llvm
%res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
%res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
%res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 / -1 = -1.5)
; The result in the following could be rounded up to 1 or down to 0.5
%res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
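Because the rounding direction is unspecified, an inexact quotient has two acceptable results, as in the last example. The sketch below computes both candidates for a signed division. ``FixDivCandidates`` and ``sdiv_fix_candidates`` are hypothetical names used only for illustration, and the 64-bit intermediate is an assumption that keeps the pre-scaled dividend from overflowing for 32-bit inputs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two results a target may legally produce.
struct FixDivCandidates {
  int64_t down;  // exact quotient rounded towards negative infinity
  int64_t up;    // exact quotient rounded towards positive infinity
};

// Hypothetical model (not an LLVM API): both operands and both candidates
// are scaled integers with `scale` fractional bits.
FixDivCandidates sdiv_fix_candidates(int32_t a, int32_t b, unsigned scale) {
  assert(b != 0 && "a zero divisor is undefined behavior for the intrinsic");
  assert(scale < 32 && "a signed scale must be less than the bit width");
  // Pre-scale the dividend so the quotient keeps `scale` fractional bits.
  int64_t num = static_cast<int64_t>(a) * (int64_t{1} << scale);
  int64_t q = num / b;  // truncates towards zero
  int64_t r = num % b;
  if (r == 0)
    return {q, q};  // exact: only one representable answer
  bool negative = (num < 0) != (b < 0);
  // Truncation already rounded a negative quotient up and a positive one down.
  return negative ? FixDivCandidates{q - 1, q} : FixDivCandidates{q, q + 1};
}
```

For the last example above, ``sdiv_fix_candidates(3, 4, 1)`` gives ``down == 1`` and ``up == 2``, matching the "2 (or 1)" comment.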
'``llvm.udiv.fix.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Syntax
"""""""
This is an overloaded intrinsic. You can use ``llvm.udiv.fix``
on any integer bit width or vectors of integers.
::
declare i16 @llvm.udiv.fix.i16(i16 %a, i16 %b, i32 %scale)
declare i32 @llvm.udiv.fix.i32(i32 %a, i32 %b, i32 %scale)
declare i64 @llvm.udiv.fix.i64(i64 %a, i64 %b, i32 %scale)
declare <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
Overview
"""""""""
The '``llvm.udiv.fix``' family of intrinsic functions performs unsigned
fixed point division on 2 arguments of the same scale.
Arguments
""""""""""
The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must have the same bit width. The arguments may also be
integer vectors of the same length and element width. ``%a`` and ``%b`` are the
two values that will undergo unsigned fixed point division. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
Semantics:
""""""""""
This operation performs fixed point division on the 2 arguments of a
specified scale. The result will also be returned in the same scale specified
in the third argument.
If the result value cannot be precisely represented in the given scale, the
value is rounded up or down to the closest representable value. The rounding
direction is unspecified.
It is undefined behavior if the result value does not fit within the range of
the fixed point type, or if the second argument is zero.
Examples
"""""""""
.. code-block:: llvm
%res = call i4 @llvm.udiv.fix.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
%res = call i4 @llvm.udiv.fix.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
%res = call i4 @llvm.udiv.fix.i4(i4 1, i4 -8, i32 4) ; %res = 2 (0.0625 / 0.5 = 0.125)
; The result in the following could be rounded up to 1 or down to 0.5
%res = call i4 @llvm.udiv.fix.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
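The third example relies on the operand ``i4 -8`` being read as the unsigned bit pattern ``1000`` (8), and on the scale being allowed to equal the bit width. A small model makes that explicit. ``udiv_fix_model`` is a hypothetical helper, not an LLVM API; rounding down is an assumption, and the model is limited to operands of at most 32 bits so the pre-scaled dividend fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of llvm.udiv.fix for an N-bit type (not an LLVM API).
// Operands are passed as signed literals, as they appear in textual IR, and
// are reinterpreted as unsigned N-bit values.
uint64_t udiv_fix_model(int64_t a, int64_t b, unsigned bits, unsigned scale) {
  assert(bits >= 1 && bits <= 32 && "model limited to 32-bit operands");
  assert(scale <= bits && "an unsigned scale may equal the bit width");
  uint64_t mask = (uint64_t{1} << bits) - 1;
  uint64_t ua = static_cast<uint64_t>(a) & mask;  // e.g. i4 -8 becomes 8
  uint64_t ub = static_cast<uint64_t>(b) & mask;
  assert(ub != 0 && "a zero divisor is undefined behavior for the intrinsic");
  // Pre-scale the dividend; bits + scale <= 64, so the shift cannot overflow.
  // Unsigned division truncates, which is the assumed rounding here.
  return (ua << scale) / ub;
}
```

For instance, ``udiv_fix_model(1, -8, 4, 4)`` models 0.0625 / 0.5 and returns 2, which is 0.125 at scale 4.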
Specialised Arithmetic Intrinsics
---------------------------------


@ -285,6 +285,12 @@ namespace ISD {
/// bits of the first 2 operands.
SMULFIXSAT, UMULFIXSAT,
/// RESULT = [US]DIVFIX(LHS, RHS, SCALE) - Perform fixed point division on
/// 2 integers with the same width and scale. SCALE represents the scale
/// of both operands as fixed point numbers. This SCALE parameter must be a
/// constant integer.
SDIVFIX, UDIVFIX,
/// Simple binary floating point operators.
FADD, FSUB, FMUL, FDIV, FREM,


@ -935,6 +935,8 @@ public:
case ISD::SMULFIXSAT:
case ISD::UMULFIX:
case ISD::UMULFIXSAT:
case ISD::SDIVFIX:
case ISD::UDIVFIX:
Supported = isSupportedFixedPointOperation(Op, VT, Scale);
break;
}
@ -4184,6 +4186,14 @@ public:
/// method accepts integers as its arguments.
SDValue expandFixedPointMul(SDNode *Node, SelectionDAG &DAG) const;
/// Method for building the DAG expansion of ISD::[US]DIVFIX. This
/// method accepts integers as its arguments.
/// Note: This method may fail if the division could not be performed
/// within the type. Clients must retry with a wider type if this happens.
SDValue expandFixedPointDiv(unsigned Opcode, const SDLoc &dl,
SDValue LHS, SDValue RHS,
unsigned Scale, SelectionDAG &DAG) const;
/// Method for building the DAG expansion of ISD::U(ADD|SUB)O. Expansion
/// always succeeds and populates the Result and Overflow arguments.
void expandUADDSUBO(SDNode *Node, SDValue &Result, SDValue &Overflow,


@ -930,6 +930,14 @@ def int_umul_fix : Intrinsic<[llvm_anyint_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
[IntrNoMem, IntrSpeculatable, IntrWillReturn, Commutative, ImmArg<2>]>;
def int_sdiv_fix : Intrinsic<[llvm_anyint_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
[IntrNoMem, ImmArg<2>]>;
def int_udiv_fix : Intrinsic<[llvm_anyint_ty],
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
[IntrNoMem, ImmArg<2>]>;
//===------------------- Fixed Point Saturation Arithmetic Intrinsics ----------------===//
//
def int_smul_fix_sat : Intrinsic<[llvm_anyint_ty],


@ -124,7 +124,7 @@ def SDTIntSatNoShOp : SDTypeProfile<1, 2, [ // ssat with no shift
def SDTIntBinHiLoOp : SDTypeProfile<2, 2, [ // mulhi, mullo, sdivrem, udivrem
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>,SDTCisInt<0>
]>;
def SDTIntScaledBinOp : SDTypeProfile<1, 3, [ // smulfix, umulfix
def SDTIntScaledBinOp : SDTypeProfile<1, 3, [ // smulfix, sdivfix, etc
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisInt<0>, SDTCisInt<3>
]>;
@ -400,6 +400,8 @@ def smulfix : SDNode<"ISD::SMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>
def smulfixsat : SDNode<"ISD::SMULFIXSAT", SDTIntScaledBinOp, [SDNPCommutative]>;
def umulfix : SDNode<"ISD::UMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>;
def umulfixsat : SDNode<"ISD::UMULFIXSAT", SDTIntScaledBinOp, [SDNPCommutative]>;
def sdivfix : SDNode<"ISD::SDIVFIX" , SDTIntScaledBinOp>;
def udivfix : SDNode<"ISD::UDIVFIX" , SDTIntScaledBinOp>;
def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;
def sext_invec : SDNode<"ISD::SIGN_EXTEND_VECTOR_INREG", SDTExtInvec>;


@ -1129,7 +1129,9 @@ void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
case ISD::SMULFIX:
case ISD::SMULFIXSAT:
case ISD::UMULFIX:
case ISD::UMULFIXSAT: {
case ISD::UMULFIXSAT:
case ISD::SDIVFIX:
case ISD::UDIVFIX: {
unsigned Scale = Node->getConstantOperandVal(2);
Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
Node->getValueType(0), Scale);
@ -3417,6 +3419,24 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
case ISD::UMULFIXSAT:
Results.push_back(TLI.expandFixedPointMul(Node, DAG));
break;
case ISD::SDIVFIX:
case ISD::UDIVFIX:
if (SDValue V = TLI.expandFixedPointDiv(Node->getOpcode(), SDLoc(Node),
Node->getOperand(0),
Node->getOperand(1),
Node->getConstantOperandVal(2),
DAG)) {
Results.push_back(V);
break;
}
// FIXME: We might want to retry here with a wider type if we fail, if that
// type is legal.
// FIXME: Technically, so long as we only have sdivfixes where BW+Scale is
// <= 128 (which is the case for all of the default Embedded-C types),
// we will only get here with types and scales that we could always expand
// if we were allowed to generate libcalls to division functions of illegal
// type. But we cannot do that.
llvm_unreachable("Cannot expand DIVFIX!");
case ISD::ADDCARRY:
case ISD::SUBCARRY: {
SDValue LHS = Node->getOperand(0);


@ -160,6 +160,9 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
case ISD::UMULFIX:
case ISD::UMULFIXSAT: Res = PromoteIntRes_MULFIX(N); break;
case ISD::SDIVFIX:
case ISD::UDIVFIX: Res = PromoteIntRes_DIVFIX(N); break;
case ISD::ABS: Res = PromoteIntRes_ABS(N); break;
case ISD::ATOMIC_LOAD:
@ -778,6 +781,71 @@ SDValue DAGTypeLegalizer::PromoteIntRes_MULFIX(SDNode *N) {
N->getOperand(2));
}
static SDValue earlyExpandDIVFIX(SDNode *N, SDValue LHS, SDValue RHS,
unsigned Scale, const TargetLowering &TLI,
SelectionDAG &DAG) {
EVT VT = LHS.getValueType();
bool Signed = N->getOpcode() == ISD::SDIVFIX;
SDLoc dl(N);
// See if we can perform the division in this type without widening.
if (SDValue V = TLI.expandFixedPointDiv(N->getOpcode(), dl, LHS, RHS, Scale,
DAG))
return V;
// If that didn't work, double the type width and try again. That must work,
// or something is wrong.
EVT WideVT = EVT::getIntegerVT(*DAG.getContext(),
VT.getScalarSizeInBits() * 2);
if (Signed) {
LHS = DAG.getSExtOrTrunc(LHS, dl, WideVT);
RHS = DAG.getSExtOrTrunc(RHS, dl, WideVT);
} else {
LHS = DAG.getZExtOrTrunc(LHS, dl, WideVT);
RHS = DAG.getZExtOrTrunc(RHS, dl, WideVT);
}
// TODO: Saturation.
SDValue Res = TLI.expandFixedPointDiv(N->getOpcode(), dl, LHS, RHS, Scale,
DAG);
assert(Res && "Expanding DIVFIX with wide type failed?");
return DAG.getZExtOrTrunc(Res, dl, VT);
}
SDValue DAGTypeLegalizer::PromoteIntRes_DIVFIX(SDNode *N) {
SDLoc dl(N);
SDValue Op1Promoted, Op2Promoted;
bool Signed = N->getOpcode() == ISD::SDIVFIX;
if (Signed) {
Op1Promoted = SExtPromotedInteger(N->getOperand(0));
Op2Promoted = SExtPromotedInteger(N->getOperand(1));
} else {
Op1Promoted = ZExtPromotedInteger(N->getOperand(0));
Op2Promoted = ZExtPromotedInteger(N->getOperand(1));
}
EVT PromotedType = Op1Promoted.getValueType();
unsigned Scale = N->getConstantOperandVal(2);
SDValue Res;
// If the type is already legal and the operation is legal in that type, we
// should not early expand.
if (TLI.isTypeLegal(PromotedType)) {
TargetLowering::LegalizeAction Action =
TLI.getFixedPointOperationAction(N->getOpcode(), PromotedType, Scale);
if (Action == TargetLowering::Legal || Action == TargetLowering::Custom)
Res = DAG.getNode(N->getOpcode(), dl, PromotedType, Op1Promoted,
Op2Promoted, N->getOperand(2));
}
if (!Res)
Res = earlyExpandDIVFIX(N, Op1Promoted, Op2Promoted, Scale, TLI, DAG);
// TODO: Saturation.
return Res;
}
SDValue DAGTypeLegalizer::PromoteIntRes_SADDSUBO(SDNode *N, unsigned ResNo) {
if (ResNo == 1)
return PromoteIntRes_Overflow(N);
@ -1237,7 +1305,9 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::SMULFIX:
case ISD::SMULFIXSAT:
case ISD::UMULFIX:
case ISD::UMULFIXSAT: Res = PromoteIntOp_MULFIX(N); break;
case ISD::UMULFIXSAT:
case ISD::SDIVFIX:
case ISD::UDIVFIX: Res = PromoteIntOp_FIX(N); break;
case ISD::FPOWI: Res = PromoteIntOp_FPOWI(N); break;
@ -1623,7 +1693,7 @@ SDValue DAGTypeLegalizer::PromoteIntOp_ADDSUBCARRY(SDNode *N, unsigned OpNo) {
return SDValue(DAG.UpdateNodeOperands(N, LHS, RHS, Carry), 0);
}
SDValue DAGTypeLegalizer::PromoteIntOp_MULFIX(SDNode *N) {
SDValue DAGTypeLegalizer::PromoteIntOp_FIX(SDNode *N) {
SDValue Op2 = ZExtPromotedInteger(N->getOperand(2));
return SDValue(
DAG.UpdateNodeOperands(N, N->getOperand(0), N->getOperand(1), Op2), 0);
@ -1837,6 +1907,9 @@ void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
case ISD::UMULFIX:
case ISD::UMULFIXSAT: ExpandIntRes_MULFIX(N, Lo, Hi); break;
case ISD::SDIVFIX:
case ISD::UDIVFIX: ExpandIntRes_DIVFIX(N, Lo, Hi); break;
case ISD::VECREDUCE_ADD:
case ISD::VECREDUCE_MUL:
case ISD::VECREDUCE_AND:
@ -3151,6 +3224,13 @@ void DAGTypeLegalizer::ExpandIntRes_MULFIX(SDNode *N, SDValue &Lo,
Lo = DAG.getSelect(dl, NVT, SatMin, NVTZero, Lo);
}
void DAGTypeLegalizer::ExpandIntRes_DIVFIX(SDNode *N, SDValue &Lo,
SDValue &Hi) {
SDValue Res = earlyExpandDIVFIX(N, N->getOperand(0), N->getOperand(1),
N->getConstantOperandVal(2), TLI, DAG);
SplitInteger(Res, Lo, Hi);
}
void DAGTypeLegalizer::ExpandIntRes_SADDSUBO(SDNode *Node,
SDValue &Lo, SDValue &Hi) {
SDValue LHS = Node->getOperand(0);


@ -329,6 +329,7 @@ private:
SDValue PromoteIntRes_XMULO(SDNode *N, unsigned ResNo);
SDValue PromoteIntRes_ADDSUBSAT(SDNode *N);
SDValue PromoteIntRes_MULFIX(SDNode *N);
SDValue PromoteIntRes_DIVFIX(SDNode *N);
SDValue PromoteIntRes_FLT_ROUNDS(SDNode *N);
SDValue PromoteIntRes_VECREDUCE(SDNode *N);
SDValue PromoteIntRes_ABS(SDNode *N);
@ -367,7 +368,7 @@ private:
SDValue PromoteIntOp_ADDSUBCARRY(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_FRAMERETURNADDR(SDNode *N);
SDValue PromoteIntOp_PREFETCH(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_MULFIX(SDNode *N);
SDValue PromoteIntOp_FIX(SDNode *N);
SDValue PromoteIntOp_FPOWI(SDNode *N);
SDValue PromoteIntOp_VECREDUCE(SDNode *N);
@ -428,6 +429,7 @@ private:
void ExpandIntRes_XMULO (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_ADDSUBSAT (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_MULFIX (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_DIVFIX (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_ATOMIC_LOAD (SDNode *N, SDValue &Lo, SDValue &Hi);
void ExpandIntRes_VECREDUCE (SDNode *N, SDValue &Lo, SDValue &Hi);
@ -689,7 +691,7 @@ private:
SDValue ScalarizeVecRes_UNDEF(SDNode *N);
SDValue ScalarizeVecRes_VECTOR_SHUFFLE(SDNode *N);
SDValue ScalarizeVecRes_MULFIX(SDNode *N);
SDValue ScalarizeVecRes_FIX(SDNode *N);
// Vector Operand Scalarization: <1 x ty> -> ty.
bool ScalarizeVectorOperand(SDNode *N, unsigned OpNo);
@ -731,7 +733,7 @@ private:
void SplitVecRes_OverflowOp(SDNode *N, unsigned ResNo,
SDValue &Lo, SDValue &Hi);
void SplitVecRes_MULFIX(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_FIX(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_BITCAST(SDNode *N, SDValue &Lo, SDValue &Hi);
void SplitVecRes_BUILD_VECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);


@ -146,6 +146,7 @@ class VectorLegalizer {
SDValue ExpandMULO(SDValue Op);
SDValue ExpandAddSubSat(SDValue Op);
SDValue ExpandFixedPointMul(SDValue Op);
SDValue ExpandFixedPointDiv(SDValue Op);
SDValue ExpandStrictFPOp(SDValue Op);
SDValue UnrollStrictFPOp(SDValue Op);
@ -442,7 +443,9 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::SMULFIX:
case ISD::SMULFIXSAT:
case ISD::UMULFIX:
case ISD::UMULFIXSAT: {
case ISD::UMULFIXSAT:
case ISD::SDIVFIX:
case ISD::UDIVFIX: {
unsigned Scale = Node->getConstantOperandVal(2);
Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
Node->getValueType(0), Scale);
@ -849,6 +852,9 @@ SDValue VectorLegalizer::Expand(SDValue Op) {
// targets? This should probably be investigated. And if we still prefer to
// unroll an explanation could be helpful.
return DAG.UnrollVectorOp(Op.getNode());
case ISD::SDIVFIX:
case ISD::UDIVFIX:
return ExpandFixedPointDiv(Op);
#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN) \
case ISD::STRICT_##DAGN:
#include "llvm/IR/ConstrainedOps.def"
@ -1392,6 +1398,14 @@ SDValue VectorLegalizer::ExpandFixedPointMul(SDValue Op) {
return DAG.UnrollVectorOp(Op.getNode());
}
SDValue VectorLegalizer::ExpandFixedPointDiv(SDValue Op) {
SDNode *N = Op.getNode();
if (SDValue Expanded = TLI.expandFixedPointDiv(N->getOpcode(), SDLoc(N),
N->getOperand(0), N->getOperand(1), N->getConstantOperandVal(2), DAG))
return Expanded;
return DAG.UnrollVectorOp(N);
}
SDValue VectorLegalizer::ExpandStrictFPOp(SDValue Op) {
if (Op.getOpcode() == ISD::STRICT_UINT_TO_FP)
return ExpandUINT_TO_FLOAT(Op);


@ -165,7 +165,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
case ISD::SMULFIXSAT:
case ISD::UMULFIX:
case ISD::UMULFIXSAT:
R = ScalarizeVecRes_MULFIX(N);
case ISD::SDIVFIX:
case ISD::UDIVFIX:
R = ScalarizeVecRes_FIX(N);
break;
}
@ -189,7 +191,7 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_TernaryOp(SDNode *N) {
Op0.getValueType(), Op0, Op1, Op2);
}
SDValue DAGTypeLegalizer::ScalarizeVecRes_MULFIX(SDNode *N) {
SDValue DAGTypeLegalizer::ScalarizeVecRes_FIX(SDNode *N) {
SDValue Op0 = GetScalarizedVector(N->getOperand(0));
SDValue Op1 = GetScalarizedVector(N->getOperand(1));
SDValue Op2 = N->getOperand(2);
@ -958,7 +960,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
case ISD::SMULFIXSAT:
case ISD::UMULFIX:
case ISD::UMULFIXSAT:
SplitVecRes_MULFIX(N, Lo, Hi);
case ISD::SDIVFIX:
case ISD::UDIVFIX:
SplitVecRes_FIX(N, Lo, Hi);
break;
}
@ -997,7 +1001,7 @@ void DAGTypeLegalizer::SplitVecRes_TernaryOp(SDNode *N, SDValue &Lo,
Op0Hi, Op1Hi, Op2Hi);
}
void DAGTypeLegalizer::SplitVecRes_MULFIX(SDNode *N, SDValue &Lo, SDValue &Hi) {
void DAGTypeLegalizer::SplitVecRes_FIX(SDNode *N, SDValue &Lo, SDValue &Hi) {
SDValue LHSLo, LHSHi;
GetSplitVector(N->getOperand(0), LHSLo, LHSHi);
SDValue RHSLo, RHSHi;


@ -5441,6 +5441,60 @@ static SDValue ExpandPowI(const SDLoc &DL, SDValue LHS, SDValue RHS,
return DAG.getNode(ISD::FPOWI, DL, LHS.getValueType(), LHS, RHS);
}
static SDValue expandDivFix(unsigned Opcode, const SDLoc &DL,
SDValue LHS, SDValue RHS, SDValue Scale,
SelectionDAG &DAG, const TargetLowering &TLI) {
EVT VT = LHS.getValueType();
bool Signed = Opcode == ISD::SDIVFIX;
LLVMContext &Ctx = *DAG.getContext();
// If the type is legal but the operation isn't, this node might survive all
// the way to operation legalization. If we end up there and we do not have
// the ability to widen the type (if VT*2 is not legal), we cannot expand the
// node.
// Coax the legalizer into expanding the node during type legalization instead
// by bumping the size by one bit. This will force it to Promote, enabling the
// early expansion and avoiding the need to expand later.
// We don't have to do this if Scale is 0; that can always be expanded.
// FIXME: We wouldn't have to do this (or any of the early
// expansion/promotion) if it was possible to expand a libcall of an
// illegal type during operation legalization. But it's not, so things
// get a bit hacky.
unsigned ScaleInt = cast<ConstantSDNode>(Scale)->getZExtValue();
if (ScaleInt > 0 &&
(TLI.isTypeLegal(VT) ||
(VT.isVector() && TLI.isTypeLegal(VT.getVectorElementType())))) {
TargetLowering::LegalizeAction Action = TLI.getFixedPointOperationAction(
Opcode, VT, ScaleInt);
if (Action != TargetLowering::Legal && Action != TargetLowering::Custom) {
EVT PromVT;
if (VT.isScalarInteger())
PromVT = EVT::getIntegerVT(Ctx, VT.getSizeInBits() + 1);
else if (VT.isVector()) {
PromVT = VT.getVectorElementType();
PromVT = EVT::getIntegerVT(Ctx, PromVT.getSizeInBits() + 1);
PromVT = EVT::getVectorVT(Ctx, PromVT, VT.getVectorElementCount());
} else
llvm_unreachable("Wrong VT for DIVFIX?");
if (Signed) {
LHS = DAG.getSExtOrTrunc(LHS, DL, PromVT);
RHS = DAG.getSExtOrTrunc(RHS, DL, PromVT);
} else {
LHS = DAG.getZExtOrTrunc(LHS, DL, PromVT);
RHS = DAG.getZExtOrTrunc(RHS, DL, PromVT);
}
// TODO: Saturation.
SDValue Res = DAG.getNode(Opcode, DL, PromVT, LHS, RHS, Scale);
return DAG.getZExtOrTrunc(Res, DL, VT);
}
}
return DAG.getNode(Opcode, DL, VT, LHS, RHS, Scale);
}
// getUnderlyingArgRegs - Find underlying registers used for a truncated,
// bitcasted, or split argument. Returns a list of <Register, size in bits>
static void
@ -5705,6 +5759,14 @@ static unsigned FixedPointIntrinsicToOpcode(unsigned Intrinsic) {
return ISD::SMULFIX;
case Intrinsic::umul_fix:
return ISD::UMULFIX;
case Intrinsic::smul_fix_sat:
return ISD::SMULFIXSAT;
case Intrinsic::umul_fix_sat:
return ISD::UMULFIXSAT;
case Intrinsic::sdiv_fix:
return ISD::SDIVFIX;
case Intrinsic::udiv_fix:
return ISD::UDIVFIX;
default:
llvm_unreachable("Unhandled fixed point intrinsic");
}
@ -6360,7 +6422,9 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
return;
}
case Intrinsic::smul_fix:
case Intrinsic::umul_fix: {
case Intrinsic::umul_fix:
case Intrinsic::smul_fix_sat:
case Intrinsic::umul_fix_sat: {
SDValue Op1 = getValue(I.getArgOperand(0));
SDValue Op2 = getValue(I.getArgOperand(1));
SDValue Op3 = getValue(I.getArgOperand(2));
@ -6368,20 +6432,13 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
Op1.getValueType(), Op1, Op2, Op3));
return;
}
case Intrinsic::smul_fix_sat: {
case Intrinsic::sdiv_fix:
case Intrinsic::udiv_fix: {
SDValue Op1 = getValue(I.getArgOperand(0));
SDValue Op2 = getValue(I.getArgOperand(1));
SDValue Op3 = getValue(I.getArgOperand(2));
setValue(&I, DAG.getNode(ISD::SMULFIXSAT, sdl, Op1.getValueType(), Op1, Op2,
Op3));
return;
}
case Intrinsic::umul_fix_sat: {
SDValue Op1 = getValue(I.getArgOperand(0));
SDValue Op2 = getValue(I.getArgOperand(1));
SDValue Op3 = getValue(I.getArgOperand(2));
setValue(&I, DAG.getNode(ISD::UMULFIXSAT, sdl, Op1.getValueType(), Op1, Op2,
Op3));
setValue(&I, expandDivFix(FixedPointIntrinsicToOpcode(Intrinsic), sdl,
Op1, Op2, Op3, DAG, TLI));
return;
}
case Intrinsic::stacksave: {


@ -312,6 +312,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
case ISD::UMULFIX: return "umulfix";
case ISD::UMULFIXSAT: return "umulfixsat";
case ISD::SDIVFIX: return "sdivfix";
case ISD::UDIVFIX: return "udivfix";
// Conversion operators.
case ISD::SIGN_EXTEND: return "sign_extend";
case ISD::ZERO_EXTEND: return "zero_extend";


@ -7293,6 +7293,86 @@ TargetLowering::expandFixedPointMul(SDNode *Node, SelectionDAG &DAG) const {
return Result;
}
SDValue
TargetLowering::expandFixedPointDiv(unsigned Opcode, const SDLoc &dl,
SDValue LHS, SDValue RHS,
unsigned Scale, SelectionDAG &DAG) const {
assert((Opcode == ISD::SDIVFIX ||
Opcode == ISD::UDIVFIX) &&
"Expected a fixed point division opcode");
EVT VT = LHS.getValueType();
bool Signed = Opcode == ISD::SDIVFIX;
EVT BoolVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
// If there is enough room in the type to upscale the LHS or downscale the
// RHS before the division, we can perform it in this type without having to
// resize. For signed operations, the LHS headroom is the number of
// redundant sign bits, and for unsigned ones it is the number of zeroes.
// The headroom for the RHS is the number of trailing zeroes.
unsigned LHSLead = Signed ? DAG.ComputeNumSignBits(LHS) - 1
: DAG.computeKnownBits(LHS).countMinLeadingZeros();
unsigned RHSTrail = DAG.computeKnownBits(RHS).countMinTrailingZeros();
if (LHSLead + RHSTrail < Scale)
return SDValue();
unsigned LHSShift = std::min(LHSLead, Scale);
unsigned RHSShift = Scale - LHSShift;
// At this point, we know that if we shift the LHS up by LHSShift and the
// RHS down by RHSShift, we can emit a regular division with a final scaling
// factor of Scale.
EVT ShiftTy = getShiftAmountTy(VT, DAG.getDataLayout());
if (LHSShift)
LHS = DAG.getNode(ISD::SHL, dl, VT, LHS,
DAG.getConstant(LHSShift, dl, ShiftTy));
if (RHSShift)
RHS = DAG.getNode(Signed ? ISD::SRA : ISD::SRL, dl, VT, RHS,
DAG.getConstant(RHSShift, dl, ShiftTy));
SDValue Quot;
if (Signed) {
// For signed operations, if the resulting quotient is negative and the
// remainder is nonzero, subtract 1 from the quotient to round towards
// negative infinity.
SDValue Rem;
// FIXME: Ideally we would always produce an SDIVREM here, but if the
// type isn't legal, SDIVREM cannot be expanded. There is no reason why
// we couldn't just form a libcall, but the type legalizer doesn't do it.
if (isTypeLegal(VT) &&
isOperationLegalOrCustom(ISD::SDIVREM, VT)) {
Quot = DAG.getNode(ISD::SDIVREM, dl,
DAG.getVTList(VT, VT),
LHS, RHS);
Rem = Quot.getValue(1);
Quot = Quot.getValue(0);
} else {
Quot = DAG.getNode(ISD::SDIV, dl, VT,
LHS, RHS);
Rem = DAG.getNode(ISD::SREM, dl, VT,
LHS, RHS);
}
SDValue Zero = DAG.getConstant(0, dl, VT);
SDValue RemNonZero = DAG.getSetCC(dl, BoolVT, Rem, Zero, ISD::SETNE);
SDValue LHSNeg = DAG.getSetCC(dl, BoolVT, LHS, Zero, ISD::SETLT);
SDValue RHSNeg = DAG.getSetCC(dl, BoolVT, RHS, Zero, ISD::SETLT);
SDValue QuotNeg = DAG.getNode(ISD::XOR, dl, BoolVT, LHSNeg, RHSNeg);
SDValue Sub1 = DAG.getNode(ISD::SUB, dl, VT, Quot,
DAG.getConstant(1, dl, VT));
Quot = DAG.getSelect(dl, VT,
DAG.getNode(ISD::AND, dl, BoolVT, RemNonZero, QuotNeg),
Sub1, Quot);
} else
Quot = DAG.getNode(ISD::UDIV, dl, VT,
LHS, RHS);
// TODO: Saturation.
return Quot;
}
void TargetLowering::expandUADDSUBO(
SDNode *Node, SDValue &Result, SDValue &Overflow, SelectionDAG &DAG) const {
SDLoc dl(Node);
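The headroom logic in ``expandFixedPointDiv`` can be tried out on concrete numbers. The sketch below is a simplified stand-alone model and rests on several assumptions: it works on actual 32-bit values rather than on the conservative sign-bit and known-bits analyses the DAG code uses, and ``redundantSignBits``, ``trailingZeros`` and ``sdivFixInType`` are hypothetical names. It returns ``false`` where the real function returns an empty ``SDValue``, which is the signal to retry in a wider type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Copies of the sign bit below the sign bit itself (the LHS headroom).
unsigned redundantSignBits(int32_t v) {
  uint32_t u = static_cast<uint32_t>(v);
  uint32_t sign = u >> 31;
  unsigned n = 0;
  for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
    ++n;
  return n;
}

// Trailing zero bits of a nonzero value (the RHS headroom).
unsigned trailingZeros(int32_t v) {
  uint32_t u = static_cast<uint32_t>(v);
  assert(u != 0);
  unsigned n = 0;
  while ((u & 1u) == 0) {
    u >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical model: returns false when the division cannot be performed
// within 32 bits, mirroring the "retry with a wider type" contract.
bool sdivFixInType(int32_t lhs, int32_t rhs, unsigned scale, int32_t &result) {
  assert(rhs != 0 && scale < 32);
  unsigned lhsLead = redundantSignBits(lhs);
  unsigned rhsTrail = trailingZeros(rhs);
  if (lhsLead + rhsTrail < scale)
    return false;  // not enough room to apply the scale
  unsigned lhsShift = std::min(lhsLead, scale);
  unsigned rhsShift = scale - lhsShift;
  // Shift the LHS up and the RHS down; both steps are lossless by construction.
  int64_t l = static_cast<int64_t>(lhs) * (int64_t{1} << lhsShift);
  int64_t r = static_cast<int64_t>(rhs) / (int64_t{1} << rhsShift);
  int64_t quot = l / r;
  int64_t rem = l % r;
  // Negative quotient with a nonzero remainder: round towards negative infinity.
  if (rem != 0 && ((l < 0) != (r < 0)))
    --quot;
  result = static_cast<int32_t>(quot);
  return true;
}
```

With ``lhs = 1 << 30`` and ``scale = 1`` there is no spare sign bit on the left, so the call succeeds for ``rhs = 4`` (two trailing zeros absorb the scale) but fails for ``rhs = 3``.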


@ -663,6 +663,8 @@ void TargetLoweringBase::initActions() {
setOperationAction(ISD::SMULFIXSAT, VT, Expand);
setOperationAction(ISD::UMULFIX, VT, Expand);
setOperationAction(ISD::UMULFIXSAT, VT, Expand);
setOperationAction(ISD::SDIVFIX, VT, Expand);
setOperationAction(ISD::UDIVFIX, VT, Expand);
// Overflow operations default to expand
setOperationAction(ISD::SADDO, VT, Expand);


@ -4677,28 +4677,32 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
case Intrinsic::smul_fix:
case Intrinsic::smul_fix_sat:
case Intrinsic::umul_fix:
case Intrinsic::umul_fix_sat: {
case Intrinsic::umul_fix_sat:
case Intrinsic::sdiv_fix:
case Intrinsic::udiv_fix: {
Value *Op1 = Call.getArgOperand(0);
Value *Op2 = Call.getArgOperand(1);
Assert(Op1->getType()->isIntOrIntVectorTy(),
"first operand of [us]mul_fix[_sat] must be an int type or vector "
"of ints");
"first operand of [us][mul|div]_fix[_sat] must be an int type or "
"vector of ints");
Assert(Op2->getType()->isIntOrIntVectorTy(),
"second operand of [us]mul_fix_[sat] must be an int type or vector "
"of ints");
"second operand of [us][mul|div]_fix[_sat] must be an int type or "
"vector of ints");
auto *Op3 = cast<ConstantInt>(Call.getArgOperand(2));
Assert(Op3->getType()->getBitWidth() <= 32,
"third argument of [us]mul_fix[_sat] must fit within 32 bits");
"third argument of [us][mul|div]_fix[_sat] must fit within 32 bits");
if (ID == Intrinsic::smul_fix || ID == Intrinsic::smul_fix_sat) {
if (ID == Intrinsic::smul_fix || ID == Intrinsic::smul_fix_sat ||
ID == Intrinsic::sdiv_fix) {
Assert(
Op3->getZExtValue() < Op1->getType()->getScalarSizeInBits(),
"the scale of smul_fix[_sat] must be less than the width of the operands");
"the scale of s[mul|div]_fix[_sat] must be less than the width of "
"the operands");
} else {
Assert(Op3->getZExtValue() <= Op1->getType()->getScalarSizeInBits(),
"the scale of umul_fix[_sat] must be less than or equal to the width of "
"the operands");
"the scale of u[mul|div]_fix[_sat] must be less than or equal "
"to the width of the operands");
}
break;
}
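The scale constraint enforced by the verifier hunk above boils down to one comparison per signedness. The following is a minimal stand-alone restatement; ``isValidFixScale`` is a hypothetical name and not part of the Verifier, and the comment about the sign bit is an interpretation rather than something the patch states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical restatement of the verifier rule (not the Verifier API).
// Signed fixed point intrinsics require scale < width, presumably because one
// bit is taken by the sign; unsigned ones allow scale <= width, so every bit
// may be fractional.
bool isValidFixScale(bool isSigned, unsigned bitWidth, uint64_t scale) {
  assert(bitWidth > 0 && "integer types have at least one bit");
  return isSigned ? scale < bitWidth : scale <= bitWidth;
}
```

For example, a scale of 4 is accepted for ``llvm.udiv.fix.i4`` but rejected for ``llvm.sdiv.fix.i4``.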


@ -0,0 +1,713 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86
declare i4 @llvm.sdiv.fix.i4 (i4, i4, i32)
declare i15 @llvm.sdiv.fix.i15 (i15, i15, i32)
declare i16 @llvm.sdiv.fix.i16 (i16, i16, i32)
declare i18 @llvm.sdiv.fix.i18 (i18, i18, i32)
declare i64 @llvm.sdiv.fix.i64 (i64, i64, i32)
declare <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32>, <4 x i32>, i32)
define i16 @func(i16 %x, i16 %y) nounwind {
; X64-LABEL: func:
; X64: # %bb.0:
; X64-NEXT: movswl %si, %esi
; X64-NEXT: movswl %di, %ecx
; X64-NEXT: shll $7, %ecx
; X64-NEXT: movl %ecx, %eax
; X64-NEXT: cltd
; X64-NEXT: idivl %esi
; X64-NEXT: # kill: def $eax killed $eax def $rax
; X64-NEXT: leal -1(%rax), %edi
; X64-NEXT: testl %esi, %esi
; X64-NEXT: sets %sil
; X64-NEXT: testl %ecx, %ecx
; X64-NEXT: sets %cl
; X64-NEXT: xorb %sil, %cl
; X64-NEXT: testl %edx, %edx
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovnel %edi, %eax
; X64-NEXT: # kill: def $ax killed $ax killed $rax
; X64-NEXT: retq
;
; X86-LABEL: func:
; X86: # %bb.0:
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: movswl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movswl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: shll $7, %ecx
; X86-NEXT: movl %ecx, %eax
; X86-NEXT: cltd
; X86-NEXT: idivl %esi
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: testl %esi, %esi
; X86-NEXT: sets %bl
; X86-NEXT: testl %ecx, %ecx
; X86-NEXT: sets %cl
; X86-NEXT: xorb %bl, %cl
; X86-NEXT: testl %edx, %edx
; X86-NEXT: setne %dl
; X86-NEXT: testb %cl, %dl
; X86-NEXT: cmovnel %edi, %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: retl
%tmp = call i16 @llvm.sdiv.fix.i16(i16 %x, i16 %y, i32 7)
ret i16 %tmp
}
define i16 @func2(i8 %x, i8 %y) nounwind {
; X64-LABEL: func2:
; X64: # %bb.0:
; X64-NEXT: movsbl %dil, %eax
; X64-NEXT: movsbl %sil, %ecx
; X64-NEXT: movswl %cx, %esi
; X64-NEXT: movswl %ax, %ecx
; X64-NEXT: shll $14, %ecx
; X64-NEXT: movl %ecx, %eax
; X64-NEXT: cltd
; X64-NEXT: idivl %esi
; X64-NEXT: # kill: def $eax killed $eax def $rax
; X64-NEXT: leal -1(%rax), %edi
; X64-NEXT: testl %esi, %esi
; X64-NEXT: sets %sil
; X64-NEXT: testl %ecx, %ecx
; X64-NEXT: sets %cl
; X64-NEXT: xorb %sil, %cl
; X64-NEXT: testl %edx, %edx
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovel %eax, %edi
; X64-NEXT: addl %edi, %edi
; X64-NEXT: movswl %di, %eax
; X64-NEXT: shrl %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func2:
; X86: # %bb.0:
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: movsbl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movsbl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: shll $14, %ecx
; X86-NEXT: movl %ecx, %eax
; X86-NEXT: cltd
; X86-NEXT: idivl %esi
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: testl %esi, %esi
; X86-NEXT: sets %bl
; X86-NEXT: testl %ecx, %ecx
; X86-NEXT: sets %cl
; X86-NEXT: xorb %bl, %cl
; X86-NEXT: testl %edx, %edx
; X86-NEXT: setne %dl
; X86-NEXT: testb %cl, %dl
; X86-NEXT: cmovel %eax, %edi
; X86-NEXT: addl %edi, %edi
; X86-NEXT: movswl %di, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: retl
%x2 = sext i8 %x to i15
%y2 = sext i8 %y to i15
%tmp = call i15 @llvm.sdiv.fix.i15(i15 %x2, i15 %y2, i32 14)
%tmp2 = sext i15 %tmp to i16
ret i16 %tmp2
}
define i16 @func3(i15 %x, i8 %y) nounwind {
; X64-LABEL: func3:
; X64: # %bb.0:
; X64-NEXT: shll $8, %esi
; X64-NEXT: movswl %si, %ecx
; X64-NEXT: addl %edi, %edi
; X64-NEXT: shrl $4, %ecx
; X64-NEXT: movl %edi, %eax
; X64-NEXT: cwtd
; X64-NEXT: idivw %cx
; X64-NEXT: # kill: def $ax killed $ax def $rax
; X64-NEXT: leal -1(%rax), %esi
; X64-NEXT: testw %di, %di
; X64-NEXT: sets %dil
; X64-NEXT: testw %cx, %cx
; X64-NEXT: sets %cl
; X64-NEXT: xorb %dil, %cl
; X64-NEXT: testw %dx, %dx
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovel %eax, %esi
; X64-NEXT: addl %esi, %esi
; X64-NEXT: movswl %si, %eax
; X64-NEXT: shrl %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func3:
; X86: # %bb.0:
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shll $8, %eax
; X86-NEXT: movswl %ax, %esi
; X86-NEXT: addl %ecx, %ecx
; X86-NEXT: shrl $4, %esi
; X86-NEXT: movl %ecx, %eax
; X86-NEXT: cwtd
; X86-NEXT: idivw %si
; X86-NEXT: # kill: def $ax killed $ax def $eax
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: testw %cx, %cx
; X86-NEXT: sets %cl
; X86-NEXT: testw %si, %si
; X86-NEXT: sets %ch
; X86-NEXT: xorb %cl, %ch
; X86-NEXT: testw %dx, %dx
; X86-NEXT: setne %cl
; X86-NEXT: testb %ch, %cl
; X86-NEXT: cmovel %eax, %edi
; X86-NEXT: addl %edi, %edi
; X86-NEXT: movswl %di, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: retl
%y2 = sext i8 %y to i15
%y3 = shl i15 %y2, 7
%tmp = call i15 @llvm.sdiv.fix.i15(i15 %x, i15 %y3, i32 4)
%tmp2 = sext i15 %tmp to i16
ret i16 %tmp2
}
define i4 @func4(i4 %x, i4 %y) nounwind {
; X64-LABEL: func4:
; X64: # %bb.0:
; X64-NEXT: pushq %rbx
; X64-NEXT: shlb $4, %sil
; X64-NEXT: sarb $4, %sil
; X64-NEXT: shlb $4, %dil
; X64-NEXT: sarb $4, %dil
; X64-NEXT: shlb $2, %dil
; X64-NEXT: movsbl %dil, %ecx
; X64-NEXT: movl %ecx, %eax
; X64-NEXT: idivb %sil
; X64-NEXT: movsbl %ah, %ebx
; X64-NEXT: movzbl %al, %edi
; X64-NEXT: leal -1(%rdi), %eax
; X64-NEXT: movzbl %al, %eax
; X64-NEXT: testb %sil, %sil
; X64-NEXT: sets %dl
; X64-NEXT: testb %cl, %cl
; X64-NEXT: sets %cl
; X64-NEXT: xorb %dl, %cl
; X64-NEXT: testb %bl, %bl
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovel %edi, %eax
; X64-NEXT: # kill: def $al killed $al killed $eax
; X64-NEXT: popq %rbx
; X64-NEXT: retq
;
; X86-LABEL: func4:
; X86: # %bb.0:
; X86-NEXT: pushl %esi
; X86-NEXT: movb {{[0-9]+}}(%esp), %dl
; X86-NEXT: shlb $4, %dl
; X86-NEXT: sarb $4, %dl
; X86-NEXT: movb {{[0-9]+}}(%esp), %dh
; X86-NEXT: shlb $4, %dh
; X86-NEXT: sarb $4, %dh
; X86-NEXT: shlb $2, %dh
; X86-NEXT: movsbl %dh, %eax
; X86-NEXT: idivb %dl
; X86-NEXT: movsbl %ah, %ecx
; X86-NEXT: movzbl %al, %esi
; X86-NEXT: decb %al
; X86-NEXT: movzbl %al, %eax
; X86-NEXT: testb %dl, %dl
; X86-NEXT: sets %dl
; X86-NEXT: testb %dh, %dh
; X86-NEXT: sets %dh
; X86-NEXT: xorb %dl, %dh
; X86-NEXT: testb %cl, %cl
; X86-NEXT: setne %cl
; X86-NEXT: testb %dh, %cl
; X86-NEXT: cmovel %esi, %eax
; X86-NEXT: # kill: def $al killed $al killed $eax
; X86-NEXT: popl %esi
; X86-NEXT: retl
%tmp = call i4 @llvm.sdiv.fix.i4(i4 %x, i4 %y, i32 2)
ret i4 %tmp
}
define i64 @func5(i64 %x, i64 %y) nounwind {
; X64-LABEL: func5:
; X64: # %bb.0:
; X64-NEXT: pushq %rbp
; X64-NEXT: pushq %r15
; X64-NEXT: pushq %r14
; X64-NEXT: pushq %r13
; X64-NEXT: pushq %r12
; X64-NEXT: pushq %rbx
; X64-NEXT: subq $24, %rsp
; X64-NEXT: movq %rsi, %r14
; X64-NEXT: movq %rdi, %r15
; X64-NEXT: movq %rdi, %rax
; X64-NEXT: shrq $33, %rax
; X64-NEXT: movq %rdi, %rbx
; X64-NEXT: sarq $63, %rbx
; X64-NEXT: shlq $31, %rbx
; X64-NEXT: orq %rax, %rbx
; X64-NEXT: sets {{[-0-9]+}}(%r{{[sb]}}p) # 1-byte Folded Spill
; X64-NEXT: shlq $31, %r15
; X64-NEXT: movq %rsi, %r12
; X64-NEXT: sarq $63, %r12
; X64-NEXT: movq %r15, %rdi
; X64-NEXT: movq %rbx, %rsi
; X64-NEXT: movq %r14, %rdx
; X64-NEXT: movq %r12, %rcx
; X64-NEXT: callq __divti3
; X64-NEXT: movq %rax, %r13
; X64-NEXT: decq %rax
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: testq %r12, %r12
; X64-NEXT: sets %bpl
; X64-NEXT: xorb {{[-0-9]+}}(%r{{[sb]}}p), %bpl # 1-byte Folded Reload
; X64-NEXT: movq %r15, %rdi
; X64-NEXT: movq %rbx, %rsi
; X64-NEXT: movq %r14, %rdx
; X64-NEXT: movq %r12, %rcx
; X64-NEXT: callq __modti3
; X64-NEXT: orq %rax, %rdx
; X64-NEXT: setne %al
; X64-NEXT: testb %bpl, %al
; X64-NEXT: cmovneq {{[-0-9]+}}(%r{{[sb]}}p), %r13 # 8-byte Folded Reload
; X64-NEXT: movq %r13, %rax
; X64-NEXT: addq $24, %rsp
; X64-NEXT: popq %rbx
; X64-NEXT: popq %r12
; X64-NEXT: popq %r13
; X64-NEXT: popq %r14
; X64-NEXT: popq %r15
; X64-NEXT: popq %rbp
; X64-NEXT: retq
;
; X86-LABEL: func5:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: andl $-8, %esp
; X86-NEXT: subl $72, %esp
; X86-NEXT: movl 8(%ebp), %ecx
; X86-NEXT: movl 12(%ebp), %edx
; X86-NEXT: movl 20(%ebp), %ebx
; X86-NEXT: sarl $31, %ebx
; X86-NEXT: movl %edx, %eax
; X86-NEXT: shldl $31, %ecx, %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: shll $31, %ecx
; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, %esi
; X86-NEXT: sarl $31, %esi
; X86-NEXT: movl %esi, %edi
; X86-NEXT: shldl $31, %edx, %esi
; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
; X86-NEXT: rorl %edi
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl 20(%ebp)
; X86-NEXT: pushl 16(%ebp)
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: pushl %eax
; X86-NEXT: pushl %ecx
; X86-NEXT: pushl %edx
; X86-NEXT: calll __divti3
; X86-NEXT: addl $32, %esp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: subl $1, %ecx
; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: sbbl $0, %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: testl %ebx, %ebx
; X86-NEXT: sets %al
; X86-NEXT: testl %edi, %edi
; X86-NEXT: sets %cl
; X86-NEXT: xorb %al, %cl
; X86-NEXT: movb %cl, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
; X86-NEXT: leal {{[0-9]+}}(%esp), %eax
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl 20(%ebp)
; X86-NEXT: pushl 16(%ebp)
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: pushl {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: pushl {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: pushl %eax
; X86-NEXT: calll __modti3
; X86-NEXT: addl $32, %esp
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: orl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: orl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: orl %eax, %ecx
; X86-NEXT: setne %al
; X86-NEXT: testb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: cmovel {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
; X86-NEXT: cmovel {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: popl %ebp
; X86-NEXT: retl
%tmp = call i64 @llvm.sdiv.fix.i64(i64 %x, i64 %y, i32 31)
ret i64 %tmp
}
define i18 @func6(i16 %x, i16 %y) nounwind {
; X64-LABEL: func6:
; X64: # %bb.0:
; X64-NEXT: movswl %di, %ecx
; X64-NEXT: movswl %si, %esi
; X64-NEXT: shll $7, %ecx
; X64-NEXT: movl %ecx, %eax
; X64-NEXT: cltd
; X64-NEXT: idivl %esi
; X64-NEXT: # kill: def $eax killed $eax def $rax
; X64-NEXT: leal -1(%rax), %edi
; X64-NEXT: testl %esi, %esi
; X64-NEXT: sets %sil
; X64-NEXT: testl %ecx, %ecx
; X64-NEXT: sets %cl
; X64-NEXT: xorb %sil, %cl
; X64-NEXT: testl %edx, %edx
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovnel %edi, %eax
; X64-NEXT: # kill: def $eax killed $eax killed $rax
; X64-NEXT: retq
;
; X86-LABEL: func6:
; X86: # %bb.0:
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: movswl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movswl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: shll $7, %ecx
; X86-NEXT: movl %ecx, %eax
; X86-NEXT: cltd
; X86-NEXT: idivl %esi
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: testl %esi, %esi
; X86-NEXT: sets %bl
; X86-NEXT: testl %ecx, %ecx
; X86-NEXT: sets %cl
; X86-NEXT: xorb %bl, %cl
; X86-NEXT: testl %edx, %edx
; X86-NEXT: setne %dl
; X86-NEXT: testb %cl, %dl
; X86-NEXT: cmovnel %edi, %eax
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: retl
%x2 = sext i16 %x to i18
%y2 = sext i16 %y to i18
%tmp = call i18 @llvm.sdiv.fix.i18(i18 %x2, i18 %y2, i32 7)
ret i18 %tmp
}
define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
; X64-LABEL: vec:
; X64: # %bb.0:
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm1, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm1[2,3,0,1]
; X64-NEXT: movdqa %xmm1, %xmm4
; X64-NEXT: punpckldq {{.*#+}} xmm4 = xmm4[0],xmm2[0],xmm4[1],xmm2[1]
; X64-NEXT: movq %xmm4, %rcx
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm0, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
; X64-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; X64-NEXT: psllq $31, %xmm0
; X64-NEXT: movq %xmm0, %rax
; X64-NEXT: cqto
; X64-NEXT: idivq %rcx
; X64-NEXT: movq %rax, %r8
; X64-NEXT: movq %rdx, %r11
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm4[2,3,0,1]
; X64-NEXT: movq %xmm2, %rcx
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]
; X64-NEXT: movq %xmm2, %rax
; X64-NEXT: cqto
; X64-NEXT: idivq %rcx
; X64-NEXT: movq %rax, %r10
; X64-NEXT: movq %rdx, %rcx
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm3, %xmm2
; X64-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]
; X64-NEXT: movq %xmm3, %rdi
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm1, %xmm2
; X64-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
; X64-NEXT: psllq $31, %xmm1
; X64-NEXT: movq %xmm1, %rax
; X64-NEXT: cqto
; X64-NEXT: idivq %rdi
; X64-NEXT: movq %rax, %r9
; X64-NEXT: movq %rdx, %rdi
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm3[2,3,0,1]
; X64-NEXT: movq %xmm2, %rsi
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]
; X64-NEXT: movq %xmm2, %rax
; X64-NEXT: cqto
; X64-NEXT: idivq %rsi
; X64-NEXT: movq %r11, %xmm2
; X64-NEXT: movq %rcx, %xmm5
; X64-NEXT: pxor %xmm6, %xmm6
; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm5[0]
; X64-NEXT: pcmpeqd %xmm6, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm5 = xmm2[1,0,3,2]
; X64-NEXT: pand %xmm2, %xmm5
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm4, %xmm2
; X64-NEXT: pxor %xmm4, %xmm4
; X64-NEXT: pcmpgtd %xmm0, %xmm4
; X64-NEXT: movq %r8, %xmm0
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
; X64-NEXT: pshufd {{.*#+}} xmm4 = xmm4[1,1,3,3]
; X64-NEXT: pxor %xmm2, %xmm4
; X64-NEXT: movq %r10, %xmm2
; X64-NEXT: pandn %xmm4, %xmm5
; X64-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
; X64-NEXT: movdqa %xmm5, %xmm2
; X64-NEXT: pandn %xmm0, %xmm2
; X64-NEXT: pcmpeqd %xmm4, %xmm4
; X64-NEXT: paddq %xmm4, %xmm0
; X64-NEXT: pand %xmm5, %xmm0
; X64-NEXT: por %xmm2, %xmm0
; X64-NEXT: movq %rdi, %xmm2
; X64-NEXT: movq %rdx, %xmm5
; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm5[0]
; X64-NEXT: pcmpeqd %xmm6, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm5 = xmm2[1,0,3,2]
; X64-NEXT: pand %xmm2, %xmm5
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm3, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
; X64-NEXT: pcmpgtd %xmm1, %xmm6
; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm6[1,1,3,3]
; X64-NEXT: pxor %xmm2, %xmm1
; X64-NEXT: pandn %xmm1, %xmm5
; X64-NEXT: movq %r9, %xmm1
; X64-NEXT: movq %rax, %xmm2
; X64-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; X64-NEXT: movdqa %xmm5, %xmm2
; X64-NEXT: pandn %xmm1, %xmm2
; X64-NEXT: paddq %xmm4, %xmm1
; X64-NEXT: pand %xmm5, %xmm1
; X64-NEXT: por %xmm2, %xmm1
; X64-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
; X64-NEXT: retq
;
; X86-LABEL: vec:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: subl $64, %esp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, %edx
; X86-NEXT: sarl $31, %edx
; X86-NEXT: movl %edi, %esi
; X86-NEXT: shll $31, %esi
; X86-NEXT: movl %ebx, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: andl $-2147483648, %ebx # imm = 0x80000000
; X86-NEXT: orl %eax, %ebx
; X86-NEXT: sets {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
; X86-NEXT: movl %ebp, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: andl $-2147483648, %ebp # imm = 0x80000000
; X86-NEXT: orl %eax, %ebp
; X86-NEXT: movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: sets {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shrl %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
; X86-NEXT: andl $-2147483648, %ebp # imm = 0x80000000
; X86-NEXT: orl %eax, %ebp
; X86-NEXT: movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: sets (%esp) # 1-byte Folded Spill
; X86-NEXT: movl %edi, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: andl $-2147483648, %edi # imm = 0x80000000
; X86-NEXT: orl %eax, %edi
; X86-NEXT: sets {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
; X86-NEXT: pushl %edx
; X86-NEXT: movl %edx, %ebp
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %ecx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: calll __moddi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl {{[0-9]+}}(%esp)
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: calll __divdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: shll $31, %ecx
; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
; X86-NEXT: movl %edx, %eax
; X86-NEXT: sarl $31, %eax
; X86-NEXT: pushl %eax
; X86-NEXT: movl %eax, %esi
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %edx
; X86-NEXT: movl %edx, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %ecx
; X86-NEXT: movl %ecx, %edi
; X86-NEXT: calll __moddi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %esi
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: calll __divdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shll $31, %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, %edx
; X86-NEXT: sarl $31, %edx
; X86-NEXT: pushl %edx
; X86-NEXT: movl %edx, %ebp
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %ecx
; X86-NEXT: movl %ecx, %edi
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ebx # 4-byte Reload
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %eax
; X86-NEXT: movl %eax, %esi
; X86-NEXT: calll __moddi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %esi
; X86-NEXT: calll __divdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shll $31, %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, %ebp
; X86-NEXT: sarl $31, %ebp
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl %ecx
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
; X86-NEXT: pushl %esi
; X86-NEXT: pushl %eax
; X86-NEXT: calll __moddi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, %edi
; X86-NEXT: testl %ebp, %ebp
; X86-NEXT: sets %bl
; X86-NEXT: xorb (%esp), %bl # 1-byte Folded Reload
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl {{[0-9]+}}(%esp)
; X86-NEXT: pushl %esi
; X86-NEXT: pushl {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: calll __divdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: orl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Folded Reload
; X86-NEXT: setne %cl
; X86-NEXT: testb %bl, %cl
; X86-NEXT: leal -1(%eax), %ecx
; X86-NEXT: cmovel %eax, %ecx
; X86-NEXT: cmpl $0, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: sets %al
; X86-NEXT: xorb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
; X86-NEXT: orl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
; X86-NEXT: setne %dl
; X86-NEXT: testb %al, %dl
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: cmovel %eax, %edi
; X86-NEXT: cmpl $0, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: sets %dl
; X86-NEXT: xorb {{[-0-9]+}}(%e{{[sb]}}p), %dl # 1-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: orl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
; X86-NEXT: setne %dh
; X86-NEXT: testb %dl, %dh
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: leal -1(%eax), %edx
; X86-NEXT: cmovel %eax, %edx
; X86-NEXT: cmpl $0, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: sets %bl
; X86-NEXT: xorb {{[-0-9]+}}(%e{{[sb]}}p), %bl # 1-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: orl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
; X86-NEXT: setne %bh
; X86-NEXT: testb %bl, %bh
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: leal -1(%eax), %esi
; X86-NEXT: cmovel %eax, %esi
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl %esi, 12(%eax)
; X86-NEXT: movl %edx, 8(%eax)
; X86-NEXT: movl %edi, 4(%eax)
; X86-NEXT: movl %ecx, (%eax)
; X86-NEXT: addl $64, %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: popl %ebp
; X86-NEXT: retl $4
%tmp = call <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32> %x, <4 x i32> %y, i32 31)
ret <4 x i32> %tmp
}


@ -0,0 +1,344 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86
declare i4 @llvm.udiv.fix.i4 (i4, i4, i32)
declare i15 @llvm.udiv.fix.i15 (i15, i15, i32)
declare i16 @llvm.udiv.fix.i16 (i16, i16, i32)
declare i18 @llvm.udiv.fix.i18 (i18, i18, i32)
declare i64 @llvm.udiv.fix.i64 (i64, i64, i32)
declare <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32>, <4 x i32>, i32)
define i16 @func(i16 %x, i16 %y) nounwind {
; X64-LABEL: func:
; X64: # %bb.0:
; X64-NEXT: movzwl %si, %ecx
; X64-NEXT: movzwl %di, %eax
; X64-NEXT: shll $7, %eax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divl %ecx
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func:
; X86: # %bb.0:
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shll $7, %eax
; X86-NEXT: xorl %edx, %edx
; X86-NEXT: divl %ecx
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: retl
%tmp = call i16 @llvm.udiv.fix.i16(i16 %x, i16 %y, i32 7)
ret i16 %tmp
}
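; Editorial sketch, not part of the autogenerated checks: assuming the i16
; operands are widened to i32, the assembly checked for @func above is
; consistent with the unsigned expansion below. No rounding fix-up is needed
; because unsigned division already rounds down. Value names are hypothetical.
;   %xw  = zext i16 %x to i32
;   %yw  = zext i16 %y to i32
;   %num = shl i32 %xw, 7          ; pre-scale the dividend by 2^scale
;   %quo = udiv i32 %num, %yw      ; the divl
;   %tmp = trunc i32 %quo to i16
; Worked example: x = 3 (3/128) and y = 2 (2/128) give num = 384 and
; quo = 192, which is 1.5 at scale 7.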
define i16 @func2(i8 %x, i8 %y) nounwind {
; X64-LABEL: func2:
; X64: # %bb.0:
; X64-NEXT: movsbl %dil, %eax
; X64-NEXT: andl $32767, %eax # imm = 0x7FFF
; X64-NEXT: movsbl %sil, %ecx
; X64-NEXT: andl $32767, %ecx # imm = 0x7FFF
; X64-NEXT: shll $14, %eax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divl %ecx
; X64-NEXT: addl %eax, %eax
; X64-NEXT: cwtl
; X64-NEXT: shrl %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func2:
; X86: # %bb.0:
; X86-NEXT: movsbl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: andl $32767, %ecx # imm = 0x7FFF
; X86-NEXT: movsbl {{[0-9]+}}(%esp), %eax
; X86-NEXT: andl $32767, %eax # imm = 0x7FFF
; X86-NEXT: shll $14, %eax
; X86-NEXT: xorl %edx, %edx
; X86-NEXT: divl %ecx
; X86-NEXT: addl %eax, %eax
; X86-NEXT: cwtl
; X86-NEXT: shrl %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: retl
%x2 = sext i8 %x to i15
%y2 = sext i8 %y to i15
%tmp = call i15 @llvm.udiv.fix.i15(i15 %x2, i15 %y2, i32 14)
%tmp2 = sext i15 %tmp to i16
ret i16 %tmp2
}
define i16 @func3(i15 %x, i8 %y) nounwind {
; X64-LABEL: func3:
; X64: # %bb.0:
; X64-NEXT: # kill: def $edi killed $edi def $rdi
; X64-NEXT: leal (%rdi,%rdi), %eax
; X64-NEXT: movzbl %sil, %ecx
; X64-NEXT: shll $4, %ecx
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divw %cx
; X64-NEXT: # kill: def $ax killed $ax def $eax
; X64-NEXT: addl %eax, %eax
; X64-NEXT: cwtl
; X64-NEXT: shrl %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func3:
; X86: # %bb.0:
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: addl %eax, %eax
; X86-NEXT: movzbl %cl, %ecx
; X86-NEXT: shll $4, %ecx
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: xorl %edx, %edx
; X86-NEXT: divw %cx
; X86-NEXT: # kill: def $ax killed $ax def $eax
; X86-NEXT: addl %eax, %eax
; X86-NEXT: cwtl
; X86-NEXT: shrl %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: retl
%y2 = sext i8 %y to i15
%y3 = shl i15 %y2, 7
%tmp = call i15 @llvm.udiv.fix.i15(i15 %x, i15 %y3, i32 4)
%tmp2 = sext i15 %tmp to i16
ret i16 %tmp2
}
define i4 @func4(i4 %x, i4 %y) nounwind {
; X64-LABEL: func4:
; X64: # %bb.0:
; X64-NEXT: andb $15, %sil
; X64-NEXT: andb $15, %dil
; X64-NEXT: shlb $2, %dil
; X64-NEXT: movzbl %dil, %eax
; X64-NEXT: divb %sil
; X64-NEXT: retq
;
; X86-LABEL: func4:
; X86: # %bb.0:
; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
; X86-NEXT: andb $15, %cl
; X86-NEXT: movb {{[0-9]+}}(%esp), %al
; X86-NEXT: andb $15, %al
; X86-NEXT: shlb $2, %al
; X86-NEXT: movzbl %al, %eax
; X86-NEXT: divb %cl
; X86-NEXT: retl
%tmp = call i4 @llvm.udiv.fix.i4(i4 %x, i4 %y, i32 2)
ret i4 %tmp
}
define i64 @func5(i64 %x, i64 %y) nounwind {
; X64-LABEL: func5:
; X64: # %bb.0:
; X64-NEXT: pushq %rax
; X64-NEXT: movq %rsi, %rdx
; X64-NEXT: movq %rdi, %rsi
; X64-NEXT: shlq $31, %rdi
; X64-NEXT: shrq $33, %rsi
; X64-NEXT: xorl %ecx, %ecx
; X64-NEXT: callq __udivti3
; X64-NEXT: popq %rcx
; X64-NEXT: retq
;
; X86-LABEL: func5:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %esi
; X86-NEXT: andl $-8, %esp
; X86-NEXT: subl $24, %esp
; X86-NEXT: movl 8(%ebp), %eax
; X86-NEXT: movl 12(%ebp), %ecx
; X86-NEXT: movl %ecx, %edx
; X86-NEXT: shrl %edx
; X86-NEXT: shldl $31, %eax, %ecx
; X86-NEXT: shll $31, %eax
; X86-NEXT: movl %esp, %esi
; X86-NEXT: pushl $0
; X86-NEXT: pushl $0
; X86-NEXT: pushl 20(%ebp)
; X86-NEXT: pushl 16(%ebp)
; X86-NEXT: pushl $0
; X86-NEXT: pushl %edx
; X86-NEXT: pushl %ecx
; X86-NEXT: pushl %eax
; X86-NEXT: pushl %esi
; X86-NEXT: calll __udivti3
; X86-NEXT: addl $32, %esp
; X86-NEXT: movl (%esp), %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
; X86-NEXT: leal -4(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %ebp
; X86-NEXT: retl
%tmp = call i64 @llvm.udiv.fix.i64(i64 %x, i64 %y, i32 31)
ret i64 %tmp
}
define i18 @func6(i16 %x, i16 %y) nounwind {
; X64-LABEL: func6:
; X64: # %bb.0:
; X64-NEXT: movswl %di, %eax
; X64-NEXT: andl $262143, %eax # imm = 0x3FFFF
; X64-NEXT: movswl %si, %ecx
; X64-NEXT: andl $262143, %ecx # imm = 0x3FFFF
; X64-NEXT: shll $7, %eax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divl %ecx
; X64-NEXT: retq
;
; X86-LABEL: func6:
; X86: # %bb.0:
; X86-NEXT: movswl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: andl $262143, %ecx # imm = 0x3FFFF
; X86-NEXT: movswl {{[0-9]+}}(%esp), %eax
; X86-NEXT: andl $262143, %eax # imm = 0x3FFFF
; X86-NEXT: shll $7, %eax
; X86-NEXT: xorl %edx, %edx
; X86-NEXT: divl %ecx
; X86-NEXT: retl
%x2 = sext i16 %x to i18
%y2 = sext i16 %y to i18
%tmp = call i18 @llvm.udiv.fix.i18(i18 %x2, i18 %y2, i32 7)
ret i18 %tmp
}
define i16 @func7(i16 %x, i16 %y) nounwind {
; X64-LABEL: func7:
; X64: # %bb.0:
; X64-NEXT: movl %edi, %eax
; X64-NEXT: shll $16, %eax
; X64-NEXT: movzwl %si, %ecx
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divl %ecx
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func7:
; X86: # %bb.0:
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shll $16, %eax
; X86-NEXT: xorl %edx, %edx
; X86-NEXT: divl %ecx
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: retl
%tmp = call i16 @llvm.udiv.fix.i16(i16 %x, i16 %y, i32 16)
ret i16 %tmp
}
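; Editorial sketch, not part of the autogenerated checks: @func7 above uses a
; scale equal to the operand width (16), so every bit of an i16 operand is
; fractional. Assuming the same widened expansion as for the smaller scales,
; the dividend is shifted left by 16 before the 32-bit divide.
; Worked example: x = 1 (1/65536) and y = 2 (2/65536) give
; (1 << 16) / 2 = 32768, which is 0.5 at scale 16.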
define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
; X64-LABEL: vec:
; X64: # %bb.0:
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: movdqa %xmm1, %xmm4
; X64-NEXT: punpckhdq {{.*#+}} xmm4 = xmm4[2],xmm2[2],xmm4[3],xmm2[3]
; X64-NEXT: movq %xmm4, %rcx
; X64-NEXT: movdqa %xmm0, %xmm5
; X64-NEXT: punpckhdq {{.*#+}} xmm5 = xmm5[2],xmm2[2],xmm5[3],xmm2[3]
; X64-NEXT: psllq $31, %xmm5
; X64-NEXT: movq %xmm5, %rax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divq %rcx
; X64-NEXT: movq %rax, %xmm3
; X64-NEXT: pshufd {{.*#+}} xmm4 = xmm4[2,3,0,1]
; X64-NEXT: movq %xmm4, %rcx
; X64-NEXT: pshufd {{.*#+}} xmm4 = xmm5[2,3,0,1]
; X64-NEXT: movq %xmm4, %rax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divq %rcx
; X64-NEXT: movq %rax, %xmm4
; X64-NEXT: punpcklqdq {{.*#+}} xmm3 = xmm3[0],xmm4[0]
; X64-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
; X64-NEXT: movq %xmm1, %rcx
; X64-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; X64-NEXT: psllq $31, %xmm0
; X64-NEXT: movq %xmm0, %rax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divq %rcx
; X64-NEXT: movq %rax, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; X64-NEXT: movq %xmm1, %rcx
; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
; X64-NEXT: movq %xmm0, %rax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divq %rcx
; X64-NEXT: movq %rax, %xmm0
; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]
; X64-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2],xmm3[0,2]
; X64-NEXT: movaps %xmm2, %xmm0
; X64-NEXT: retq
;
; X86-LABEL: vec:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: pushl %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl %eax, %ecx
; X86-NEXT: shrl %ecx
; X86-NEXT: shll $31, %eax
; X86-NEXT: pushl $0
; X86-NEXT: pushl {{[0-9]+}}(%esp)
; X86-NEXT: pushl %ecx
; X86-NEXT: pushl %eax
; X86-NEXT: calll __udivdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
; X86-NEXT: movl %ebx, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: shll $31, %ebx
; X86-NEXT: pushl $0
; X86-NEXT: pushl {{[0-9]+}}(%esp)
; X86-NEXT: pushl %eax
; X86-NEXT: pushl %ebx
; X86-NEXT: calll __udivdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, %ebx
; X86-NEXT: movl %ebp, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: shll $31, %ebp
; X86-NEXT: pushl $0
; X86-NEXT: pushl {{[0-9]+}}(%esp)
; X86-NEXT: pushl %eax
; X86-NEXT: pushl %ebp
; X86-NEXT: calll __udivdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, %ebp
; X86-NEXT: movl %edi, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: shll $31, %edi
; X86-NEXT: pushl $0
; X86-NEXT: pushl {{[0-9]+}}(%esp)
; X86-NEXT: pushl %eax
; X86-NEXT: pushl %edi
; X86-NEXT: calll __udivdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, 12(%esi)
; X86-NEXT: movl %ebp, 8(%esi)
; X86-NEXT: movl %ebx, 4(%esi)
; X86-NEXT: movl (%esp), %eax # 4-byte Reload
; X86-NEXT: movl %eax, (%esi)
; X86-NEXT: movl %esi, %eax
; X86-NEXT: addl $4, %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: popl %ebp
; X86-NEXT: retl $4
%tmp = call <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32> %x, <4 x i32> %y, i32 31)
ret <4 x i32> %tmp
}