[Intrinsic] Add fixed point division intrinsics.
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:

  llvm.sdiv.fix.*
  llvm.udiv.fix.*

These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.

Patch by: ebevhan

Reviewers: bjope, leonardchan, efriedma, craig.topper

Reviewed By: craig.topper

Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70007
Parent: b2c2fe7219
Commit: 8e2b44f7e0
@ -13675,16 +13675,17 @@ Fixed Point Arithmetic Intrinsics
|
|||
|
||||
A fixed point number represents a real data type for a number that has a fixed
|
||||
number of digits after a radix point (equivalent to the decimal point '.').
|
||||
The number of digits after the radix point is referred as the ``scale``. These
|
||||
The number of digits after the radix point is referred as the `scale`. These
|
||||
are useful for representing fractional values to a specific precision. The
|
||||
following intrinsics perform fixed point arithmetic operations on 2 operands
|
||||
of the same scale, specified as the third argument.
|
||||
|
||||
The `llvm.*mul.fix` family of intrinsic functions represents a multiplication
|
||||
The ``llvm.*mul.fix`` family of intrinsic functions represents a multiplication
|
||||
of fixed point numbers through scaled integers. Therefore, fixed point
|
||||
multplication can be represented as
|
||||
multiplication can be represented as
|
||||
|
||||
.. code-block:: llvm
|
||||
|
||||
::
|
||||
%result = call i4 @llvm.smul.fix.i4(i4 %a, i4 %b, i32 %scale)
|
||||
|
||||
; Expands to
|
||||
|
@ -13695,6 +13696,22 @@ multplication can be represented as
|
|||
%r = ashr i8 %mul, i8 %scale2 ; this is for a target rounding down towards negative infinity
|
||||
%result = trunc i8 %r to i4
|
||||
|
||||
The ``llvm.*div.fix`` family of intrinsic functions represents a division of
|
||||
fixed point numbers through scaled integers. Fixed point division can be
|
||||
represented as:
|
||||
|
||||
.. code-block:: llvm
|
||||
|
||||
%result = call i4 @llvm.sdiv.fix.i4(i4 %a, i4 %b, i32 %scale)
|
||||
|
||||
; Expands to
|
||||
%a2 = sext i4 %a to i8
|
||||
%b2 = sext i4 %b to i8
|
||||
%scale2 = trunc i32 %scale to i8
|
||||
%a3 = shl i8 %a2, %scale2
|
||||
%r = sdiv i8 %a3, %b2 ; this is for a target rounding towards zero
|
||||
%result = trunc i8 %r to i4
|
||||
|
||||
For each of these functions, if the result cannot be represented exactly with
|
||||
the provided scale, the result is rounded. Rounding is unspecified since
|
||||
preferred rounding may vary for different targets. Rounding is specified
|
||||
|
@ -13963,6 +13980,126 @@ Examples
|
|||
%res = call i4 @llvm.umul.fix.sat.i4(i4 2, i4 4, i32 1) ; %res = 4 (1 x 2 = 2)
|
||||
|
||||
|
||||
'``llvm.sdiv.fix.*``' Intrinsics
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Syntax
|
||||
"""""""
|
||||
|
||||
This is an overloaded intrinsic. You can use ``llvm.sdiv.fix``
|
||||
on any integer bit width or vectors of integers.
|
||||
|
||||
::
|
||||
|
||||
declare i16 @llvm.sdiv.fix.i16(i16 %a, i16 %b, i32 %scale)
|
||||
declare i32 @llvm.sdiv.fix.i32(i32 %a, i32 %b, i32 %scale)
|
||||
declare i64 @llvm.sdiv.fix.i64(i64 %a, i64 %b, i32 %scale)
|
||||
declare <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
|
||||
|
||||
Overview
|
||||
"""""""""
|
||||
|
||||
The '``llvm.sdiv.fix``' family of intrinsic functions performs signed
|
||||
fixed point division on 2 arguments of the same scale.
|
||||
|
||||
Arguments
|
||||
""""""""""
|
||||
|
||||
The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must all have the same bit width. The arguments may also be
vectors of integers with the same length and element width. ``%a`` and ``%b``
are the two values that will undergo signed fixed point division. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
|
||||
|
||||
Semantics:
|
||||
""""""""""
|
||||
|
||||
This operation performs fixed point division on the 2 arguments of a
|
||||
specified scale. The result will also be returned in the same scale specified
|
||||
in the third argument.
|
||||
|
||||
If the result value cannot be precisely represented in the given scale, the
|
||||
value is rounded up or down to the closest representable value. The rounding
|
||||
direction is unspecified.
|
||||
|
||||
It is undefined behavior if the result value does not fit within the range of
|
||||
the fixed point type, or if the second argument is zero.
|
||||
|
||||
|
||||
Examples
|
||||
"""""""""
|
||||
|
||||
.. code-block:: llvm
|
||||
|
||||
%res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
|
||||
%res = call i4 @llvm.sdiv.fix.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
|
||||
%res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 -2, i32 1) ; %res = -3 (1.5 / -1 = -1.5)
|
||||
|
||||
; The result in the following could be rounded up to 1 or down to 0.5
|
||||
%res = call i4 @llvm.sdiv.fix.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
|
||||
|
||||
|
||||
'``llvm.udiv.fix.*``' Intrinsics
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Syntax
|
||||
"""""""
|
||||
|
||||
This is an overloaded intrinsic. You can use ``llvm.udiv.fix``
|
||||
on any integer bit width or vectors of integers.
|
||||
|
||||
::
|
||||
|
||||
declare i16 @llvm.udiv.fix.i16(i16 %a, i16 %b, i32 %scale)
|
||||
declare i32 @llvm.udiv.fix.i32(i32 %a, i32 %b, i32 %scale)
|
||||
declare i64 @llvm.udiv.fix.i64(i64 %a, i64 %b, i32 %scale)
|
||||
declare <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 %scale)
|
||||
|
||||
Overview
|
||||
"""""""""
|
||||
|
||||
The '``llvm.udiv.fix``' family of intrinsic functions performs unsigned
|
||||
fixed point division on 2 arguments of the same scale.
|
||||
|
||||
Arguments
|
||||
""""""""""
|
||||
|
||||
The arguments (``%a`` and ``%b``) and the result may be of integer types of any
bit width, but they must all have the same bit width. The arguments may also be
vectors of integers with the same length and element width. ``%a`` and ``%b``
are the two values that will undergo unsigned fixed point division. The argument
``%scale`` represents the scale of both operands, and must be a constant
integer.
|
||||
|
||||
Semantics:
|
||||
""""""""""
|
||||
|
||||
This operation performs fixed point division on the 2 arguments of a
|
||||
specified scale. The result will also be returned in the same scale specified
|
||||
in the third argument.
|
||||
|
||||
If the result value cannot be precisely represented in the given scale, the
|
||||
value is rounded up or down to the closest representable value. The rounding
|
||||
direction is unspecified.
|
||||
|
||||
It is undefined behavior if the result value does not fit within the range of
|
||||
the fixed point type, or if the second argument is zero.
|
||||
|
||||
|
||||
Examples
|
||||
"""""""""
|
||||
|
||||
.. code-block:: llvm
|
||||
|
||||
%res = call i4 @llvm.udiv.fix.i4(i4 6, i4 2, i32 0) ; %res = 3 (6 / 2 = 3)
|
||||
%res = call i4 @llvm.udiv.fix.i4(i4 6, i4 4, i32 1) ; %res = 3 (3 / 2 = 1.5)
|
||||
%res = call i4 @llvm.udiv.fix.i4(i4 1, i4 -8, i32 4) ; %res = 2 (0.0625 / 0.5 = 0.125)
|
||||
|
||||
; The result in the following could be rounded up to 1 or down to 0.5
|
||||
%res = call i4 @llvm.udiv.fix.i4(i4 3, i4 4, i32 1) ; %res = 2 (or 1) (1.5 / 2 = 0.75)
|
||||
|
||||
|
||||
Specialised Arithmetic Intrinsics
|
||||
---------------------------------
|
||||
|
||||
|
|
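Reviewer note: the `llvm.sdiv.fix` / `llvm.udiv.fix` semantics documented in the LangRef hunk above can be checked against a small standalone model. The sketch below is not LLVM code. The names `floor_div`, `sdiv_fix_model` and `udiv_fix_model` are hypothetical, operands are held in 64-bit host integers rather than in the narrow IR type, and the model rounds toward negative infinity, which is only one of the roundings the documentation permits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model, not part of the patch.
// A fixed point value is an integer whose low `scale` bits are fractional.

// Signed division that rounds toward negative infinity.
// C++ '/' truncates toward zero, so adjust when the remainder is nonzero
// and the exact quotient is negative.
inline int64_t floor_div(int64_t n, int64_t d) {
  int64_t q = n / d;
  int64_t r = n % d;
  if (r != 0 && ((r < 0) != (d < 0)))
    --q;
  return q;
}

// Model of llvm.sdiv.fix: (a * 2^scale) / b, rounded down.
// Preconditions (undefined behavior in the intrinsic): b != 0 and the
// result fits in the fixed point type. Only b != 0 is checked here.
inline int64_t sdiv_fix_model(int64_t a, int64_t b, unsigned scale) {
  assert(b != 0 && "division by zero is undefined for sdiv.fix");
  // Multiply instead of shifting so negative values are handled portably.
  return floor_div(a * (int64_t{1} << scale), b);
}

// Model of llvm.udiv.fix: unsigned division already rounds down.
inline uint64_t udiv_fix_model(uint64_t a, uint64_t b, unsigned scale) {
  assert(b != 0 && "division by zero is undefined for udiv.fix");
  return (a << scale) / b;
}
```

With scale 1, `sdiv_fix_model(3, -2, 1)` gives -3, the integer encoding of -1.5, which matches the third signed example in the documentation. For `sdiv_fix_model(3, 4, 1)` this model picks 1 (0.5); the documentation allows either 1 or 2.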
|
@ -285,6 +285,12 @@ namespace ISD {
|
|||
/// bits of the first 2 operands.
|
||||
SMULFIXSAT, UMULFIXSAT,
|
||||
|
||||
/// RESULT = [US]DIVFIX(LHS, RHS, SCALE) - Perform fixed point division on
|
||||
/// 2 integers with the same width and scale. SCALE represents the scale
|
||||
/// of both operands as fixed point numbers. This SCALE parameter must be a
|
||||
/// constant integer.
|
||||
SDIVFIX, UDIVFIX,
|
||||
|
||||
/// Simple binary floating point operators.
|
||||
FADD, FSUB, FMUL, FDIV, FREM,
|
||||
|
||||
|
|
|
@ -935,6 +935,8 @@ public:
|
|||
case ISD::SMULFIXSAT:
|
||||
case ISD::UMULFIX:
|
||||
case ISD::UMULFIXSAT:
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX:
|
||||
Supported = isSupportedFixedPointOperation(Op, VT, Scale);
|
||||
break;
|
||||
}
|
||||
|
@ -4184,6 +4186,14 @@ public:
|
|||
/// method accepts integers as its arguments.
|
||||
SDValue expandFixedPointMul(SDNode *Node, SelectionDAG &DAG) const;
|
||||
|
||||
/// Method for building the DAG expansion of ISD::[US]DIVFIX. This
|
||||
/// method accepts integers as its arguments.
|
||||
/// Note: This method may fail if the division could not be performed
|
||||
/// within the type. Clients must retry with a wider type if this happens.
|
||||
SDValue expandFixedPointDiv(unsigned Opcode, const SDLoc &dl,
|
||||
SDValue LHS, SDValue RHS,
|
||||
unsigned Scale, SelectionDAG &DAG) const;
|
||||
|
||||
/// Method for building the DAG expansion of ISD::U(ADD|SUB)O. Expansion
|
||||
/// always succeeds and populates the Result and Overflow arguments.
|
||||
void expandUADDSUBO(SDNode *Node, SDValue &Result, SDValue &Overflow,
|
||||
|
|
|
@ -930,6 +930,14 @@ def int_umul_fix : Intrinsic<[llvm_anyint_ty],
|
|||
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
|
||||
[IntrNoMem, IntrSpeculatable, IntrWillReturn, Commutative, ImmArg<2>]>;
|
||||
|
||||
def int_sdiv_fix : Intrinsic<[llvm_anyint_ty],
|
||||
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
|
||||
[IntrNoMem, ImmArg<2>]>;
|
||||
|
||||
def int_udiv_fix : Intrinsic<[llvm_anyint_ty],
|
||||
[LLVMMatchType<0>, LLVMMatchType<0>, llvm_i32_ty],
|
||||
[IntrNoMem, ImmArg<2>]>;
|
||||
|
||||
//===------------------- Fixed Point Saturation Arithmetic Intrinsics ----------------===//
|
||||
//
|
||||
def int_smul_fix_sat : Intrinsic<[llvm_anyint_ty],
|
||||
|
|
|
@ -124,7 +124,7 @@ def SDTIntSatNoShOp : SDTypeProfile<1, 2, [ // ssat with no shift
|
|||
def SDTIntBinHiLoOp : SDTypeProfile<2, 2, [ // mulhi, mullo, sdivrem, udivrem
|
||||
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>,SDTCisInt<0>
|
||||
]>;
|
||||
def SDTIntScaledBinOp : SDTypeProfile<1, 3, [ // smulfix, umulfix
|
||||
def SDTIntScaledBinOp : SDTypeProfile<1, 3, [ // smulfix, sdivfix, etc
|
||||
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisInt<0>, SDTCisInt<3>
|
||||
]>;
|
||||
|
||||
|
@ -400,6 +400,8 @@ def smulfix : SDNode<"ISD::SMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>
|
|||
def smulfixsat : SDNode<"ISD::SMULFIXSAT", SDTIntScaledBinOp, [SDNPCommutative]>;
|
||||
def umulfix : SDNode<"ISD::UMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>;
|
||||
def umulfixsat : SDNode<"ISD::UMULFIXSAT", SDTIntScaledBinOp, [SDNPCommutative]>;
|
||||
def sdivfix : SDNode<"ISD::SDIVFIX" , SDTIntScaledBinOp>;
|
||||
def udivfix : SDNode<"ISD::UDIVFIX" , SDTIntScaledBinOp>;
|
||||
|
||||
def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;
|
||||
def sext_invec : SDNode<"ISD::SIGN_EXTEND_VECTOR_INREG", SDTExtInvec>;
|
||||
|
|
|
@ -1129,7 +1129,9 @@ void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
|
|||
case ISD::SMULFIX:
|
||||
case ISD::SMULFIXSAT:
|
||||
case ISD::UMULFIX:
|
||||
case ISD::UMULFIXSAT: {
|
||||
case ISD::UMULFIXSAT:
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX: {
|
||||
unsigned Scale = Node->getConstantOperandVal(2);
|
||||
Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
|
||||
Node->getValueType(0), Scale);
|
||||
|
@ -3417,6 +3419,24 @@ bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
|
|||
case ISD::UMULFIXSAT:
|
||||
Results.push_back(TLI.expandFixedPointMul(Node, DAG));
|
||||
break;
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX:
|
||||
if (SDValue V = TLI.expandFixedPointDiv(Node->getOpcode(), SDLoc(Node),
|
||||
Node->getOperand(0),
|
||||
Node->getOperand(1),
|
||||
Node->getConstantOperandVal(2),
|
||||
DAG)) {
|
||||
Results.push_back(V);
|
||||
break;
|
||||
}
|
||||
// FIXME: We might want to retry here with a wider type if we fail, if that
|
||||
// type is legal.
|
||||
// FIXME: Technically, so long as we only have sdivfixes where BW+Scale is
|
||||
// <= 128 (which is the case for all of the default Embedded-C types),
|
||||
// we will only get here with types and scales that we could always expand
|
||||
// if we were allowed to generate libcalls to division functions of illegal
|
||||
// type. But we cannot do that.
|
||||
llvm_unreachable("Cannot expand DIVFIX!");
|
||||
case ISD::ADDCARRY:
|
||||
case ISD::SUBCARRY: {
|
||||
SDValue LHS = Node->getOperand(0);
|
||||
|
|
|
@ -160,6 +160,9 @@ void DAGTypeLegalizer::PromoteIntegerResult(SDNode *N, unsigned ResNo) {
|
|||
case ISD::UMULFIX:
|
||||
case ISD::UMULFIXSAT: Res = PromoteIntRes_MULFIX(N); break;
|
||||
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX: Res = PromoteIntRes_DIVFIX(N); break;
|
||||
|
||||
case ISD::ABS: Res = PromoteIntRes_ABS(N); break;
|
||||
|
||||
case ISD::ATOMIC_LOAD:
|
||||
|
@ -778,6 +781,71 @@ SDValue DAGTypeLegalizer::PromoteIntRes_MULFIX(SDNode *N) {
|
|||
N->getOperand(2));
|
||||
}
|
||||
|
||||
static SDValue earlyExpandDIVFIX(SDNode *N, SDValue LHS, SDValue RHS,
|
||||
unsigned Scale, const TargetLowering &TLI,
|
||||
SelectionDAG &DAG) {
|
||||
EVT VT = LHS.getValueType();
|
||||
bool Signed = N->getOpcode() == ISD::SDIVFIX;
|
||||
|
||||
SDLoc dl(N);
|
||||
// See if we can perform the division in this type without widening.
|
||||
if (SDValue V = TLI.expandFixedPointDiv(N->getOpcode(), dl, LHS, RHS, Scale,
|
||||
DAG))
|
||||
return V;
|
||||
|
||||
// If that didn't work, double the type width and try again. That must work,
|
||||
// or something is wrong.
|
||||
EVT WideVT = EVT::getIntegerVT(*DAG.getContext(),
|
||||
VT.getScalarSizeInBits() * 2);
|
||||
if (Signed) {
|
||||
LHS = DAG.getSExtOrTrunc(LHS, dl, WideVT);
|
||||
RHS = DAG.getSExtOrTrunc(RHS, dl, WideVT);
|
||||
} else {
|
||||
LHS = DAG.getZExtOrTrunc(LHS, dl, WideVT);
|
||||
RHS = DAG.getZExtOrTrunc(RHS, dl, WideVT);
|
||||
}
|
||||
|
||||
// TODO: Saturation.
|
||||
|
||||
SDValue Res = TLI.expandFixedPointDiv(N->getOpcode(), dl, LHS, RHS, Scale,
|
||||
DAG);
|
||||
assert(Res && "Expanding DIVFIX with wide type failed?");
|
||||
return DAG.getZExtOrTrunc(Res, dl, VT);
|
||||
}
|
||||
|
||||
SDValue DAGTypeLegalizer::PromoteIntRes_DIVFIX(SDNode *N) {
|
||||
SDLoc dl(N);
|
||||
SDValue Op1Promoted, Op2Promoted;
|
||||
bool Signed = N->getOpcode() == ISD::SDIVFIX;
|
||||
if (Signed) {
|
||||
Op1Promoted = SExtPromotedInteger(N->getOperand(0));
|
||||
Op2Promoted = SExtPromotedInteger(N->getOperand(1));
|
||||
} else {
|
||||
Op1Promoted = ZExtPromotedInteger(N->getOperand(0));
|
||||
Op2Promoted = ZExtPromotedInteger(N->getOperand(1));
|
||||
}
|
||||
EVT PromotedType = Op1Promoted.getValueType();
|
||||
unsigned Scale = N->getConstantOperandVal(2);
|
||||
|
||||
SDValue Res;
|
||||
// If the type is already legal and the operation is legal in that type, we
|
||||
// should not early expand.
|
||||
if (TLI.isTypeLegal(PromotedType)) {
|
||||
TargetLowering::LegalizeAction Action =
|
||||
TLI.getFixedPointOperationAction(N->getOpcode(), PromotedType, Scale);
|
||||
if (Action == TargetLowering::Legal || Action == TargetLowering::Custom)
|
||||
Res = DAG.getNode(N->getOpcode(), dl, PromotedType, Op1Promoted,
|
||||
Op2Promoted, N->getOperand(2));
|
||||
}
|
||||
|
||||
if (!Res)
|
||||
Res = earlyExpandDIVFIX(N, Op1Promoted, Op2Promoted, Scale, TLI, DAG);
|
||||
|
||||
// TODO: Saturation.
|
||||
|
||||
return Res;
|
||||
}
|
||||
|
||||
SDValue DAGTypeLegalizer::PromoteIntRes_SADDSUBO(SDNode *N, unsigned ResNo) {
|
||||
if (ResNo == 1)
|
||||
return PromoteIntRes_Overflow(N);
|
||||
|
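Reviewer note: `earlyExpandDIVFIX` above retries at double the bit width and asserts that the retry succeeds. One way to see why this is plausible for the signed case is that sign extension from W to 2W bits leaves at least W redundant sign bits on the left operand. The sketch below checks this exhaustively for W = 8. It is a standalone illustration with hypothetical names (`num_sign_bits`, `signed_headroom`, `widening_always_suffices`), it works on concrete values rather than on the DAG's known-bits analysis, and it assumes the scale does not exceed the original bit width, which the patch does not state explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not LLVM APIs.

// Number of copies of the sign bit at the top of `v` when `v` is viewed as a
// `width`-bit two's complement integer (the sign bit itself is included).
inline unsigned num_sign_bits(int64_t v, unsigned width) {
  uint64_t m = static_cast<uint64_t>(v < 0 ? ~v : v);
  unsigned significant = 0;
  while (m != 0) {
    ++significant;
    m >>= 1;
  }
  return width - significant;
}

// Headroom of a signed left operand: redundant sign bits, i.e. how far the
// value can be shifted left without changing its sign or losing bits.
inline unsigned signed_headroom(int64_t v, unsigned width) {
  return num_sign_bits(v, width) - 1;
}

// True if every 8-bit value, viewed in 16 bits after sign extension,
// has at least `scale` bits of headroom.
inline bool widening_always_suffices(unsigned scale) {
  for (int v = -128; v <= 127; ++v)
    if (signed_headroom(v, 16) < scale)
      return false;
  return true;
}
```

In this model every 8-bit value has at least 8 bits of headroom once it is held in 16 bits, so any scale up to 8 can be absorbed by shifting the left operand alone. The analogous argument for the unsigned case uses the leading zeros introduced by zero extension.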
@ -1237,7 +1305,9 @@ bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
|
|||
case ISD::SMULFIX:
|
||||
case ISD::SMULFIXSAT:
|
||||
case ISD::UMULFIX:
|
||||
case ISD::UMULFIXSAT: Res = PromoteIntOp_MULFIX(N); break;
|
||||
case ISD::UMULFIXSAT:
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX: Res = PromoteIntOp_FIX(N); break;
|
||||
|
||||
case ISD::FPOWI: Res = PromoteIntOp_FPOWI(N); break;
|
||||
|
||||
|
@ -1623,7 +1693,7 @@ SDValue DAGTypeLegalizer::PromoteIntOp_ADDSUBCARRY(SDNode *N, unsigned OpNo) {
|
|||
return SDValue(DAG.UpdateNodeOperands(N, LHS, RHS, Carry), 0);
|
||||
}
|
||||
|
||||
SDValue DAGTypeLegalizer::PromoteIntOp_MULFIX(SDNode *N) {
|
||||
SDValue DAGTypeLegalizer::PromoteIntOp_FIX(SDNode *N) {
|
||||
SDValue Op2 = ZExtPromotedInteger(N->getOperand(2));
|
||||
return SDValue(
|
||||
DAG.UpdateNodeOperands(N, N->getOperand(0), N->getOperand(1), Op2), 0);
|
||||
|
@ -1837,6 +1907,9 @@ void DAGTypeLegalizer::ExpandIntegerResult(SDNode *N, unsigned ResNo) {
|
|||
case ISD::UMULFIX:
|
||||
case ISD::UMULFIXSAT: ExpandIntRes_MULFIX(N, Lo, Hi); break;
|
||||
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX: ExpandIntRes_DIVFIX(N, Lo, Hi); break;
|
||||
|
||||
case ISD::VECREDUCE_ADD:
|
||||
case ISD::VECREDUCE_MUL:
|
||||
case ISD::VECREDUCE_AND:
|
||||
|
@ -3151,6 +3224,13 @@ void DAGTypeLegalizer::ExpandIntRes_MULFIX(SDNode *N, SDValue &Lo,
|
|||
Lo = DAG.getSelect(dl, NVT, SatMin, NVTZero, Lo);
|
||||
}
|
||||
|
||||
void DAGTypeLegalizer::ExpandIntRes_DIVFIX(SDNode *N, SDValue &Lo,
|
||||
SDValue &Hi) {
|
||||
SDValue Res = earlyExpandDIVFIX(N, N->getOperand(0), N->getOperand(1),
|
||||
N->getConstantOperandVal(2), TLI, DAG);
|
||||
SplitInteger(Res, Lo, Hi);
|
||||
}
|
||||
|
||||
void DAGTypeLegalizer::ExpandIntRes_SADDSUBO(SDNode *Node,
|
||||
SDValue &Lo, SDValue &Hi) {
|
||||
SDValue LHS = Node->getOperand(0);
|
||||
|
|
|
@ -329,6 +329,7 @@ private:
|
|||
SDValue PromoteIntRes_XMULO(SDNode *N, unsigned ResNo);
|
||||
SDValue PromoteIntRes_ADDSUBSAT(SDNode *N);
|
||||
SDValue PromoteIntRes_MULFIX(SDNode *N);
|
||||
SDValue PromoteIntRes_DIVFIX(SDNode *N);
|
||||
SDValue PromoteIntRes_FLT_ROUNDS(SDNode *N);
|
||||
SDValue PromoteIntRes_VECREDUCE(SDNode *N);
|
||||
SDValue PromoteIntRes_ABS(SDNode *N);
|
||||
|
@ -367,7 +368,7 @@ private:
|
|||
SDValue PromoteIntOp_ADDSUBCARRY(SDNode *N, unsigned OpNo);
|
||||
SDValue PromoteIntOp_FRAMERETURNADDR(SDNode *N);
|
||||
SDValue PromoteIntOp_PREFETCH(SDNode *N, unsigned OpNo);
|
||||
SDValue PromoteIntOp_MULFIX(SDNode *N);
|
||||
SDValue PromoteIntOp_FIX(SDNode *N);
|
||||
SDValue PromoteIntOp_FPOWI(SDNode *N);
|
||||
SDValue PromoteIntOp_VECREDUCE(SDNode *N);
|
||||
|
||||
|
@ -428,6 +429,7 @@ private:
|
|||
void ExpandIntRes_XMULO (SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
void ExpandIntRes_ADDSUBSAT (SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
void ExpandIntRes_MULFIX (SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
void ExpandIntRes_DIVFIX (SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
|
||||
void ExpandIntRes_ATOMIC_LOAD (SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
void ExpandIntRes_VECREDUCE (SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
|
@ -689,7 +691,7 @@ private:
|
|||
SDValue ScalarizeVecRes_UNDEF(SDNode *N);
|
||||
SDValue ScalarizeVecRes_VECTOR_SHUFFLE(SDNode *N);
|
||||
|
||||
SDValue ScalarizeVecRes_MULFIX(SDNode *N);
|
||||
SDValue ScalarizeVecRes_FIX(SDNode *N);
|
||||
|
||||
// Vector Operand Scalarization: <1 x ty> -> ty.
|
||||
bool ScalarizeVectorOperand(SDNode *N, unsigned OpNo);
|
||||
|
@ -731,7 +733,7 @@ private:
|
|||
void SplitVecRes_OverflowOp(SDNode *N, unsigned ResNo,
|
||||
SDValue &Lo, SDValue &Hi);
|
||||
|
||||
void SplitVecRes_MULFIX(SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
void SplitVecRes_FIX(SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
|
||||
void SplitVecRes_BITCAST(SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
void SplitVecRes_BUILD_VECTOR(SDNode *N, SDValue &Lo, SDValue &Hi);
|
||||
|
|
|
@ -146,6 +146,7 @@ class VectorLegalizer {
|
|||
SDValue ExpandMULO(SDValue Op);
|
||||
SDValue ExpandAddSubSat(SDValue Op);
|
||||
SDValue ExpandFixedPointMul(SDValue Op);
|
||||
SDValue ExpandFixedPointDiv(SDValue Op);
|
||||
SDValue ExpandStrictFPOp(SDValue Op);
|
||||
|
||||
SDValue UnrollStrictFPOp(SDValue Op);
|
||||
|
@ -442,7 +443,9 @@ SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
|
|||
case ISD::SMULFIX:
|
||||
case ISD::SMULFIXSAT:
|
||||
case ISD::UMULFIX:
|
||||
case ISD::UMULFIXSAT: {
|
||||
case ISD::UMULFIXSAT:
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX: {
|
||||
unsigned Scale = Node->getConstantOperandVal(2);
|
||||
Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
|
||||
Node->getValueType(0), Scale);
|
||||
|
@ -849,6 +852,9 @@ SDValue VectorLegalizer::Expand(SDValue Op) {
|
|||
// targets? This should probably be investigated. And if we still prefer to
|
||||
// unroll an explanation could be helpful.
|
||||
return DAG.UnrollVectorOp(Op.getNode());
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX:
|
||||
return ExpandFixedPointDiv(Op);
|
||||
#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN) \
|
||||
case ISD::STRICT_##DAGN:
|
||||
#include "llvm/IR/ConstrainedOps.def"
|
||||
|
@ -1392,6 +1398,14 @@ SDValue VectorLegalizer::ExpandFixedPointMul(SDValue Op) {
|
|||
return DAG.UnrollVectorOp(Op.getNode());
|
||||
}
|
||||
|
||||
SDValue VectorLegalizer::ExpandFixedPointDiv(SDValue Op) {
|
||||
SDNode *N = Op.getNode();
|
||||
if (SDValue Expanded = TLI.expandFixedPointDiv(N->getOpcode(), SDLoc(N),
|
||||
N->getOperand(0), N->getOperand(1), N->getConstantOperandVal(2), DAG))
|
||||
return Expanded;
|
||||
return DAG.UnrollVectorOp(N);
|
||||
}
|
||||
|
||||
SDValue VectorLegalizer::ExpandStrictFPOp(SDValue Op) {
|
||||
if (Op.getOpcode() == ISD::STRICT_UINT_TO_FP)
|
||||
return ExpandUINT_TO_FLOAT(Op);
|
||||
|
|
|
@ -165,7 +165,9 @@ void DAGTypeLegalizer::ScalarizeVectorResult(SDNode *N, unsigned ResNo) {
|
|||
case ISD::SMULFIXSAT:
|
||||
case ISD::UMULFIX:
|
||||
case ISD::UMULFIXSAT:
|
||||
R = ScalarizeVecRes_MULFIX(N);
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX:
|
||||
R = ScalarizeVecRes_FIX(N);
|
||||
break;
|
||||
}
|
||||
|
||||
|
@ -189,7 +191,7 @@ SDValue DAGTypeLegalizer::ScalarizeVecRes_TernaryOp(SDNode *N) {
|
|||
Op0.getValueType(), Op0, Op1, Op2);
|
||||
}
|
||||
|
||||
SDValue DAGTypeLegalizer::ScalarizeVecRes_MULFIX(SDNode *N) {
|
||||
SDValue DAGTypeLegalizer::ScalarizeVecRes_FIX(SDNode *N) {
|
||||
SDValue Op0 = GetScalarizedVector(N->getOperand(0));
|
||||
SDValue Op1 = GetScalarizedVector(N->getOperand(1));
|
||||
SDValue Op2 = N->getOperand(2);
|
||||
|
@ -958,7 +960,9 @@ void DAGTypeLegalizer::SplitVectorResult(SDNode *N, unsigned ResNo) {
|
|||
case ISD::SMULFIXSAT:
|
||||
case ISD::UMULFIX:
|
||||
case ISD::UMULFIXSAT:
|
||||
SplitVecRes_MULFIX(N, Lo, Hi);
|
||||
case ISD::SDIVFIX:
|
||||
case ISD::UDIVFIX:
|
||||
SplitVecRes_FIX(N, Lo, Hi);
|
||||
break;
|
||||
}
|
||||
|
||||
|
@ -997,7 +1001,7 @@ void DAGTypeLegalizer::SplitVecRes_TernaryOp(SDNode *N, SDValue &Lo,
|
|||
Op0Hi, Op1Hi, Op2Hi);
|
||||
}
|
||||
|
||||
void DAGTypeLegalizer::SplitVecRes_MULFIX(SDNode *N, SDValue &Lo, SDValue &Hi) {
|
||||
void DAGTypeLegalizer::SplitVecRes_FIX(SDNode *N, SDValue &Lo, SDValue &Hi) {
|
||||
SDValue LHSLo, LHSHi;
|
||||
GetSplitVector(N->getOperand(0), LHSLo, LHSHi);
|
||||
SDValue RHSLo, RHSHi;
|
||||
|
|
|
@ -5441,6 +5441,60 @@ static SDValue ExpandPowI(const SDLoc &DL, SDValue LHS, SDValue RHS,
|
|||
return DAG.getNode(ISD::FPOWI, DL, LHS.getValueType(), LHS, RHS);
|
||||
}
|
||||
|
||||
static SDValue expandDivFix(unsigned Opcode, const SDLoc &DL,
|
||||
SDValue LHS, SDValue RHS, SDValue Scale,
|
||||
SelectionDAG &DAG, const TargetLowering &TLI) {
|
||||
EVT VT = LHS.getValueType();
|
||||
bool Signed = Opcode == ISD::SDIVFIX;
|
||||
LLVMContext &Ctx = *DAG.getContext();
|
||||
|
||||
// If the type is legal but the operation isn't, this node might survive all
|
||||
// the way to operation legalization. If we end up there and we do not have
|
||||
// the ability to widen the type (if VT*2 is not legal), we cannot expand the
|
||||
// node.
|
||||
|
||||
// Coax the legalizer into expanding the node during type legalization instead
|
||||
// by bumping the size by one bit. This will force it to Promote, enabling the
|
||||
// early expansion and avoiding the need to expand later.
|
||||
|
||||
// We don't have to do this if Scale is 0; that can always be expanded.
|
||||
|
||||
// FIXME: We wouldn't have to do this (or any of the early
|
||||
// expansion/promotion) if it was possible to expand a libcall of an
|
||||
// illegal type during operation legalization. But it's not, so things
|
||||
// get a bit hacky.
|
||||
unsigned ScaleInt = cast<ConstantSDNode>(Scale)->getZExtValue();
|
||||
if (ScaleInt > 0 &&
|
||||
(TLI.isTypeLegal(VT) ||
|
||||
(VT.isVector() && TLI.isTypeLegal(VT.getVectorElementType())))) {
|
||||
TargetLowering::LegalizeAction Action = TLI.getFixedPointOperationAction(
|
||||
Opcode, VT, ScaleInt);
|
||||
if (Action != TargetLowering::Legal && Action != TargetLowering::Custom) {
|
||||
EVT PromVT;
|
||||
if (VT.isScalarInteger())
|
||||
PromVT = EVT::getIntegerVT(Ctx, VT.getSizeInBits() + 1);
|
||||
else if (VT.isVector()) {
|
||||
PromVT = VT.getVectorElementType();
|
||||
PromVT = EVT::getIntegerVT(Ctx, PromVT.getSizeInBits() + 1);
|
||||
PromVT = EVT::getVectorVT(Ctx, PromVT, VT.getVectorElementCount());
|
||||
} else
|
||||
llvm_unreachable("Wrong VT for DIVFIX?");
|
||||
if (Signed) {
|
||||
LHS = DAG.getSExtOrTrunc(LHS, DL, PromVT);
|
||||
RHS = DAG.getSExtOrTrunc(RHS, DL, PromVT);
|
||||
} else {
|
||||
LHS = DAG.getZExtOrTrunc(LHS, DL, PromVT);
|
||||
RHS = DAG.getZExtOrTrunc(RHS, DL, PromVT);
|
||||
}
|
||||
// TODO: Saturation.
|
||||
SDValue Res = DAG.getNode(Opcode, DL, PromVT, LHS, RHS, Scale);
|
||||
return DAG.getZExtOrTrunc(Res, DL, VT);
|
||||
}
|
||||
}
|
||||
|
||||
return DAG.getNode(Opcode, DL, VT, LHS, RHS, Scale);
|
||||
}
|
||||
|
||||
// getUnderlyingArgRegs - Find underlying registers used for a truncated,
|
||||
// bitcasted, or split argument. Returns a list of <Register, size in bits>
|
||||
static void
|
||||
|
@ -5705,6 +5759,14 @@ static unsigned FixedPointIntrinsicToOpcode(unsigned Intrinsic) {
|
|||
return ISD::SMULFIX;
|
||||
case Intrinsic::umul_fix:
|
||||
return ISD::UMULFIX;
|
||||
case Intrinsic::smul_fix_sat:
|
||||
return ISD::SMULFIXSAT;
|
||||
case Intrinsic::umul_fix_sat:
|
||||
return ISD::UMULFIXSAT;
|
||||
case Intrinsic::sdiv_fix:
|
||||
return ISD::SDIVFIX;
|
||||
case Intrinsic::udiv_fix:
|
||||
return ISD::UDIVFIX;
|
||||
default:
|
||||
llvm_unreachable("Unhandled fixed point intrinsic");
|
||||
}
|
||||
|
@ -6360,7 +6422,9 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
|
|||
return;
|
||||
}
|
||||
case Intrinsic::smul_fix:
|
||||
case Intrinsic::umul_fix: {
|
||||
case Intrinsic::umul_fix:
|
||||
case Intrinsic::smul_fix_sat:
|
||||
case Intrinsic::umul_fix_sat: {
|
||||
SDValue Op1 = getValue(I.getArgOperand(0));
|
||||
SDValue Op2 = getValue(I.getArgOperand(1));
|
||||
SDValue Op3 = getValue(I.getArgOperand(2));
|
||||
|
@ -6368,20 +6432,13 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
|
|||
Op1.getValueType(), Op1, Op2, Op3));
|
||||
return;
|
||||
}
|
||||
case Intrinsic::smul_fix_sat: {
|
||||
case Intrinsic::sdiv_fix:
|
||||
case Intrinsic::udiv_fix: {
|
||||
SDValue Op1 = getValue(I.getArgOperand(0));
|
||||
SDValue Op2 = getValue(I.getArgOperand(1));
|
||||
SDValue Op3 = getValue(I.getArgOperand(2));
|
||||
setValue(&I, DAG.getNode(ISD::SMULFIXSAT, sdl, Op1.getValueType(), Op1, Op2,
|
||||
Op3));
|
||||
return;
|
||||
}
|
||||
case Intrinsic::umul_fix_sat: {
|
||||
SDValue Op1 = getValue(I.getArgOperand(0));
|
||||
SDValue Op2 = getValue(I.getArgOperand(1));
|
||||
SDValue Op3 = getValue(I.getArgOperand(2));
|
||||
setValue(&I, DAG.getNode(ISD::UMULFIXSAT, sdl, Op1.getValueType(), Op1, Op2,
|
||||
Op3));
|
||||
setValue(&I, expandDivFix(FixedPointIntrinsicToOpcode(Intrinsic), sdl,
|
||||
Op1, Op2, Op3, DAG, TLI));
|
||||
return;
|
||||
}
|
||||
case Intrinsic::stacksave: {
|
||||
|
|
|
@ -312,6 +312,9 @@ std::string SDNode::getOperationName(const SelectionDAG *G) const {
|
|||
case ISD::UMULFIX: return "umulfix";
|
||||
case ISD::UMULFIXSAT: return "umulfixsat";
|
||||
|
||||
case ISD::SDIVFIX: return "sdivfix";
|
||||
case ISD::UDIVFIX: return "udivfix";
|
||||
|
||||
// Conversion operators.
|
||||
case ISD::SIGN_EXTEND: return "sign_extend";
|
||||
case ISD::ZERO_EXTEND: return "zero_extend";
|
||||
|
|
|
@ -7293,6 +7293,86 @@ TargetLowering::expandFixedPointMul(SDNode *Node, SelectionDAG &DAG) const {
|
|||
return Result;
|
||||
}
|
||||
|
||||
SDValue
|
||||
TargetLowering::expandFixedPointDiv(unsigned Opcode, const SDLoc &dl,
|
||||
SDValue LHS, SDValue RHS,
|
||||
unsigned Scale, SelectionDAG &DAG) const {
|
||||
assert((Opcode == ISD::SDIVFIX ||
|
||||
Opcode == ISD::UDIVFIX) &&
|
||||
"Expected a fixed point division opcode");
|
||||
|
||||
EVT VT = LHS.getValueType();
|
||||
bool Signed = Opcode == ISD::SDIVFIX;
|
||||
EVT BoolVT = getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
|
||||
|
||||
// If there is enough room in the type to upscale the LHS or downscale the
|
||||
// RHS before the division, we can perform it in this type without having to
|
||||
// resize. For signed operations, the LHS headroom is the number of
|
||||
// redundant sign bits, and for unsigned ones it is the number of zeroes.
|
||||
// The headroom for the RHS is the number of trailing zeroes.
|
||||
unsigned LHSLead = Signed ? DAG.ComputeNumSignBits(LHS) - 1
|
||||
: DAG.computeKnownBits(LHS).countMinLeadingZeros();
|
||||
unsigned RHSTrail = DAG.computeKnownBits(RHS).countMinTrailingZeros();
|
||||
|
||||
if (LHSLead + RHSTrail < Scale)
|
||||
return SDValue();
|
||||
|
||||
  unsigned LHSShift = std::min(LHSLead, Scale);
  unsigned RHSShift = Scale - LHSShift;

  // At this point, we know that if we shift the LHS up by LHSShift and the
  // RHS down by RHSShift, we can emit a regular division with a final scaling
  // factor of Scale.

  EVT ShiftTy = getShiftAmountTy(VT, DAG.getDataLayout());
  if (LHSShift)
    LHS = DAG.getNode(ISD::SHL, dl, VT, LHS,
                      DAG.getConstant(LHSShift, dl, ShiftTy));
  if (RHSShift)
    RHS = DAG.getNode(Signed ? ISD::SRA : ISD::SRL, dl, VT, RHS,
                      DAG.getConstant(RHSShift, dl, ShiftTy));

  SDValue Quot;
  if (Signed) {
    // For signed operations, if the resulting quotient is negative and the
    // remainder is nonzero, subtract 1 from the quotient to round towards
    // negative infinity.
    SDValue Rem;
    // FIXME: Ideally we would always produce an SDIVREM here, but if the
    // type isn't legal, SDIVREM cannot be expanded. There is no reason why
    // we couldn't just form a libcall, but the type legalizer doesn't do it.
    if (isTypeLegal(VT) &&
        isOperationLegalOrCustom(ISD::SDIVREM, VT)) {
      Quot = DAG.getNode(ISD::SDIVREM, dl,
                         DAG.getVTList(VT, VT),
                         LHS, RHS);
      Rem = Quot.getValue(1);
      Quot = Quot.getValue(0);
    } else {
      Quot = DAG.getNode(ISD::SDIV, dl, VT,
                         LHS, RHS);
      Rem = DAG.getNode(ISD::SREM, dl, VT,
                        LHS, RHS);
    }
    SDValue Zero = DAG.getConstant(0, dl, VT);
    SDValue RemNonZero = DAG.getSetCC(dl, BoolVT, Rem, Zero, ISD::SETNE);
    SDValue LHSNeg = DAG.getSetCC(dl, BoolVT, LHS, Zero, ISD::SETLT);
    SDValue RHSNeg = DAG.getSetCC(dl, BoolVT, RHS, Zero, ISD::SETLT);
    SDValue QuotNeg = DAG.getNode(ISD::XOR, dl, BoolVT, LHSNeg, RHSNeg);
    SDValue Sub1 = DAG.getNode(ISD::SUB, dl, VT, Quot,
                               DAG.getConstant(1, dl, VT));
    Quot = DAG.getSelect(dl, VT,
                         DAG.getNode(ISD::AND, dl, BoolVT, RemNonZero, QuotNeg),
                         Sub1, Quot);
  } else
    Quot = DAG.getNode(ISD::UDIV, dl, VT,
                       LHS, RHS);

  // TODO: Saturation.

  return Quot;
}

void TargetLowering::expandUADDSUBO(
    SDNode *Node, SDValue &Result, SDValue &Overflow, SelectionDAG &DAG) const {
  SDLoc dl(Node);
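The expansion in the hunk above can be checked against a small scalar model. The sketch below is standalone C++, not LLVM code: `sdivFixWide`, `sdivFixSplit` and the `lhsLead` parameter are hypothetical names introduced here for illustration. It assumes, as the "At this point, we know..." comment implies, that the divisor has at least `RHSShift` trailing zero bits, so that both shifts are exact.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical reference: widen, pre-scale the dividend, divide, then round
// toward negative infinity (C++ '/' truncates toward zero).
int32_t sdivFixWide(int32_t lhs, int32_t rhs, unsigned scale) {
  assert(rhs != 0 && scale < 31);
  int64_t num = static_cast<int64_t>(lhs) * (int64_t{1} << scale);
  int64_t quot = num / rhs;
  int64_t rem = num % rhs;
  bool quotNeg = (num < 0) != (rhs < 0);  // XOR of the operand signs
  if (rem != 0 && quotNeg)
    --quot;
  return static_cast<int32_t>(quot);
}

// Hypothetical in-type variant mirroring the LHSShift/RHSShift split.
// lhsLead: number of redundant sign bits the caller knows lhs has.
// Assumed precondition: rhs has at least (scale - min(lhsLead, scale))
// trailing zero bits, so dividing it down loses nothing.
int32_t sdivFixSplit(int32_t lhs, int32_t rhs, unsigned scale,
                     unsigned lhsLead) {
  assert(rhs != 0 && scale < 31);
  unsigned lhsShift = std::min(lhsLead, scale);
  unsigned rhsShift = scale - lhsShift;
  int32_t l = lhs * (int32_t{1} << lhsShift);  // shift up: no overflow
  int32_t r = rhs / (int32_t{1} << rhsShift);  // shift down: exact
  int32_t quot = l / r;
  int32_t rem = l % r;
  bool quotNeg = (l < 0) != (r < 0);
  if (rem != 0 && quotNeg)
    --quot;  // same adjustment as the Sub1 select in the expansion
  return quot;
}
```

With a scale of 7, dividing 3.0 (encoded as 384) by 2.0 (encoded as 256) should give 1.5 (encoded as 192) from both functions, whichever way the shift is split between the operands.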
@ -663,6 +663,8 @@ void TargetLoweringBase::initActions() {
     setOperationAction(ISD::SMULFIXSAT, VT, Expand);
     setOperationAction(ISD::UMULFIX, VT, Expand);
     setOperationAction(ISD::UMULFIXSAT, VT, Expand);
+    setOperationAction(ISD::SDIVFIX, VT, Expand);
+    setOperationAction(ISD::UDIVFIX, VT, Expand);

     // Overflow operations default to expand
     setOperationAction(ISD::SADDO, VT, Expand);
@ -4677,28 +4677,32 @@ void Verifier::visitIntrinsicCall(Intrinsic::ID ID, CallBase &Call) {
   case Intrinsic::smul_fix:
   case Intrinsic::smul_fix_sat:
   case Intrinsic::umul_fix:
-  case Intrinsic::umul_fix_sat: {
+  case Intrinsic::umul_fix_sat:
+  case Intrinsic::sdiv_fix:
+  case Intrinsic::udiv_fix: {
     Value *Op1 = Call.getArgOperand(0);
     Value *Op2 = Call.getArgOperand(1);
     Assert(Op1->getType()->isIntOrIntVectorTy(),
-           "first operand of [us]mul_fix[_sat] must be an int type or vector "
-           "of ints");
+           "first operand of [us][mul|div]_fix[_sat] must be an int type or "
+           "vector of ints");
     Assert(Op2->getType()->isIntOrIntVectorTy(),
-           "second operand of [us]mul_fix_[sat] must be an int type or vector "
-           "of ints");
+           "second operand of [us][mul|div]_fix[_sat] must be an int type or "
+           "vector of ints");

     auto *Op3 = cast<ConstantInt>(Call.getArgOperand(2));
     Assert(Op3->getType()->getBitWidth() <= 32,
-           "third argument of [us]mul_fix[_sat] must fit within 32 bits");
+           "third argument of [us][mul|div]_fix[_sat] must fit within 32 bits");

-    if (ID == Intrinsic::smul_fix || ID == Intrinsic::smul_fix_sat) {
+    if (ID == Intrinsic::smul_fix || ID == Intrinsic::smul_fix_sat ||
+        ID == Intrinsic::sdiv_fix) {
       Assert(
           Op3->getZExtValue() < Op1->getType()->getScalarSizeInBits(),
-          "the scale of smul_fix[_sat] must be less than the width of the operands");
+          "the scale of s[mul|div]_fix[_sat] must be less than the width of "
+          "the operands");
     } else {
       Assert(Op3->getZExtValue() <= Op1->getType()->getScalarSizeInBits(),
-             "the scale of umul_fix[_sat] must be less than or equal to the width of "
-             "the operands");
+             "the scale of u[mul|div]_fix[_sat] must be less than or equal "
+             "to the width of the operands");
     }
     break;
   }
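The scale rule enforced by the verifier hunk above reduces to a one-line predicate. The sketch below is standalone C++ with a hypothetical helper name (`isValidFixScale`); it is not part of the verifier. The signed variants require a scale strictly below the operand width, presumably because one bit is taken by the sign, while the unsigned variants allow the scale to equal the width.

```cpp
// Hypothetical helper mirroring the two Assert conditions in the hunk above.
// widthBits stands for Op1->getType()->getScalarSizeInBits().
bool isValidFixScale(bool isSigned, unsigned scale, unsigned widthBits) {
  return isSigned ? scale < widthBits    // s[mul|div]_fix[_sat]
                  : scale <= widthBits;  // u[mul|div]_fix[_sat]
}
```

Under this rule a call such as `llvm.sdiv.fix.i16` with a scale of 16 would be rejected, while `llvm.udiv.fix.i16` with the same scale would be accepted.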
@ -0,0 +1,713 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86

declare i4 @llvm.sdiv.fix.i4 (i4, i4, i32)
declare i15 @llvm.sdiv.fix.i15 (i15, i15, i32)
declare i16 @llvm.sdiv.fix.i16 (i16, i16, i32)
declare i18 @llvm.sdiv.fix.i18 (i18, i18, i32)
declare i64 @llvm.sdiv.fix.i64 (i64, i64, i32)
declare <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32>, <4 x i32>, i32)

define i16 @func(i16 %x, i16 %y) nounwind {
; X64-LABEL: func:
; X64: # %bb.0:
; X64-NEXT: movswl %si, %esi
; X64-NEXT: movswl %di, %ecx
; X64-NEXT: shll $7, %ecx
; X64-NEXT: movl %ecx, %eax
; X64-NEXT: cltd
; X64-NEXT: idivl %esi
; X64-NEXT: # kill: def $eax killed $eax def $rax
; X64-NEXT: leal -1(%rax), %edi
; X64-NEXT: testl %esi, %esi
; X64-NEXT: sets %sil
; X64-NEXT: testl %ecx, %ecx
; X64-NEXT: sets %cl
; X64-NEXT: xorb %sil, %cl
; X64-NEXT: testl %edx, %edx
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovnel %edi, %eax
; X64-NEXT: # kill: def $ax killed $ax killed $rax
; X64-NEXT: retq
;
; X86-LABEL: func:
; X86: # %bb.0:
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: movswl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movswl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: shll $7, %ecx
; X86-NEXT: movl %ecx, %eax
; X86-NEXT: cltd
; X86-NEXT: idivl %esi
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: testl %esi, %esi
; X86-NEXT: sets %bl
; X86-NEXT: testl %ecx, %ecx
; X86-NEXT: sets %cl
; X86-NEXT: xorb %bl, %cl
; X86-NEXT: testl %edx, %edx
; X86-NEXT: setne %dl
; X86-NEXT: testb %cl, %dl
; X86-NEXT: cmovnel %edi, %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: retl
  %tmp = call i16 @llvm.sdiv.fix.i16(i16 %x, i16 %y, i32 7)
  ret i16 %tmp
}

define i16 @func2(i8 %x, i8 %y) nounwind {
; X64-LABEL: func2:
; X64: # %bb.0:
; X64-NEXT: movsbl %dil, %eax
; X64-NEXT: movsbl %sil, %ecx
; X64-NEXT: movswl %cx, %esi
; X64-NEXT: movswl %ax, %ecx
; X64-NEXT: shll $14, %ecx
; X64-NEXT: movl %ecx, %eax
; X64-NEXT: cltd
; X64-NEXT: idivl %esi
; X64-NEXT: # kill: def $eax killed $eax def $rax
; X64-NEXT: leal -1(%rax), %edi
; X64-NEXT: testl %esi, %esi
; X64-NEXT: sets %sil
; X64-NEXT: testl %ecx, %ecx
; X64-NEXT: sets %cl
; X64-NEXT: xorb %sil, %cl
; X64-NEXT: testl %edx, %edx
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovel %eax, %edi
; X64-NEXT: addl %edi, %edi
; X64-NEXT: movswl %di, %eax
; X64-NEXT: shrl %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func2:
; X86: # %bb.0:
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: movsbl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movsbl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: shll $14, %ecx
; X86-NEXT: movl %ecx, %eax
; X86-NEXT: cltd
; X86-NEXT: idivl %esi
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: testl %esi, %esi
; X86-NEXT: sets %bl
; X86-NEXT: testl %ecx, %ecx
; X86-NEXT: sets %cl
; X86-NEXT: xorb %bl, %cl
; X86-NEXT: testl %edx, %edx
; X86-NEXT: setne %dl
; X86-NEXT: testb %cl, %dl
; X86-NEXT: cmovel %eax, %edi
; X86-NEXT: addl %edi, %edi
; X86-NEXT: movswl %di, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: retl
  %x2 = sext i8 %x to i15
  %y2 = sext i8 %y to i15
  %tmp = call i15 @llvm.sdiv.fix.i15(i15 %x2, i15 %y2, i32 14)
  %tmp2 = sext i15 %tmp to i16
  ret i16 %tmp2
}

define i16 @func3(i15 %x, i8 %y) nounwind {
; X64-LABEL: func3:
; X64: # %bb.0:
; X64-NEXT: shll $8, %esi
; X64-NEXT: movswl %si, %ecx
; X64-NEXT: addl %edi, %edi
; X64-NEXT: shrl $4, %ecx
; X64-NEXT: movl %edi, %eax
; X64-NEXT: cwtd
; X64-NEXT: idivw %cx
; X64-NEXT: # kill: def $ax killed $ax def $rax
; X64-NEXT: leal -1(%rax), %esi
; X64-NEXT: testw %di, %di
; X64-NEXT: sets %dil
; X64-NEXT: testw %cx, %cx
; X64-NEXT: sets %cl
; X64-NEXT: xorb %dil, %cl
; X64-NEXT: testw %dx, %dx
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovel %eax, %esi
; X64-NEXT: addl %esi, %esi
; X64-NEXT: movswl %si, %eax
; X64-NEXT: shrl %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func3:
; X86: # %bb.0:
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shll $8, %eax
; X86-NEXT: movswl %ax, %esi
; X86-NEXT: addl %ecx, %ecx
; X86-NEXT: shrl $4, %esi
; X86-NEXT: movl %ecx, %eax
; X86-NEXT: cwtd
; X86-NEXT: idivw %si
; X86-NEXT: # kill: def $ax killed $ax def $eax
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: testw %cx, %cx
; X86-NEXT: sets %cl
; X86-NEXT: testw %si, %si
; X86-NEXT: sets %ch
; X86-NEXT: xorb %cl, %ch
; X86-NEXT: testw %dx, %dx
; X86-NEXT: setne %cl
; X86-NEXT: testb %ch, %cl
; X86-NEXT: cmovel %eax, %edi
; X86-NEXT: addl %edi, %edi
; X86-NEXT: movswl %di, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: retl
  %y2 = sext i8 %y to i15
  %y3 = shl i15 %y2, 7
  %tmp = call i15 @llvm.sdiv.fix.i15(i15 %x, i15 %y3, i32 4)
  %tmp2 = sext i15 %tmp to i16
  ret i16 %tmp2
}

define i4 @func4(i4 %x, i4 %y) nounwind {
; X64-LABEL: func4:
; X64: # %bb.0:
; X64-NEXT: pushq %rbx
; X64-NEXT: shlb $4, %sil
; X64-NEXT: sarb $4, %sil
; X64-NEXT: shlb $4, %dil
; X64-NEXT: sarb $4, %dil
; X64-NEXT: shlb $2, %dil
; X64-NEXT: movsbl %dil, %ecx
; X64-NEXT: movl %ecx, %eax
; X64-NEXT: idivb %sil
; X64-NEXT: movsbl %ah, %ebx
; X64-NEXT: movzbl %al, %edi
; X64-NEXT: leal -1(%rdi), %eax
; X64-NEXT: movzbl %al, %eax
; X64-NEXT: testb %sil, %sil
; X64-NEXT: sets %dl
; X64-NEXT: testb %cl, %cl
; X64-NEXT: sets %cl
; X64-NEXT: xorb %dl, %cl
; X64-NEXT: testb %bl, %bl
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovel %edi, %eax
; X64-NEXT: # kill: def $al killed $al killed $eax
; X64-NEXT: popq %rbx
; X64-NEXT: retq
;
; X86-LABEL: func4:
; X86: # %bb.0:
; X86-NEXT: pushl %esi
; X86-NEXT: movb {{[0-9]+}}(%esp), %dl
; X86-NEXT: shlb $4, %dl
; X86-NEXT: sarb $4, %dl
; X86-NEXT: movb {{[0-9]+}}(%esp), %dh
; X86-NEXT: shlb $4, %dh
; X86-NEXT: sarb $4, %dh
; X86-NEXT: shlb $2, %dh
; X86-NEXT: movsbl %dh, %eax
; X86-NEXT: idivb %dl
; X86-NEXT: movsbl %ah, %ecx
; X86-NEXT: movzbl %al, %esi
; X86-NEXT: decb %al
; X86-NEXT: movzbl %al, %eax
; X86-NEXT: testb %dl, %dl
; X86-NEXT: sets %dl
; X86-NEXT: testb %dh, %dh
; X86-NEXT: sets %dh
; X86-NEXT: xorb %dl, %dh
; X86-NEXT: testb %cl, %cl
; X86-NEXT: setne %cl
; X86-NEXT: testb %dh, %cl
; X86-NEXT: cmovel %esi, %eax
; X86-NEXT: # kill: def $al killed $al killed $eax
; X86-NEXT: popl %esi
; X86-NEXT: retl
  %tmp = call i4 @llvm.sdiv.fix.i4(i4 %x, i4 %y, i32 2)
  ret i4 %tmp
}

define i64 @func5(i64 %x, i64 %y) nounwind {
; X64-LABEL: func5:
; X64: # %bb.0:
; X64-NEXT: pushq %rbp
; X64-NEXT: pushq %r15
; X64-NEXT: pushq %r14
; X64-NEXT: pushq %r13
; X64-NEXT: pushq %r12
; X64-NEXT: pushq %rbx
; X64-NEXT: subq $24, %rsp
; X64-NEXT: movq %rsi, %r14
; X64-NEXT: movq %rdi, %r15
; X64-NEXT: movq %rdi, %rax
; X64-NEXT: shrq $33, %rax
; X64-NEXT: movq %rdi, %rbx
; X64-NEXT: sarq $63, %rbx
; X64-NEXT: shlq $31, %rbx
; X64-NEXT: orq %rax, %rbx
; X64-NEXT: sets {{[-0-9]+}}(%r{{[sb]}}p) # 1-byte Folded Spill
; X64-NEXT: shlq $31, %r15
; X64-NEXT: movq %rsi, %r12
; X64-NEXT: sarq $63, %r12
; X64-NEXT: movq %r15, %rdi
; X64-NEXT: movq %rbx, %rsi
; X64-NEXT: movq %r14, %rdx
; X64-NEXT: movq %r12, %rcx
; X64-NEXT: callq __divti3
; X64-NEXT: movq %rax, %r13
; X64-NEXT: decq %rax
; X64-NEXT: movq %rax, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
; X64-NEXT: testq %r12, %r12
; X64-NEXT: sets %bpl
; X64-NEXT: xorb {{[-0-9]+}}(%r{{[sb]}}p), %bpl # 1-byte Folded Reload
; X64-NEXT: movq %r15, %rdi
; X64-NEXT: movq %rbx, %rsi
; X64-NEXT: movq %r14, %rdx
; X64-NEXT: movq %r12, %rcx
; X64-NEXT: callq __modti3
; X64-NEXT: orq %rax, %rdx
; X64-NEXT: setne %al
; X64-NEXT: testb %bpl, %al
; X64-NEXT: cmovneq {{[-0-9]+}}(%r{{[sb]}}p), %r13 # 8-byte Folded Reload
; X64-NEXT: movq %r13, %rax
; X64-NEXT: addq $24, %rsp
; X64-NEXT: popq %rbx
; X64-NEXT: popq %r12
; X64-NEXT: popq %r13
; X64-NEXT: popq %r14
; X64-NEXT: popq %r15
; X64-NEXT: popq %rbp
; X64-NEXT: retq
;
; X86-LABEL: func5:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: andl $-8, %esp
; X86-NEXT: subl $72, %esp
; X86-NEXT: movl 8(%ebp), %ecx
; X86-NEXT: movl 12(%ebp), %edx
; X86-NEXT: movl 20(%ebp), %ebx
; X86-NEXT: sarl $31, %ebx
; X86-NEXT: movl %edx, %eax
; X86-NEXT: shldl $31, %ecx, %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: shll $31, %ecx
; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, %esi
; X86-NEXT: sarl $31, %esi
; X86-NEXT: movl %esi, %edi
; X86-NEXT: shldl $31, %edx, %esi
; X86-NEXT: leal {{[0-9]+}}(%esp), %edx
; X86-NEXT: rorl %edi
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl 20(%ebp)
; X86-NEXT: pushl 16(%ebp)
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: pushl %eax
; X86-NEXT: pushl %ecx
; X86-NEXT: pushl %edx
; X86-NEXT: calll __divti3
; X86-NEXT: addl $32, %esp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: subl $1, %ecx
; X86-NEXT: movl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: sbbl $0, %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: testl %ebx, %ebx
; X86-NEXT: sets %al
; X86-NEXT: testl %edi, %edi
; X86-NEXT: sets %cl
; X86-NEXT: xorb %al, %cl
; X86-NEXT: movb %cl, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
; X86-NEXT: leal {{[0-9]+}}(%esp), %eax
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl 20(%ebp)
; X86-NEXT: pushl 16(%ebp)
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: pushl {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: pushl {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: pushl %eax
; X86-NEXT: calll __modti3
; X86-NEXT: addl $32, %esp
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: orl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: orl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: orl %eax, %ecx
; X86-NEXT: setne %al
; X86-NEXT: testb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: cmovel {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
; X86-NEXT: cmovel {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: popl %ebp
; X86-NEXT: retl
  %tmp = call i64 @llvm.sdiv.fix.i64(i64 %x, i64 %y, i32 31)
  ret i64 %tmp
}

define i18 @func6(i16 %x, i16 %y) nounwind {
; X64-LABEL: func6:
; X64: # %bb.0:
; X64-NEXT: movswl %di, %ecx
; X64-NEXT: movswl %si, %esi
; X64-NEXT: shll $7, %ecx
; X64-NEXT: movl %ecx, %eax
; X64-NEXT: cltd
; X64-NEXT: idivl %esi
; X64-NEXT: # kill: def $eax killed $eax def $rax
; X64-NEXT: leal -1(%rax), %edi
; X64-NEXT: testl %esi, %esi
; X64-NEXT: sets %sil
; X64-NEXT: testl %ecx, %ecx
; X64-NEXT: sets %cl
; X64-NEXT: xorb %sil, %cl
; X64-NEXT: testl %edx, %edx
; X64-NEXT: setne %dl
; X64-NEXT: testb %cl, %dl
; X64-NEXT: cmovnel %edi, %eax
; X64-NEXT: # kill: def $eax killed $eax killed $rax
; X64-NEXT: retq
;
; X86-LABEL: func6:
; X86: # %bb.0:
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: movswl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movswl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: shll $7, %ecx
; X86-NEXT: movl %ecx, %eax
; X86-NEXT: cltd
; X86-NEXT: idivl %esi
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: testl %esi, %esi
; X86-NEXT: sets %bl
; X86-NEXT: testl %ecx, %ecx
; X86-NEXT: sets %cl
; X86-NEXT: xorb %bl, %cl
; X86-NEXT: testl %edx, %edx
; X86-NEXT: setne %dl
; X86-NEXT: testb %cl, %dl
; X86-NEXT: cmovnel %edi, %eax
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: retl
  %x2 = sext i16 %x to i18
  %y2 = sext i16 %y to i18
  %tmp = call i18 @llvm.sdiv.fix.i18(i18 %x2, i18 %y2, i32 7)
  ret i18 %tmp
}

define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
; X64-LABEL: vec:
; X64: # %bb.0:
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm1, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm3 = xmm1[2,3,0,1]
; X64-NEXT: movdqa %xmm1, %xmm4
; X64-NEXT: punpckldq {{.*#+}} xmm4 = xmm4[0],xmm2[0],xmm4[1],xmm2[1]
; X64-NEXT: movq %xmm4, %rcx
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm0, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
; X64-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
; X64-NEXT: psllq $31, %xmm0
; X64-NEXT: movq %xmm0, %rax
; X64-NEXT: cqto
; X64-NEXT: idivq %rcx
; X64-NEXT: movq %rax, %r8
; X64-NEXT: movq %rdx, %r11
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm4[2,3,0,1]
; X64-NEXT: movq %xmm2, %rcx
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]
; X64-NEXT: movq %xmm2, %rax
; X64-NEXT: cqto
; X64-NEXT: idivq %rcx
; X64-NEXT: movq %rax, %r10
; X64-NEXT: movq %rdx, %rcx
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm3, %xmm2
; X64-NEXT: punpckldq {{.*#+}} xmm3 = xmm3[0],xmm2[0],xmm3[1],xmm2[1]
; X64-NEXT: movq %xmm3, %rdi
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm1, %xmm2
; X64-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
; X64-NEXT: psllq $31, %xmm1
; X64-NEXT: movq %xmm1, %rax
; X64-NEXT: cqto
; X64-NEXT: idivq %rdi
; X64-NEXT: movq %rax, %r9
; X64-NEXT: movq %rdx, %rdi
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm3[2,3,0,1]
; X64-NEXT: movq %xmm2, %rsi
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm1[2,3,0,1]
; X64-NEXT: movq %xmm2, %rax
; X64-NEXT: cqto
; X64-NEXT: idivq %rsi
; X64-NEXT: movq %r11, %xmm2
; X64-NEXT: movq %rcx, %xmm5
; X64-NEXT: pxor %xmm6, %xmm6
; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm5[0]
; X64-NEXT: pcmpeqd %xmm6, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm5 = xmm2[1,0,3,2]
; X64-NEXT: pand %xmm2, %xmm5
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm4, %xmm2
; X64-NEXT: pxor %xmm4, %xmm4
; X64-NEXT: pcmpgtd %xmm0, %xmm4
; X64-NEXT: movq %r8, %xmm0
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
; X64-NEXT: pshufd {{.*#+}} xmm4 = xmm4[1,1,3,3]
; X64-NEXT: pxor %xmm2, %xmm4
; X64-NEXT: movq %r10, %xmm2
; X64-NEXT: pandn %xmm4, %xmm5
; X64-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
; X64-NEXT: movdqa %xmm5, %xmm2
; X64-NEXT: pandn %xmm0, %xmm2
; X64-NEXT: pcmpeqd %xmm4, %xmm4
; X64-NEXT: paddq %xmm4, %xmm0
; X64-NEXT: pand %xmm5, %xmm0
; X64-NEXT: por %xmm2, %xmm0
; X64-NEXT: movq %rdi, %xmm2
; X64-NEXT: movq %rdx, %xmm5
; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm5[0]
; X64-NEXT: pcmpeqd %xmm6, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm5 = xmm2[1,0,3,2]
; X64-NEXT: pand %xmm2, %xmm5
; X64-NEXT: pxor %xmm2, %xmm2
; X64-NEXT: pcmpgtd %xmm3, %xmm2
; X64-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
; X64-NEXT: pcmpgtd %xmm1, %xmm6
; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm6[1,1,3,3]
; X64-NEXT: pxor %xmm2, %xmm1
; X64-NEXT: pandn %xmm1, %xmm5
; X64-NEXT: movq %r9, %xmm1
; X64-NEXT: movq %rax, %xmm2
; X64-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; X64-NEXT: movdqa %xmm5, %xmm2
; X64-NEXT: pandn %xmm1, %xmm2
; X64-NEXT: paddq %xmm4, %xmm1
; X64-NEXT: pand %xmm5, %xmm1
; X64-NEXT: por %xmm2, %xmm1
; X64-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
; X64-NEXT: retq
;
; X86-LABEL: vec:
; X86: # %bb.0:
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: subl $64, %esp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, %edx
; X86-NEXT: sarl $31, %edx
; X86-NEXT: movl %edi, %esi
; X86-NEXT: shll $31, %esi
; X86-NEXT: movl %ebx, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: andl $-2147483648, %ebx # imm = 0x80000000
; X86-NEXT: orl %eax, %ebx
; X86-NEXT: sets {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
; X86-NEXT: movl %ebp, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: andl $-2147483648, %ebp # imm = 0x80000000
; X86-NEXT: orl %eax, %ebp
; X86-NEXT: movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: sets {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shrl %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
; X86-NEXT: andl $-2147483648, %ebp # imm = 0x80000000
; X86-NEXT: orl %eax, %ebp
; X86-NEXT: movl %ebp, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: sets (%esp) # 1-byte Folded Spill
; X86-NEXT: movl %edi, %eax
; X86-NEXT: shrl %eax
; X86-NEXT: andl $-2147483648, %edi # imm = 0x80000000
; X86-NEXT: orl %eax, %edi
; X86-NEXT: sets {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Folded Spill
; X86-NEXT: pushl %edx
; X86-NEXT: movl %edx, %ebp
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %ecx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: calll __moddi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl {{[0-9]+}}(%esp)
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: calll __divdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: shll $31, %ecx
; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
; X86-NEXT: movl %edx, %eax
; X86-NEXT: sarl $31, %eax
; X86-NEXT: pushl %eax
; X86-NEXT: movl %eax, %esi
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %edx
; X86-NEXT: movl %edx, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %ecx
; X86-NEXT: movl %ecx, %edi
; X86-NEXT: calll __moddi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %esi
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: calll __divdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shll $31, %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, %edx
; X86-NEXT: sarl $31, %edx
; X86-NEXT: pushl %edx
; X86-NEXT: movl %edx, %ebp
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %ecx
; X86-NEXT: movl %ecx, %edi
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ebx # 4-byte Reload
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %eax
; X86-NEXT: movl %eax, %esi
; X86-NEXT: calll __moddi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %esi
; X86-NEXT: calll __divdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shll $31, %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %ecx, %ebp
; X86-NEXT: sarl $31, %ebp
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl %ecx
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %esi # 4-byte Reload
; X86-NEXT: pushl %esi
; X86-NEXT: pushl %eax
; X86-NEXT: calll __moddi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edx, %edi
; X86-NEXT: testl %ebp, %ebp
; X86-NEXT: sets %bl
; X86-NEXT: xorb (%esp), %bl # 1-byte Folded Reload
; X86-NEXT: pushl %ebp
; X86-NEXT: pushl {{[0-9]+}}(%esp)
; X86-NEXT: pushl %esi
; X86-NEXT: pushl {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: calll __divdi3
; X86-NEXT: addl $16, %esp
; X86-NEXT: orl {{[-0-9]+}}(%e{{[sb]}}p), %edi # 4-byte Folded Reload
; X86-NEXT: setne %cl
; X86-NEXT: testb %bl, %cl
; X86-NEXT: leal -1(%eax), %ecx
; X86-NEXT: cmovel %eax, %ecx
; X86-NEXT: cmpl $0, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: sets %al
; X86-NEXT: xorb {{[-0-9]+}}(%e{{[sb]}}p), %al # 1-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Reload
; X86-NEXT: orl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
; X86-NEXT: setne %dl
; X86-NEXT: testb %al, %dl
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: leal -1(%eax), %edi
; X86-NEXT: cmovel %eax, %edi
; X86-NEXT: cmpl $0, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: sets %dl
; X86-NEXT: xorb {{[-0-9]+}}(%e{{[sb]}}p), %dl # 1-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: orl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
; X86-NEXT: setne %dh
; X86-NEXT: testb %dl, %dh
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: leal -1(%eax), %edx
; X86-NEXT: cmovel %eax, %edx
; X86-NEXT: cmpl $0, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Reload
; X86-NEXT: sets %bl
; X86-NEXT: xorb {{[-0-9]+}}(%e{{[sb]}}p), %bl # 1-byte Folded Reload
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: orl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
; X86-NEXT: setne %bh
; X86-NEXT: testb %bl, %bh
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
; X86-NEXT: leal -1(%eax), %esi
; X86-NEXT: cmovel %eax, %esi
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl %esi, 12(%eax)
; X86-NEXT: movl %edx, 8(%eax)
; X86-NEXT: movl %edi, 4(%eax)
; X86-NEXT: movl %ecx, (%eax)
; X86-NEXT: addl $64, %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: popl %ebp
; X86-NEXT: retl $4
  %tmp = call <4 x i32> @llvm.sdiv.fix.v4i32(<4 x i32> %x, <4 x i32> %y, i32 31)
  ret <4 x i32> %tmp
}
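In the `func2` and `func3` checks above, the i15 result is sign-extended to i16 by the trailing `addl` / `movswl` / `shrl` sequence. The standalone C++ sketch below models that three-instruction idiom; `sext15to16` is a hypothetical name, and the narrowing casts assume two's-complement wraparound (guaranteed from C++20, and the behavior of mainstream compilers before that).

```cpp
#include <cstdint>

// Hypothetical model of the addl / movswl / shrl idiom: sign-extend the low
// 15 bits of v into a 16-bit result.
int16_t sext15to16(uint16_t v) {
  // addl %edi, %edi: double the value so bit 14 (the i15 sign) lands in
  // bit 15; only the low 16 bits (%di) are read by the next step.
  uint32_t doubled = (uint32_t{v} << 1) & 0xFFFFu;
  // movswl %di, %eax: sign-extend the 16-bit value to 32 bits.
  int32_t widened = static_cast<int16_t>(doubled);
  // shrl %eax: logical shift right by one; bit 15 now holds a sign copy.
  uint32_t halved = static_cast<uint32_t>(widened) >> 1;
  // Only %ax (the low 16 bits) is returned.
  return static_cast<int16_t>(halved & 0xFFFFu);
}
```

For instance, the 15-bit pattern `0x4000` (the most negative i15 value) should come out as -16384, and `0x3FFF` should stay 16383.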
@ -0,0 +1,344 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
; RUN: llc < %s -mtriple=i686 -mattr=cmov | FileCheck %s --check-prefix=X86

declare i4 @llvm.udiv.fix.i4 (i4, i4, i32)
declare i15 @llvm.udiv.fix.i15 (i15, i15, i32)
declare i16 @llvm.udiv.fix.i16 (i16, i16, i32)
declare i18 @llvm.udiv.fix.i18 (i18, i18, i32)
declare i64 @llvm.udiv.fix.i64 (i64, i64, i32)
declare <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32>, <4 x i32>, i32)

define i16 @func(i16 %x, i16 %y) nounwind {
; X64-LABEL: func:
; X64: # %bb.0:
; X64-NEXT: movzwl %si, %ecx
; X64-NEXT: movzwl %di, %eax
; X64-NEXT: shll $7, %eax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divl %ecx
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func:
; X86: # %bb.0:
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; X86-NEXT: shll $7, %eax
; X86-NEXT: xorl %edx, %edx
; X86-NEXT: divl %ecx
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: retl
  %tmp = call i16 @llvm.udiv.fix.i16(i16 %x, i16 %y, i32 7)
  ret i16 %tmp
}

define i16 @func2(i8 %x, i8 %y) nounwind {
; X64-LABEL: func2:
; X64: # %bb.0:
; X64-NEXT: movsbl %dil, %eax
; X64-NEXT: andl $32767, %eax # imm = 0x7FFF
; X64-NEXT: movsbl %sil, %ecx
; X64-NEXT: andl $32767, %ecx # imm = 0x7FFF
; X64-NEXT: shll $14, %eax
; X64-NEXT: xorl %edx, %edx
; X64-NEXT: divl %ecx
; X64-NEXT: addl %eax, %eax
; X64-NEXT: cwtl
; X64-NEXT: shrl %eax
; X64-NEXT: # kill: def $ax killed $ax killed $eax
; X64-NEXT: retq
;
; X86-LABEL: func2:
; X86: # %bb.0:
; X86-NEXT: movsbl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: andl $32767, %ecx # imm = 0x7FFF
; X86-NEXT: movsbl {{[0-9]+}}(%esp), %eax
; X86-NEXT: andl $32767, %eax # imm = 0x7FFF
; X86-NEXT: shll $14, %eax
; X86-NEXT: xorl %edx, %edx
; X86-NEXT: divl %ecx
; X86-NEXT: addl %eax, %eax
; X86-NEXT: cwtl
; X86-NEXT: shrl %eax
; X86-NEXT: # kill: def $ax killed $ax killed $eax
; X86-NEXT: retl
  %x2 = sext i8 %x to i15
  %y2 = sext i8 %y to i15
  %tmp = call i15 @llvm.udiv.fix.i15(i15 %x2, i15 %y2, i32 14)
  %tmp2 = sext i15 %tmp to i16
  ret i16 %tmp2
}

define i16 @func3(i15 %x, i8 %y) nounwind {
|
||||
; X64-LABEL: func3:
|
||||
; X64: # %bb.0:
|
||||
; X64-NEXT: # kill: def $edi killed $edi def $rdi
|
||||
; X64-NEXT: leal (%rdi,%rdi), %eax
|
||||
; X64-NEXT: movzbl %sil, %ecx
|
||||
; X64-NEXT: shll $4, %ecx
|
||||
; X64-NEXT: # kill: def $ax killed $ax killed $eax
|
||||
; X64-NEXT: xorl %edx, %edx
|
||||
; X64-NEXT: divw %cx
|
||||
; X64-NEXT: # kill: def $ax killed $ax def $eax
|
||||
; X64-NEXT: addl %eax, %eax
|
||||
; X64-NEXT: cwtl
|
||||
; X64-NEXT: shrl %eax
|
||||
; X64-NEXT: # kill: def $ax killed $ax killed $eax
|
||||
; X64-NEXT: retq
|
||||
;
|
||||
; X86-LABEL: func3:
|
||||
; X86: # %bb.0:
|
||||
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
|
||||
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
|
||||
; X86-NEXT: addl %eax, %eax
|
||||
; X86-NEXT: movzbl %cl, %ecx
|
||||
; X86-NEXT: shll $4, %ecx
|
||||
; X86-NEXT: # kill: def $ax killed $ax killed $eax
|
||||
; X86-NEXT: xorl %edx, %edx
|
||||
; X86-NEXT: divw %cx
|
||||
; X86-NEXT: # kill: def $ax killed $ax def $eax
|
||||
; X86-NEXT: addl %eax, %eax
|
||||
; X86-NEXT: cwtl
|
||||
; X86-NEXT: shrl %eax
|
||||
; X86-NEXT: # kill: def $ax killed $ax killed $eax
|
||||
; X86-NEXT: retl
|
||||
%y2 = sext i8 %y to i15
|
||||
%y3 = shl i15 %y2, 7
|
||||
%tmp = call i15 @llvm.udiv.fix.i15(i15 %x, i15 %y3, i32 4)
|
||||
%tmp2 = sext i15 %tmp to i16
|
||||
ret i16 %tmp2
|
||||
}
|
||||
|
||||
define i4 @func4(i4 %x, i4 %y) nounwind {
|
||||
; X64-LABEL: func4:
|
||||
; X64: # %bb.0:
|
||||
; X64-NEXT: andb $15, %sil
|
||||
; X64-NEXT: andb $15, %dil
|
||||
; X64-NEXT: shlb $2, %dil
|
||||
; X64-NEXT: movzbl %dil, %eax
|
||||
; X64-NEXT: divb %sil
|
||||
; X64-NEXT: retq
|
||||
;
|
||||
; X86-LABEL: func4:
|
||||
; X86: # %bb.0:
|
||||
; X86-NEXT: movb {{[0-9]+}}(%esp), %cl
|
||||
; X86-NEXT: andb $15, %cl
|
||||
; X86-NEXT: movb {{[0-9]+}}(%esp), %al
|
||||
; X86-NEXT: andb $15, %al
|
||||
; X86-NEXT: shlb $2, %al
|
||||
; X86-NEXT: movzbl %al, %eax
|
||||
; X86-NEXT: divb %cl
|
||||
; X86-NEXT: retl
|
||||
%tmp = call i4 @llvm.udiv.fix.i4(i4 %x, i4 %y, i32 2)
|
||||
ret i4 %tmp
|
||||
}
|
||||
|
||||
define i64 @func5(i64 %x, i64 %y) nounwind {
|
||||
; X64-LABEL: func5:
|
||||
; X64: # %bb.0:
|
||||
; X64-NEXT: pushq %rax
|
||||
; X64-NEXT: movq %rsi, %rdx
|
||||
; X64-NEXT: movq %rdi, %rsi
|
||||
; X64-NEXT: shlq $31, %rdi
|
||||
; X64-NEXT: shrq $33, %rsi
|
||||
; X64-NEXT: xorl %ecx, %ecx
|
||||
; X64-NEXT: callq __udivti3
|
||||
; X64-NEXT: popq %rcx
|
||||
; X64-NEXT: retq
|
||||
;
|
||||
; X86-LABEL: func5:
|
||||
; X86: # %bb.0:
|
||||
; X86-NEXT: pushl %ebp
|
||||
; X86-NEXT: movl %esp, %ebp
|
||||
; X86-NEXT: pushl %esi
|
||||
; X86-NEXT: andl $-8, %esp
|
||||
; X86-NEXT: subl $24, %esp
|
||||
; X86-NEXT: movl 8(%ebp), %eax
|
||||
; X86-NEXT: movl 12(%ebp), %ecx
|
||||
; X86-NEXT: movl %ecx, %edx
|
||||
; X86-NEXT: shrl %edx
|
||||
; X86-NEXT: shldl $31, %eax, %ecx
|
||||
; X86-NEXT: shll $31, %eax
|
||||
; X86-NEXT: movl %esp, %esi
|
||||
; X86-NEXT: pushl $0
|
||||
; X86-NEXT: pushl $0
|
||||
; X86-NEXT: pushl 20(%ebp)
|
||||
; X86-NEXT: pushl 16(%ebp)
|
||||
; X86-NEXT: pushl $0
|
||||
; X86-NEXT: pushl %edx
|
||||
; X86-NEXT: pushl %ecx
|
||||
; X86-NEXT: pushl %eax
|
||||
; X86-NEXT: pushl %esi
|
||||
; X86-NEXT: calll __udivti3
|
||||
; X86-NEXT: addl $32, %esp
|
||||
; X86-NEXT: movl (%esp), %eax
|
||||
; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
|
||||
; X86-NEXT: leal -4(%ebp), %esp
|
||||
; X86-NEXT: popl %esi
|
||||
; X86-NEXT: popl %ebp
|
||||
; X86-NEXT: retl
|
||||
%tmp = call i64 @llvm.udiv.fix.i64(i64 %x, i64 %y, i32 31)
|
||||
ret i64 %tmp
|
||||
}
|
||||
|
||||
define i18 @func6(i16 %x, i16 %y) nounwind {
|
||||
; X64-LABEL: func6:
|
||||
; X64: # %bb.0:
|
||||
; X64-NEXT: movswl %di, %eax
|
||||
; X64-NEXT: andl $262143, %eax # imm = 0x3FFFF
|
||||
; X64-NEXT: movswl %si, %ecx
|
||||
; X64-NEXT: andl $262143, %ecx # imm = 0x3FFFF
|
||||
; X64-NEXT: shll $7, %eax
|
||||
; X64-NEXT: xorl %edx, %edx
|
||||
; X64-NEXT: divl %ecx
|
||||
; X64-NEXT: retq
|
||||
;
|
||||
; X86-LABEL: func6:
|
||||
; X86: # %bb.0:
|
||||
; X86-NEXT: movswl {{[0-9]+}}(%esp), %ecx
|
||||
; X86-NEXT: andl $262143, %ecx # imm = 0x3FFFF
|
||||
; X86-NEXT: movswl {{[0-9]+}}(%esp), %eax
|
||||
; X86-NEXT: andl $262143, %eax # imm = 0x3FFFF
|
||||
; X86-NEXT: shll $7, %eax
|
||||
; X86-NEXT: xorl %edx, %edx
|
||||
; X86-NEXT: divl %ecx
|
||||
; X86-NEXT: retl
|
||||
%x2 = sext i16 %x to i18
|
||||
%y2 = sext i16 %y to i18
|
||||
%tmp = call i18 @llvm.udiv.fix.i18(i18 %x2, i18 %y2, i32 7)
|
||||
ret i18 %tmp
|
||||
}
|
||||
|
||||
define i16 @func7(i16 %x, i16 %y) nounwind {
|
||||
; X64-LABEL: func7:
|
||||
; X64: # %bb.0:
|
||||
; X64-NEXT: movl %edi, %eax
|
||||
; X64-NEXT: shll $16, %eax
|
||||
; X64-NEXT: movzwl %si, %ecx
|
||||
; X64-NEXT: xorl %edx, %edx
|
||||
; X64-NEXT: divl %ecx
|
||||
; X64-NEXT: # kill: def $ax killed $ax killed $eax
|
||||
; X64-NEXT: retq
|
||||
;
|
||||
; X86-LABEL: func7:
|
||||
; X86: # %bb.0:
|
||||
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %ecx
|
||||
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
|
||||
; X86-NEXT: shll $16, %eax
|
||||
; X86-NEXT: xorl %edx, %edx
|
||||
; X86-NEXT: divl %ecx
|
||||
; X86-NEXT: # kill: def $ax killed $ax killed $eax
|
||||
; X86-NEXT: retl
|
||||
%tmp = call i16 @llvm.udiv.fix.i16(i16 %x, i16 %y, i32 16)
|
||||
ret i16 %tmp
|
||||
}
|
||||
|
||||
define <4 x i32> @vec(<4 x i32> %x, <4 x i32> %y) nounwind {
|
||||
; X64-LABEL: vec:
|
||||
; X64: # %bb.0:
|
||||
; X64-NEXT: pxor %xmm2, %xmm2
|
||||
; X64-NEXT: movdqa %xmm1, %xmm4
|
||||
; X64-NEXT: punpckhdq {{.*#+}} xmm4 = xmm4[2],xmm2[2],xmm4[3],xmm2[3]
|
||||
; X64-NEXT: movq %xmm4, %rcx
|
||||
; X64-NEXT: movdqa %xmm0, %xmm5
|
||||
; X64-NEXT: punpckhdq {{.*#+}} xmm5 = xmm5[2],xmm2[2],xmm5[3],xmm2[3]
|
||||
; X64-NEXT: psllq $31, %xmm5
|
||||
; X64-NEXT: movq %xmm5, %rax
|
||||
; X64-NEXT: xorl %edx, %edx
|
||||
; X64-NEXT: divq %rcx
|
||||
; X64-NEXT: movq %rax, %xmm3
|
||||
; X64-NEXT: pshufd {{.*#+}} xmm4 = xmm4[2,3,0,1]
|
||||
; X64-NEXT: movq %xmm4, %rcx
|
||||
; X64-NEXT: pshufd {{.*#+}} xmm4 = xmm5[2,3,0,1]
|
||||
; X64-NEXT: movq %xmm4, %rax
|
||||
; X64-NEXT: xorl %edx, %edx
|
||||
; X64-NEXT: divq %rcx
|
||||
; X64-NEXT: movq %rax, %xmm4
|
||||
; X64-NEXT: punpcklqdq {{.*#+}} xmm3 = xmm3[0],xmm4[0]
|
||||
; X64-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]
|
||||
; X64-NEXT: movq %xmm1, %rcx
|
||||
; X64-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
|
||||
; X64-NEXT: psllq $31, %xmm0
|
||||
; X64-NEXT: movq %xmm0, %rax
|
||||
; X64-NEXT: xorl %edx, %edx
|
||||
; X64-NEXT: divq %rcx
|
||||
; X64-NEXT: movq %rax, %xmm2
|
||||
; X64-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
|
||||
; X64-NEXT: movq %xmm1, %rcx
|
||||
; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
|
||||
; X64-NEXT: movq %xmm0, %rax
|
||||
; X64-NEXT: xorl %edx, %edx
|
||||
; X64-NEXT: divq %rcx
|
||||
; X64-NEXT: movq %rax, %xmm0
|
||||
; X64-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]
|
||||
; X64-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2],xmm3[0,2]
|
||||
; X64-NEXT: movaps %xmm2, %xmm0
|
||||
; X64-NEXT: retq
|
||||
;
|
||||
; X86-LABEL: vec:
|
||||
; X86: # %bb.0:
|
||||
; X86-NEXT: pushl %ebp
|
||||
; X86-NEXT: pushl %ebx
|
||||
; X86-NEXT: pushl %edi
|
||||
; X86-NEXT: pushl %esi
|
||||
; X86-NEXT: pushl %eax
|
||||
; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
|
||||
; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
|
||||
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebp
|
||||
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
|
||||
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
|
||||
; X86-NEXT: movl %eax, %ecx
|
||||
; X86-NEXT: shrl %ecx
|
||||
; X86-NEXT: shll $31, %eax
|
||||
; X86-NEXT: pushl $0
|
||||
; X86-NEXT: pushl {{[0-9]+}}(%esp)
|
||||
; X86-NEXT: pushl %ecx
|
||||
; X86-NEXT: pushl %eax
|
||||
; X86-NEXT: calll __udivdi3
|
||||
; X86-NEXT: addl $16, %esp
|
||||
; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
|
||||
; X86-NEXT: movl %ebx, %eax
|
||||
; X86-NEXT: shrl %eax
|
||||
; X86-NEXT: shll $31, %ebx
|
||||
; X86-NEXT: pushl $0
|
||||
; X86-NEXT: pushl {{[0-9]+}}(%esp)
|
||||
; X86-NEXT: pushl %eax
|
||||
; X86-NEXT: pushl %ebx
|
||||
; X86-NEXT: calll __udivdi3
|
||||
; X86-NEXT: addl $16, %esp
|
||||
; X86-NEXT: movl %eax, %ebx
|
||||
; X86-NEXT: movl %ebp, %eax
|
||||
; X86-NEXT: shrl %eax
|
||||
; X86-NEXT: shll $31, %ebp
|
||||
; X86-NEXT: pushl $0
|
||||
; X86-NEXT: pushl {{[0-9]+}}(%esp)
|
||||
; X86-NEXT: pushl %eax
|
||||
; X86-NEXT: pushl %ebp
|
||||
; X86-NEXT: calll __udivdi3
|
||||
; X86-NEXT: addl $16, %esp
|
||||
; X86-NEXT: movl %eax, %ebp
|
||||
; X86-NEXT: movl %edi, %eax
|
||||
; X86-NEXT: shrl %eax
|
||||
; X86-NEXT: shll $31, %edi
|
||||
; X86-NEXT: pushl $0
|
||||
; X86-NEXT: pushl {{[0-9]+}}(%esp)
|
||||
; X86-NEXT: pushl %eax
|
||||
; X86-NEXT: pushl %edi
|
||||
; X86-NEXT: calll __udivdi3
|
||||
; X86-NEXT: addl $16, %esp
|
||||
; X86-NEXT: movl %eax, 12(%esi)
|
||||
; X86-NEXT: movl %ebp, 8(%esi)
|
||||
; X86-NEXT: movl %ebx, 4(%esi)
|
||||
; X86-NEXT: movl (%esp), %eax # 4-byte Reload
|
||||
; X86-NEXT: movl %eax, (%esi)
|
||||
; X86-NEXT: movl %esi, %eax
|
||||
; X86-NEXT: addl $4, %esp
|
||||
; X86-NEXT: popl %esi
|
||||
; X86-NEXT: popl %edi
|
||||
; X86-NEXT: popl %ebx
|
||||
; X86-NEXT: popl %ebp
|
||||
; X86-NEXT: retl $4
|
||||
%tmp = call <4 x i32> @llvm.udiv.fix.v4i32(<4 x i32> %x, <4 x i32> %y, i32 31)
|
||||
ret <4 x i32> %tmp
|
||||
}
|