Commit Graph

21 Commits

Author SHA1 Message Date
Sanjay Patel e68f71574f InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X
Some day the backend may handle instruction-level fast math flags and make
this transform unnecessary, but it's still better practice to use the canonical
representation of fneg when possible (use a -0.0).

This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ).
See also http://reviews.llvm.org/D6723.

Differential Revision: http://reviews.llvm.org/D6731

llvm-svn: 225050
2014-12-31 22:14:05 +00:00
Sanjay Patel ea3c802887 use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. Eg, ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr

Differential Revision: http://reviews.llvm.org/D6723

llvm-svn: 224583
2014-12-19 16:44:08 +00:00
Sanjay Patel c699a6117b fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

   y = sqrt(x * x);

becomes this:

   y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.

Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.

Differential Revision: http://reviews.llvm.org/D5787

llvm-svn: 219944
2014-10-16 18:48:17 +00:00
Benjamin Kramer 76b15d04ff InstCombine: Refactor fmul/fdiv combines to handle vectors.
llvm-svn: 199598
2014-01-19 13:36:27 +00:00
Owen Anderson 48b842ef7c Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep).
llvm-svn: 199528
2014-01-18 00:48:14 +00:00
Owen Anderson e7321660c1 Fix two cases where we could lose fast math flags when optimizing FADD expressions.
llvm-svn: 199427
2014-01-16 21:26:02 +00:00
Shuxin Yang 3a7ca6ec87 [Fast-math] Disable "(C1/X)*C2 => (C1*C2)/X" if C1/X has multiple uses.
If "C1/X" were having multiple uses, the only benefit of this
transformation is to potentially shorten critical path. But it is at the
cost of instroducing additional div.

  The additional div may or may not incur cost depending on how div is
implemented. If it is implemented using Newton–Raphson iteration, it dosen't
seem to incur any cost (FIXME). However, if the div blocks the entire
pipeline, that sounds to be pretty expensive. Let CodeGen to take care 
this transformation.

  This patch sees 6% on a benchmark.

rdar://15032743

llvm-svn: 191037
2013-09-19 21:13:46 +00:00
Stephen Lin c1c7a1309c Update Transforms tests to use CHECK-LABEL for easier debugging. No functionality change.
This update was done with the following bash script:

  find test/Transforms -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
      done
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186268
2013-07-14 01:42:54 +00:00
Shuxin Yang 389ed4b8f7 Fix a bug in fast-math fadd/fsub simplification.
The problem is that the code mistakenly took for granted that following constructor 
is able to create an APFloat from a *SIGNED* integer:
   
  APFloat::APFloat(const fltSemantics &ourSemantics, integerPart value)

rdar://13486998

llvm-svn: 177906
2013-03-25 20:43:41 +00:00
Shuxin Yang 2eca602f8b Perform factorization as a last resort of unsafe fadd/fsub simplification.
Rules include:
  1)1 x*y +/- x*z => x*(y +/- z) 
    (the order of operands dosen't matter)

  2) y/x +/- z/x => (y +/- z)/x 

 The transformation is disabled if the new add/sub expr "y +/- z" is a 
denormal/naz/inifinity.

rdar://12911472

llvm-svn: 177088
2013-03-14 18:08:26 +00:00
Quentin Colombet e684a6d4aa Fix a bug in instcombine for fmul in fast math mode.
The instcombine recognized pattern looks like:
a = b * c
d = a +/- Cst
or
a = b * c
d = Cst +/- a

When creating the new operands for fadd or fsub instruction following the related fmul, the first operand was created with the second original operand (M0 was created with C1) and the second with the first (M1 with Opnd0).

The fix consists in creating the new operands with the appropriate original operand, i.e., M0 with Opnd0 and M1 with C1.

llvm-svn: 176300
2013-02-28 21:12:40 +00:00
Michael Ilseman 1dd6f2a5ba Preserve fast-math flags after reassociation and commutation. Update test cases
llvm-svn: 174571
2013-02-07 01:40:15 +00:00
Michael Ilseman 10f2055812 whitespace
llvm-svn: 174569
2013-02-07 01:27:13 +00:00
Shuxin Yang e822745202 1. Hoist minus sign as high as possible in an attempt to reveal
some optimization opportunities (in the enclosing supper-expressions).

   rule 1. (-0.0 - X ) * Y => -0.0 - (X * Y)
     if expression "-0.0 - X" has only one reference.

   rule 2. (0.0 - X ) * Y => -0.0 - (X * Y)
     if expression "0.0 - X" has only one reference, and
        the instruction is marked "noSignedZero".

2. Eliminate negation (The compiler was already able to handle these
    opt if the 0.0s are replaced with -0.0.)

   rule 3: (0.0 - X) * (0.0 - Y) => X * Y
   rule 4: (0.0 - X) * C => X * -C
   if the expr is flagged "noSignedZero".

3. 
  Rule 5: (X*Y) * X => (X*X) * Y
   if X!=Y and the expression is flagged with "UnsafeAlgebra".

   The purpose of this transformation is two-fold:
    a) to form a power expression (of X).
    b) potentially shorten the critical path: After transformation, the
       latency of the instruction Y is amortized by the expression of X*X,
       and therefore Y is in a "less critical" position compared to what it
      was before the transformation. 

4. Remove the InstCombine code about simplifiying "X * select".
   
   The reasons are following:
    a) The "select" is somewhat architecture-dependent, therefore the
       higher level optimizers are not able to precisely predict if
       the simplification really yields any performance improvement
       or not.

    b) The "select" operator is bit complicate, and tends to obscure
       optimization opportunities. It is btter to keep it as low as
       possible in expr tree, and let CodeGen to tackle the optimization.

llvm-svn: 172551
2013-01-15 21:09:32 +00:00
Shuxin Yang 320f52a4b0 This change is to implement following rules under the condition C_A and/or C_R
---------------------------------------------------------------------------
 C_A: reassociation is allowed
 C_R: reciprocal of a constant C is appropriate, which means 
    - 1/C is exact, or 
    - reciprocal is allowed and 1/C is neither a special value nor a denormal.
 -----------------------------------------------------------------------------

 rule1:  (X/C1) / C2 => X / (C2*C1)  (if C_A)
                     => X * (1/(C2*C1))  (if C_A && C_R)
 rule 2:  X*C1 / C2 => X * (C1/C2)  if C_A
 rule 3: (X/Y)/Z = > X/(Y*Z)  (if C_A && at least one of Y and Z is symbolic value)
 rule 4: Z/(X/Y) = > (Z*Y)/X  (similar to rule3)

 rule 5: C1/(X*C2) => (C1/C2) / X (if C_A)
 rule 6: C1/(X/C2) => (C1*C2) / X (if C_A)
 rule 7: C1/(C2/X) => (C1/C2) * X (if C_A)

llvm-svn: 172488
2013-01-14 22:48:41 +00:00
Shuxin Yang f0537ab681 Consider expression "0.0 - X" as the negation of X if
- this expression is explicitly marked no-signed-zero, or
  - no-signed-zero of this expression can be derived from some context.

llvm-svn: 171922
2013-01-09 00:13:41 +00:00
Shuxin Yang df0e61e793 This change is to implement following rules:
o. X/C1 * C2 => X * (C2/C1) (if C2/C1 is neither special FP nor denormal)
  o. X/C1 * C2 -> X/(C1/C2)   (if C2/C1 is either specical FP or denormal, but C1/C2 is a normal Fp)

     Let MDC denote multiplication or dividion with one & only one operand being a constant
  o. (MDC ± C1) * C2 => (MDC * C2) ± (C1 * C2)
     (so long as the constant-folding doesn't yield any denormal or special value)

llvm-svn: 171793
2013-01-07 21:39:23 +00:00
Shuxin Yang 37a1efe1c6 rdar://12801297
InstCombine for unsafe floating-point add/sub.

llvm-svn: 170471
2012-12-18 23:10:12 +00:00
Shuxin Yang f8e9a5a061 rdar://12753946
Implement rule : "x * (select cond 1.0, 0.0) -> select cond x, 0.0"

llvm-svn: 170226
2012-12-14 18:46:06 +00:00
Shuxin Yang f265351491 fix a typo
llvm-svn: 168909
2012-11-29 18:09:37 +00:00
Shuxin Yang 01ab5d718b Instruction::isAssociative() returns true for fmul/fadd if they are tagged "unsafe" mode.
Approved by: Eli and Michael.

llvm-svn: 168848
2012-11-29 01:47:31 +00:00