Commit Graph

329 Commits

Author SHA1 Message Date
Arnold Schwaighofer f5284ff61f ARM cost model: Fix cost of fptrunc and fpext instructions
A vector fptrunc and fpext simply gets split into scalar instructions.

radar://13192358

llvm-svn: 177159
2013-03-15 15:10:47 +00:00
Arnold Schwaighofer 8070b382ec ARM cost model: Increase cost of some vector selects we do terrible on
By terrible I mean we store/load from the stack.

This matters on PAQp8 in _Z5trainPsS_ii (which is inlined into Mixer::update)
where we decide to vectorize a loop with a VF of 8 resulting in a 25%
degradation on a cortex-a8.

LV: Found an estimated cost of 2 for VF 8 For instruction:   icmp slt i32
LV: Found an estimated cost of 2 for VF 8 For instruction:   select i1, i32, i32

The bug that tracks the CodeGen part is PR14868.

radar://13403975

llvm-svn: 177105
2013-03-14 19:17:02 +00:00
Arnold Schwaighofer 90774f3c8f ARM cost model: Increase the cost for vector casts that use the stack
Increase the cost of v8/v16-i8 to v8/v16-i32 casts and truncates as the backend
currently lowers those using stack accesses.

This was responsible for a significant degradation on
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1
where we vectorize one loop to a vector factor of 16. After this patch we select
a vector factor of 4 which will generate reasonable code.

unsigned char cle[32];

void test(short c) {
  unsigned short compte;
  for (compte = 0; compte <= 31; compte++) {
    cle[compte] = cle[compte] ^ c;
  }
}

radar://13220512

llvm-svn: 176898
2013-03-12 21:19:22 +00:00
Arnold Schwaighofer 20ef54f4c1 X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

llvm-svn: 176403
2013-03-02 04:02:52 +00:00
Benjamin Kramer f7cfac7a14 Cost model support for lowered math builtins.
We make the cost for calling libm functions extremely high as emitting the
calls is expensive and causes spills (on x86) so performance suffers. We still
vectorize important calls like ceilf and friends on SSE4.1. and fabs.

Differential Revision: http://llvm-reviews.chandlerc.com/D466

llvm-svn: 176287
2013-02-28 19:09:33 +00:00
Elena Demikhovsky 0ccdd1315b I optimized the following patterns:
sext <4 x i1> to <4 x i64>
 sext <4 x i8> to <4 x i64>
 sext <4 x i16> to <4 x i64>
 
I'm running Combine on SIGN_EXTEND_IN_REG and revert SEXT patterns:
 (sext_in_reg (v4i64 anyext (v4i32 x )), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x , ExtraVT)))
 
 The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
 The "sar" does not exist on 64-bit operation, so lowering sext_in_reg (v4i64 x) has no vector solution.

I also added a cost of this operations to the AVX costs table.

llvm-svn: 175619
2013-02-20 12:42:54 +00:00
Arnold Schwaighofer 89aef93841 ARM cost model: Add vector reverse shuffle costs
A reverse shuffle is lowered to a vrev and possibly a vext instruction (quad
word).

radar://13171406

llvm-svn: 174933
2013-02-12 02:40:39 +00:00
Bill Schmidt 62fe7a5b17 Refine fix to bug 15041.
Thanks to help from Nadav and Hal, I have a more reasonable (and even
correct!) approach.  This specifically penalizes the insertelement
and extractelement operations for the performance hit that will occur
on PowerPC processors.

llvm-svn: 174725
2013-02-08 18:19:17 +00:00
Arnold Schwaighofer 594fa2dc2b ARM cost model: Address computation in vector mem ops not free
Adds a function to target transform info to query for the cost of address
computation. The cost model analysis pass now also queries this interface.
The code in LoopVectorize adds the cost of address computation as part of the
memory instruction cost calculation. Only there, we know whether the instruction
will be scalarized or not.
Increase the penality for inserting in to D registers on swift. This becomes
necessary because we now always assume that address computation has a cost and
three is a closer value to the architecture.

radar://13097204

llvm-svn: 174713
2013-02-08 14:50:48 +00:00
Arnold Schwaighofer 213fced704 ARM cost model: Add costs for vector selects
Vector selects are cheap on NEON. They get lowered to a vbsl instruction.

radar://13158753

llvm-svn: 174631
2013-02-07 16:10:15 +00:00
Arnold Schwaighofer a804bbee9b ARM cost model: Cost for scalar integer casts and floating point conversions
Also adds some costs for vector integer float conversions.

llvm-svn: 174371
2013-02-05 14:05:55 +00:00
Arnold Schwaighofer 98f1012f9b ARM cost model: Penalize insertelement into D subregisters
Swift has a renaming dependency if we load into D subregisters. We don't have a
way of distinguishing between insertelement operations of values from loads and
other values. Therefore, we are pessimistic for now (The performance problem
showed up in example 14 of gcc-loops).

radar://13096933

llvm-svn: 174300
2013-02-04 02:52:05 +00:00
Hal Finkel 4e5ca9e578 Initial implementation of PPCTargetTransformInfo
This provides a place to add customized operation cost information and
control some other target-specific IR-level transformations.

The only non-trivial logic in this checkin assigns a higher cost to
unaligned loads and stores (covered by the included test case).

llvm-svn: 173520
2013-01-25 23:05:59 +00:00
Nadav Rotem b1615b1ac4 Make opt grab the triple from the module and use it to initialize the target machine.
llvm-svn: 171341
2013-01-01 08:00:32 +00:00
Nadav Rotem aa92ea4f12 We are not ready to estimate the cost of integer expansions based on the number of parts. This test is too noisy.
llvm-svn: 170999
2012-12-23 09:11:07 +00:00
Nadav Rotem 6d4fdd6d2c Improve the X86 cost model for loads and stores.
llvm-svn: 170830
2012-12-21 01:33:59 +00:00
Jakub Staszak 338863a546 Reverse order of checking SSE level when calculating compare cost, so we check
AVX2 before AVX.

llvm-svn: 170464
2012-12-18 22:57:56 +00:00
Nadav Rotem 0a471ea66c Cost Model: change the default cost of control flow instructions (br / ret / ...) to zero.
llvm-svn: 169423
2012-12-05 21:21:26 +00:00
Nadav Rotem f036ca466e CostModel: add another known vector trunc optimization.
llvm-svn: 167488
2012-11-06 21:17:17 +00:00
Nadav Rotem 0914f0b262 Cost Model: add tables for some avx type-conversion hacks.
llvm-svn: 167480
2012-11-06 19:33:53 +00:00
Nadav Rotem c378a8067d CostModel: Add tables for the common x86 compares.
llvm-svn: 167421
2012-11-05 23:48:20 +00:00
Nadav Rotem ae79765676 Code Model: Improve the accuracy of the zext/sext/trunc vector cost estimation.
llvm-svn: 167412
2012-11-05 22:20:53 +00:00
Nadav Rotem 856ffa6677 Cost Model: Normalize the insert/extract index when splitting types
llvm-svn: 167402
2012-11-05 21:12:13 +00:00
Nadav Rotem 020be9dc29 Cost Model: teach the cost model about expanding integers.
llvm-svn: 167401
2012-11-05 21:11:10 +00:00
Nadav Rotem 7411623fd8 Implement the cost of abnormal x86 instruction lowering as a table.
llvm-svn: 167395
2012-11-05 19:32:46 +00:00
Nadav Rotem c2345cbe73 X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2.
llvm-svn: 167347
2012-11-03 00:39:56 +00:00
Nadav Rotem 23848f8f1d Add a stub for the x86 cost model impl. Implement a basic cost rule for inserting/extracting from XMM registers.
llvm-svn: 167333
2012-11-02 23:27:16 +00:00
Nadav Rotem 13da94734c CostModel: add support for Vector Insert and Extract.
llvm-svn: 167329
2012-11-02 22:31:56 +00:00
Nadav Rotem a6b91ac307 Add a cost model analysis that allows us to estimate the cost of IR-level instructions.
llvm-svn: 167324
2012-11-02 21:48:17 +00:00