Commit Graph

3360 Commits

Author SHA1 Message Date
Nadav Rotem 5350cd314b If all of the write objects are identified then we can vectorize the loop even if the read objects are unidentified.
PR14719.

llvm-svn: 171124
2012-12-26 23:30:53 +00:00
Nadav Rotem 3f7c4f36ba LoopVectorizer: Optimize the vectorization of consecutive memory access when the iteration step is -1
llvm-svn: 171114
2012-12-26 19:08:17 +00:00
Hal Finkel 30e95a8ebb BBVectorize: Use VTTI to compute costs for intrinsics vectorization
For the time being this includes only some dummy test cases. Once the
generic implementation of the intrinsics cost function does something other
than assuming scalarization in all cases, or some target specializes the
interface, some real test cases can be added.

Also, for consistency, I changed the type of IID from unsigned to Intrinsic::ID
in a few other places.

llvm-svn: 171079
2012-12-26 01:36:57 +00:00
Hal Finkel b44f890133 LoopVectorize: Enable vectorization of the fmuladd intrinsic
llvm-svn: 171076
2012-12-25 23:21:29 +00:00
Hal Finkel 2a456112ec BBVectorize: Enable vectorization of the fmuladd intrinsic
llvm-svn: 171075
2012-12-25 22:36:08 +00:00
Nick Lewycky fb43258080 Fix typo "Makre" -> "Make".
llvm-svn: 171043
2012-12-24 19:55:47 +00:00
Nadav Rotem 5f7c12cfbd LoopVectorizer: When checking for vectorizable types, also check
the StoreInst operands.

PR14705.

llvm-svn: 171023
2012-12-24 09:14:18 +00:00
Nadav Rotem bd5d1d832a LoopVectorizer: Fix an endless loop in the code that looks for reductions.
The bug was in the code that detects PHIs in if-then-else block sequence.

PR14701.

llvm-svn: 171008
2012-12-24 01:22:06 +00:00
Nadav Rotem cf9999d9d5 CostModel: Change the default target-independent implementation for finding
the cost of arithmetic functions. We now assume that the cost of arithmetic
operations that are marked as Legal or Promote is low, but ops that are
marked as custom are higher.

llvm-svn: 171002
2012-12-23 17:31:23 +00:00
Nadav Rotem 2cade68025 Loop Vectorizer: Update the cost model of scatter/gather operations and make
them more expensive.

llvm-svn: 170995
2012-12-23 07:23:55 +00:00
Nadav Rotem e7785686a5 Fix a bug in the code that checks if we can vectorize loops while using dynamic
memory bound checks.  Before the fix we were able to vectorize this loop from
the Livermore Loops benchmark:

for ( k=1 ; k<n ; k++ )
  x[k] = x[k-1] + y[k];

llvm-svn: 170811
2012-12-21 00:07:35 +00:00
Nadav Rotem 2ababf68d7 LoopVectorize: Fix a bug in the scalarization of instructions.
Before if-conversion we could check if a value is loop invariant
if it was declared inside the basic block. Now that loops have
multiple blocks this check is incorrect.

This fixes External/SPEC/CINT95/099_go/099_go

llvm-svn: 170756
2012-12-20 20:24:40 +00:00
James Molloy 4f6fb953a7 Add a new attribute, 'noduplicate'. If a function contains a noduplicate call, the call cannot be duplicated - Jump threading, loop unrolling, loop unswitching, and loop rotation are inhibited if they would duplicate the call.
Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage).

llvm-svn: 170704
2012-12-20 16:04:27 +00:00
Paul Redmond 5917f4c715 Transform (x&C)>V into (x&C)!=0 where possible
When the least bit of C is greater than V, (x&C) must be greater than V
if it is not zero, so the comparison can be simplified.

Although this was suggested in Target/X86/README.txt, it benefits any
architecture with a directly testable form of AND.

Patch by Kevin Schoedel

llvm-svn: 170576
2012-12-19 19:47:13 +00:00
Benjamin Kramer ae0bb61053 Make TargetLowering::getTypeConversion more resilient against odd illegal MVTs.
- An MVT can become an EVT when being split (e.g. v2i8 -> v1i8, the latter doesn't exist)
- Return the scalar value when an MVT is scalarized (v1i64 -> i64)

Fixes PR14639ff.

llvm-svn: 170546
2012-12-19 14:34:28 +00:00
Shuxin Yang 37a1efe1c6 rdar://12801297
InstCombine for unsafe floating-point add/sub.

llvm-svn: 170471
2012-12-18 23:10:12 +00:00
Benjamin Kramer f0e5d2f032 LoopVectorize: Emit reductions as log2(vectorsize) shuffles + vector ops instead of scalar operations.
For example on x86 with SSE4.2 a <8 x i8> add reduction becomes
	movdqa	%xmm0, %xmm1
	movhlps	%xmm1, %xmm1            ## xmm1 = xmm1[1,1]
	paddw	%xmm0, %xmm1
	pshufd	$1, %xmm1, %xmm0        ## xmm0 = xmm1[1,0,0,0]
	paddw	%xmm1, %xmm0
	phaddw	%xmm0, %xmm0
	pextrb	$0, %xmm0, %edx

instead of
	pextrb	$2, %xmm0, %esi
	pextrb	$0, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$4, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$6, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$8, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$10, %xmm0, %edi
	pextrb	$14, %xmm0, %edx
	addb	%sil, %dil
	pextrb	$12, %xmm0, %esi
	addb	%dil, %sil
	addb	%sil, %dl

llvm-svn: 170439
2012-12-18 18:40:20 +00:00
Nadav Rotem cb23342876 Rename the test so that we can add additional vectors-of-pointers tests
into the same file in the future.

llvm-svn: 170414
2012-12-18 05:50:54 +00:00
Nadav Rotem a5024fc3e1 SROA: Replace calls to getScalarSizeInBits to DataLayout's API because
getScalarSizeInBits could not handle vectors of pointers.

llvm-svn: 170412
2012-12-18 05:23:31 +00:00
Chandler Carruth e3f4119b06 Fix another SROA crasher, PR14601.
This was a silly oversight, we weren't pruning allocas which were used
by variable-length memory intrinsics from the set that could be widened
and promoted as integers. Fix that.

llvm-svn: 170353
2012-12-17 18:48:07 +00:00
Chandler Carruth 21eb4e96c2 Teach the rewriting of memcpy calls to support subvector copies.
This also cleans up a bit of the memcpy call rewriting by sinking some
irrelevant code further down and making the call-emitting code a bit
more concrete.

Previously, memcpy of a subvector would actually miscompile (!!!) the
copy into a single vector element copy. I have no idea how this ever
worked. =/ This is the memcpy half of PR14478 which we probably weren't
noticing previously because it didn't actually assert.

The rewrite relies on the newly refactored insert- and extractVector
functions to do the heavy lifting, and those are the same as used for
loads and stores which makes the test coverage a bit more meaningful
here.

llvm-svn: 170338
2012-12-17 14:51:24 +00:00
Chandler Carruth cacda256a1 Fix a secondary bug I introduced while fixing the first part of PR14478.
The first half of fixing this bug was actually in r170328, but was
entirely coincidental. It did however get me to realize the nature of
the bug, and adapt the test case to test more interesting behavior. In
turn, that uncovered the rest of the bug which I've fixed here.

This should fix two new asserts that showed up in the vectorize nightly
tester.

llvm-svn: 170333
2012-12-17 14:03:01 +00:00
Chandler Carruth ccca504f3a Fix the first part of PR14478: memset now works.
PR14478 highlights a serious problem in SROA that simply wasn't being
exercised due to a lack of vector input code mixed with C-library
function calls. Part of SROA was written carefully to handle subvector
accesses via memset and memcpy, but the rewriter never grew support for
this. Fixing it required refactoring the subvector access code in other
parts of SROA so it could be shared, and then fixing the splat formation
logic and using subvector insertion (this patch).

The PR isn't quite fixed yet, as memcpy is still broken in the same way.
I'm starting on that series of patches now.

Hopefully this will be enough to bring the bullet benchmark back to life
with the bb-vectorizer enabled, but that may require fixing memcpy as
well.

llvm-svn: 170301
2012-12-17 04:07:37 +00:00
Chandler Carruth c50394fcfa Add a corollary test for PR14572. We got this code path correct already.
llvm-svn: 170271
2012-12-15 09:31:54 +00:00
Chandler Carruth 067edd342f Relax an overly aggressive assert to fix PR14572.
The alloca width is based on the alloc size, not the type size.

llvm-svn: 170270
2012-12-15 09:26:06 +00:00
Michael Ilseman e2754dc887 Add back FoldOpIntoPhi optimizations with fix. Included test cases to help catch these errors and to test the presence of the optimization itself
llvm-svn: 170248
2012-12-14 22:08:26 +00:00
Nadav Rotem aa3e2a907e Fix a crash in ValueTracking on vectors of pointers.
llvm-svn: 170240
2012-12-14 20:43:49 +00:00
Shuxin Yang f8e9a5a061 rdar://12753946
Implement rule : "x * (select cond 1.0, 0.0) -> select cond x, 0.0"

llvm-svn: 170226
2012-12-14 18:46:06 +00:00
NAKAMURA Takumi 38d2b2442f Revert r170020, "Simplify negated bit test", for now.
This assumes (1 << n) is always not zero. Consider n is greater than word size.
Although I know it is undefined, this transforms undefined behavior hidden.

This led clang unexpected behavior with some failures. I will investigate to fix undefined shl in clang.

llvm-svn: 170128
2012-12-13 14:28:16 +00:00
Quentin Colombet c0dba2035a Take into account minimize size attribute in the inliner.
Better controls the inlining of functions when the caller function has MinSize attribute.
Basically, when the caller function has this attribute, we do not "force" the inlining
of callee functions carrying the InlineHint attribute (i.e., functions defined with
inline keyword)

llvm-svn: 170065
2012-12-13 01:05:25 +00:00
Nadav Rotem 36510f7194 Teach the cost model about the optimization in r169904: Truncation of induction variables costs the same as scalar trunc.
llvm-svn: 170051
2012-12-13 00:21:03 +00:00
Jakub Staszak a3619d31d8 unHECKify test fixed by Jacob in r159003.
llvm-svn: 170023
2012-12-12 20:58:42 +00:00
David Majnemer 5226aa94ce Simplify negated bit test
llvm-svn: 170020
2012-12-12 20:48:54 +00:00
Jakub Staszak 0a74fc8d6c unHECKify test. It was fixed by Chris in 2009.
llvm-svn: 170017
2012-12-12 20:43:00 +00:00
Jakub Staszak aee9cca331 Fix typo in test-case.
llvm-svn: 170015
2012-12-12 20:29:06 +00:00
Jakub Staszak 67bf76ebbc Fix typo.
llvm-svn: 170006
2012-12-12 19:47:04 +00:00
Nadav Rotem d0bb22bba3 LoopVectorizer: Use the "optsize" attribute to decide if we are allowed to increase the function size.
llvm-svn: 170004
2012-12-12 19:29:45 +00:00
Shuxin Yang 81b3678564 - Fix a problematic way in creating all-the-1 APInt.
- Propagate "exact" bit of [l|a]shr instruction.

llvm-svn: 169942
2012-12-12 00:29:03 +00:00
Michael Ilseman bb6f691b01 Added a slew of SimplifyInstruction floating-point optimizations, many of which take advantage of fast-math flags. Test cases included.
fsub X, +0 ==> X
  fsub X, -0 ==> X, when we know X is not -0
  fsub +/-0.0, (fsub -0.0, X) ==> X
  fsub nsz +/-0.0, (fsub +/-0.0, X) ==> X
  fsub nnan ninf X, X ==> 0.0
  fadd nsz X, 0 ==> X
  fadd [nnan ninf] X, (fsub [nnan ninf] 0, X) ==> 0
    where nnan and ninf have to occur at least once somewhere in this expression
  fmul X, 1.0 ==> X

llvm-svn: 169940
2012-12-12 00:27:46 +00:00
Nadav Rotem f707bf4ca3 PR14574. Fix a bug in the code that calculates the mask the converted PHIs in if-conversion.
llvm-svn: 169916
2012-12-11 21:30:14 +00:00
Nadav Rotem e266efb70b Loop Vectorize: optimize the vectorization of trunc(induction_var). The truncation is now done on scalars.
llvm-svn: 169904
2012-12-11 18:58:10 +00:00
Nadav Rotem dbb3328194 Fix PR14565. Don't if-convert loops that have switch statements in them.
llvm-svn: 169813
2012-12-11 04:55:10 +00:00
Nadav Rotem 7b5b55c195 Add support for reverse induction variables. For example:
while (i--)
 sum+=A[i];

llvm-svn: 169752
2012-12-10 19:25:06 +00:00
Chandler Carruth e45f4658a3 Fix PR14548: SROA was crashing on a mixture of i1 and i8 loads and stores.
When SROA was evaluating a mixture of i1 and i8 loads and stores, in
just a particular case, it would tickle a latent bug where we compared
bits to bytes rather than bits to bits. As a consequence of the latent
bug, we would allow integers through which were not byte-size multiples,
a situation the later rewriting code was never intended to handle.

In release builds this could trigger all manner of oddities, but the
reported issue in PR14548 was forming invalid bitcast instructions.

The only downside of this fix is that it makes it more clear that SROA
in its current form is not capable of handling mixed i1 and i8 loads and
stores. Sometimes with the previous code this would work by luck, but
usually it would crash, so I'm not terribly worried. I'll watch the LNT
numbers just to be sure.

llvm-svn: 169719
2012-12-10 00:54:45 +00:00
Paul Redmond 2adb13c100 LoopVectorize: support vectorizing intrinsic calls
- added function to VectorTargetTransformInfo to query cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.

Reviewed by: Nadav

llvm-svn: 169711
2012-12-09 20:42:17 +00:00
Shuxin Yang 95de7c37e2 - Re-enable population count loop idiom recognization
- fix a bug which cause sigfault.
- add two testing cases which was causing crash

llvm-svn: 169687
2012-12-09 03:12:46 +00:00
Chandler Carruth 91e47532fe Revert the patches adding a popcount loop idiom recognition pass.
There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.

This reverts the commits:
  r169671: Fix a logic error.
  r169604: Move the popcnt tests to an X86 subdirectory.
  r168931: Initial commit adding the pass.

llvm-svn: 169683
2012-12-08 22:18:29 +00:00
David Tweed cfeb8fc49a The test unconditionally assumes a particular cpu has a backend build in the target.
Buildbots for some hosts may choose to build only their own backend in order to
maximise testing-turnaround time. Move the test into a prefixed directory so
lit's standard "backend specific" suppression can be done.

llvm-svn: 169604
2012-12-07 15:57:45 +00:00
Chandler Carruth 80d3e56c73 Add support to ValueTracking for determining that a pointer is non-null
by virtue of inbounds GEPs that preclude a null pointer.

This is a very common pattern in the code generated by std::vector and
other standard library routines which use allocators that test for null
pervasively. This is one step closer to teaching Clang+LLVM to be able
to produce an empty function for:

  void f() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.push_back(4);
  }

Which is related to getting them to completely fold SmallVector
push_back sequences into constants when inlining and other optimizations
make that a possibility.

llvm-svn: 169573
2012-12-07 02:08:58 +00:00
Dmitri Gribenko 1c704355cf Fix typos in CHECK lines.
Patch by Alexander Zinenko.

llvm-svn: 169547
2012-12-06 21:24:47 +00:00