llvm-project

Commit Graph

Author	SHA1	Message	Date
Arnold Schwaighofer	2d920477a4	LoopVectorize: Hoist conditional loads if possible InstCombine can be uncooperative to vectorization and sink loads into conditional blocks. This prevents vectorization. Undo this optimization if there are unconditional memory accesses to the same addresses in the loop. radar://13815763 llvm-svn: 181860	2013-05-15 01:44:30 +00:00
Arnold Schwaighofer	2e7a922a15	LoopVectorize: Handle loops with multiple forward inductions We used to give up if we saw two integer inductions. After this patch, we base further induction variables on the chosen one like we do in the reverse induction and pointer induction case. Fixes PR15720. radar://13851975 llvm-svn: 181746	2013-05-14 00:21:18 +00:00
Arnold Schwaighofer	f2305e4467	LoopVectorize: Use the widest induction variable type Use the widest induction type encountered for the cannonical induction variable. We used to turn the following loop into an empty loop because we used i8 as induction variable type and truncated 1024 to 0 as trip count. int a[1024]; void fail() { int reverse_induction = 1023; unsigned char forward_induction = 0; while ((reverse_induction) >= 0) { forward_induction++; a[reverse_induction] = forward_induction; --reverse_induction; } } radar://13862901 llvm-svn: 181667	2013-05-11 23:04:28 +00:00
Nadav Rotem	5ff6db1153	Add an additional testcase for PR15882. llvm-svn: 181646	2013-05-10 22:55:44 +00:00
Arnold Schwaighofer	2e8c69cf97	LoopVectorizer: Don't assert on the absence of induction variables A computable loop exit count does not imply the presence of an induction variable. Scalar evolution can return a value for an infinite loop. Fixes PR15926. llvm-svn: 181495	2013-05-09 00:32:18 +00:00
Arnold Schwaighofer	3610139ac5	LoopVectorizer: Improve reduction variable identification The two nested loops were confusing and also conservative in identifying reduction variables. This patch replaces them by a worklist based approach. llvm-svn: 181369	2013-05-07 21:55:37 +00:00
Arnold Schwaighofer	e78b76fbed	LoopVectorize: getConsecutiveVector must respect signed arithmetic We were passing an i32 to ConstantInt::get where an i64 was needed and we must also pass the sign if we pass negatives numbers. The start index passed to getConsecutiveVector must also be signed. Should fix PR15882. llvm-svn: 181286	2013-05-07 04:37:05 +00:00
Arnold Schwaighofer	d96e427eac	LoopVectorize: Add support for floating point min/max reductions Add support for min/max reductions when "no-nans-float-math" is enabled. This allows us to assume we have ordered floating point math and treat ordered and unordered predicates equally. radar://13723044 llvm-svn: 181144	2013-05-05 01:54:48 +00:00
Arnold Schwaighofer	a670a0a3aa	LoopVectorize: We don't need an identity element for min/max reductions We can just use the initial element that feeds the reduction. max(max(x, y), z) == max(max(x,y), max(x,z)) radar://13723044 llvm-svn: 181141	2013-05-05 01:54:42 +00:00
Nadav Rotem	4ce060b3da	LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming values. By supporting the vectorization of PHINodes with more than two incoming values we can increase the complexity of nested if statements. We can now vectorize this loop: int foo(int A, int B, int n) { for (int i=0; i < n; i++) { int x = 9; if (A[i] > B[i]) { if (A[i] > 19) { x = 3; } else if (B[i] < 4 ) { x = 4; } else { x = 5; } } A[i] = x; } } llvm-svn: 181037	2013-05-03 17:42:55 +00:00
Manman Ren	16649b0107	TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180935	2013-05-02 18:11:35 +00:00
Manman Ren	1a5ff287fd	TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180796	2013-04-30 17:52:57 +00:00
Nadav Rotem	13306816fc	LoopVectorizer: Calculate the number of pointers to disambiguate at runtime based on the numbers of reads and writes. llvm-svn: 180593	2013-04-26 05:08:59 +00:00
Nadav Rotem	f43cbeee15	LoopVectorizer: No need to generate pointer disambiguation checks between readonly pointers. llvm-svn: 180570	2013-04-25 19:55:03 +00:00
Arnold Schwaighofer	a6578f7056	LoopVectorize: Scalarize padded types This patch disables memory-instruction vectorization for types that need padding bytes, e.g., x86_fp80 has 10 bytes store size with 6 bytes padding in darwin on x86_64. Because the load/store vectorization is performed by the bit casting to a packed vector, which has incompatible memory layout due to the lack of padding bytes, the present vectorizer produces inconsistent result for memory instructions of those types. This patch checks an equality of the AllocSize of a scalar type and allocated size for each vector element, to ensure that there is no padding bytes and the array can be read/written using vector operations. Patch by Daisuke Takahashi! Fixes PR15758. llvm-svn: 180196	2013-04-24 16:16:01 +00:00
Arnold Schwaighofer	23a0589bce	LoopVectorizer: Bail out if we don't have datalayout we need it llvm-svn: 180195	2013-04-24 16:15:58 +00:00
Nadav Rotem	71c9d6d333	LoopVectorizer: Fix 15830. When scalarizing and unrolling stores make sure that the order in which the elements are scalarized is the same as the original order. This fixes a miscompilation in FreeBSD's regex library. llvm-svn: 180121	2013-04-23 17:12:42 +00:00
Pekka Jaaskelainen	d3c90e132a	Call the potentially costly isAnnotatedParallel() only once. Made the uniform write test's checks a bit stricter. llvm-svn: 180119	2013-04-23 16:44:43 +00:00
Pekka Jaaskelainen	6f2f66b63f	Refuse to (even try to) vectorize loops which have uniform writes, even if erroneously annotated with the parallel loop metadata. Fixes Bug 15794: "Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata" llvm-svn: 180081	2013-04-23 08:08:51 +00:00
Arnold Schwaighofer	4cd6aa110c	LoopVectorizer: Recognize min/max reductions A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y) sequence in LLVM. If we see such a sequence we can treat it just as any other commutative binary instruction and reduce it. This appears to help bzip2 by about 1.5% on an imac12,2. radar://12960601 llvm-svn: 179773	2013-04-18 17:22:34 +00:00
Benjamin Kramer	8df2cfb858	LoopVectorize: Use a set to avoid longer cycles in the reduction chain too. Fixes PR15748. llvm-svn: 179757	2013-04-18 14:29:13 +00:00
Arnold Schwaighofer	f9cea17f75	LoopVectorizer: integer division is not a reduction operation Don't classify idiv/udiv as a reduction operation. Integer division is lossy. For example : (1 / 2) * 4 != 4/2. Example: int a[] = { 2, 5, 2, 2} int x = 80; for() x /= a[i]; Scalar: x /= 2 // = 40 x /= 5 // = 8 x /= 2 // = 4 x /= 2 // = 2 Vectorized: <80, 1> / <2,5> //= <40,0> <40, 0> / <2,2> //= <20,0> 20*0 = 0 radar://13640654 llvm-svn: 179381	2013-04-12 15:15:19 +00:00
Arnold Schwaighofer	df6f67ed87	LoopVectorizer: Pass OperandValueKind information to the cost model Pass down the fact that an operand is going to be a vector of constants. This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86 back. It had degraded to scalar performance due to my pervious shift cost change that made all shifts expensive on x86. radar://13576547 llvm-svn: 178809	2013-04-04 23:26:27 +00:00
Benjamin Kramer	52ceb44331	X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts. llvm-svn: 178459	2013-04-01 10:23:49 +00:00
Arnold Schwaighofer	9b55e31bcb	LoopVectorizer: Insert some white space to make test case more readable Also remove some unneeded function attributes. llvm-svn: 177114	2013-03-14 21:31:09 +00:00
Arnold Schwaighofer	4991ce9d49	Add missing asserts flag to test - it uses debug flags llvm-svn: 177102	2013-03-14 19:01:58 +00:00
Arnold Schwaighofer	c63cf3a0ae	LoopVectorize: Invert case when we use a vector cmp value to query select cost We generate a select with a vectorized condition argument when the condition is NOT loop invariant. Not the other way around. llvm-svn: 177098	2013-03-14 18:54:36 +00:00
Benjamin Kramer	01b75cc0f2	Test case hygiene. llvm-svn: 176772	2013-03-09 18:25:40 +00:00
Arnold Schwaighofer	4090b61ac3	LoopVectorizer: Ignore dbg.value instructions We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic and don't transfer them to the vectorized code. radar://13378964 llvm-svn: 176768	2013-03-09 15:56:34 +00:00
Benjamin Kramer	10a74ed434	Force cpu in test. llvm-svn: 176702	2013-03-08 17:01:18 +00:00
Benjamin Kramer	37c2d65c5a	Insert the reduction start value into the first bypass block to preserve domination. Fixes PR15344. llvm-svn: 176701	2013-03-08 16:58:37 +00:00
Arnold Schwaighofer	20ef54f4c1	X86 cost model: Adjust cost for custom lowered vector multiplies This matters for example in following matrix multiply: int mmult(int rows, int cols, int m1, int m2, int m3) { int i, j, k, val; for (i=0; i<rows; i++) { for (j=0; j<cols; j++) { val = 0; for (k=0; k<cols; k++) { val += m1[i][k] * m2[k][j]; } m3[i][j] = val; } } return(m3); } Taken from the test-suite benchmark Shootout. We estimate the cost of the multiply to be 2 while we generate 9 instructions for it and end up being quite a bit slower than the scalar version (48% on my machine). Also, properly differentiate between avx1 and avx2. On avx-1 we still split the vector into 2 128bits and handle the subvector muls like above with 9 instructions. Only on avx-2 will we have a cost of 9 for v4i64. I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an add instead of a mul because with a mul we now no longer vectorize. I did verify that the mul would be indeed more expensive when vectorized with 3 kernels: for (i ...) r += a[i] * 3; for (i ...) m1[i] = m1[i] * 3; // This matches the test case in avx1.ll and a matrix multiply. In each case the vectorized version was considerably slower. radar://13304919 llvm-svn: 176403	2013-03-02 04:02:52 +00:00
Nadav Rotem	739e37a0d2	PR14448 - prevent the loop vectorizer from vectorizing the same loop twice. The LoopVectorizer often runs multiple times on the same function due to inlining. When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches. With this patch, the vectorizer during vectorization puts metadata on scalar loops and marks them as 'already vectorized' so that it knows to ignore them when it sees them a second time. PR14448. llvm-svn: 176399	2013-03-02 01:33:49 +00:00
Benjamin Kramer	12f98fae98	LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses. Fixes PR15384. llvm-svn: 176366	2013-03-01 19:07:31 +00:00
Benjamin Kramer	dc145816fd	LoopVectorize: Vectorize math builtin calls. This properly asks TargetLibraryInfo if a call is available and if it is, it can be translated into the corresponding LLVM builtin. We don't vectorize sqrt() yet because I'm not sure about the semantics for negative numbers. The other intrinsic should be exact equivalents to the libm functions. Differential Revision: http://llvm-reviews.chandlerc.com/D465 llvm-svn: 176188	2013-02-27 15:24:19 +00:00
Renato Golin	0890ace58a	Some more tests for the global structure vectorizer llvm-svn: 175964	2013-02-23 12:48:30 +00:00
Renato Golin	adc1b07002	More tests to global struct vectorizer llvm-svn: 175898	2013-02-22 16:18:31 +00:00
Renato Golin	cf928cb53f	Allow GlobalValues to vectorize with AliasAnalysis Storing the load/store instructions with the values and inspect them using Alias Analysis to make sure they don't alias, since the GEP pointer operand doesn't take the offset into account. Trying hard to not add any extra cost to loads and stores that don't overlap on global values, AA is only calculated if all of the previous attempts failed. Using biggest vector register size as the stride for the vectorization access, as we're being conservative and the cost model (which calculates the real vectorization factor) is only run after the legalization phase. We might re-think this relationship in the future, but for now, I'd rather be safe than sorry. llvm-svn: 175818	2013-02-21 22:39:03 +00:00
Pekka Jaaskelainen	62848c9c24	Forgot to 'svn add' the LoopVectorizer tests for the new parallel loop metadata, sorry. llvm-svn: 175311	2013-02-15 21:50:19 +00:00
NAKAMURA Takumi	7ec43d9b37	Formatting. llvm-svn: 174380	2013-02-05 15:32:16 +00:00
NAKAMURA Takumi	6635fe56d3	llvm/test/Transforms/LoopVectorize/X86/vector_ptr_load_store.ll: "-debug" requires +Asserts. llvm-svn: 174379	2013-02-05 15:32:10 +00:00
Arnold Schwaighofer	22174f5d5a	Loop Vectorizer: Handle pointer stores/loads in getWidestType() In the loop vectorizer cost model, we used to ignore stores/loads of a pointer type when computing the widest type within a loop. This meant that if we had only stores/loads of pointers in a loop we would return a widest type of 8bits (instead of 32 or 64 bit) and therefore a vector factor that was too big. Now, if we see a consecutive store/load of pointers we use the size of a pointer (from data layout). This problem occured in SingleSource/Benchmarks/Shootout-C++/hash.cpp (reduced test case is the first test in vector_ptr_load_store.ll). radar://13139343 llvm-svn: 174377	2013-02-05 15:08:02 +00:00
Pekka Jaaskelainen	995a3e731d	Made the min-trip-count-switch test X86-specific to avoid breakage with builds without X86-support. llvm-svn: 174052	2013-01-31 10:33:22 +00:00
Renato Golin	5e9d55eca0	Adding simple cast cost to ARM Changing ARMBaseTargetMachine to return ARMTargetLowering intead of the generic one (similar to x86 code). Tests showing which instructions were added to cast when necessary or cost zero when not. Downcast to 16 bits are not lowered in NEON, so costs are not there yet. llvm-svn: 173849	2013-01-29 23:31:38 +00:00
Pekka Jaaskelainen	f50ab84bb1	LoopVectorize: convert TinyTripCountVectorThreshold constant to a command line switch. llvm-svn: 173837	2013-01-29 21:42:08 +00:00
Nadav Rotem	ab3e698ee9	Add support for reverse pointer induction variables. These are loops that contain pointers that count backwards. For example, this is the hot loop in BZIP: do { m = --p; p = ( ... ); } while (--n); llvm-svn: 173219	2013-01-23 01:35:00 +00:00
Nadav Rotem	c42f90b1f4	LoopVectorizer: Implement a new heuristics for selecting the unroll factor. We ignore the cpu frontend and focus on pipeline utilization. We do this because we don't have a good way to estimate the loop body size at the IR level. llvm-svn: 172964	2013-01-20 05:24:29 +00:00
Nadav Rotem	2169dbed2c	Change the cpu type in the test. llvm-svn: 172963	2013-01-20 05:20:56 +00:00
Benjamin Kramer	d455ed85d1	LoopVectorizer: Emit memory checks into their own basic block. This separates the check for "too few elements to run the vector loop" from the "memory overlap" check, giving a lot nicer code and allowing to skip the memory checks when we're not going to execute the vector code anyways. We still leave the decision of whether to emit the memory checks as branches or setccs, but it seems to be doing a good job. If ugly code pops up we may want to emit them as separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso. Most of this is legwork to allow multiple bypass blocks while updating PHIs, dominators and loop info. llvm-svn: 172902	2013-01-19 13:57:58 +00:00
Benjamin Kramer	b7050f0a7c	Move test that depends on the x86 target into a target-specific directory. Should fix the arm buildbot (which only builds the arm target). llvm-svn: 172611	2013-01-16 13:25:56 +00:00

1 2 3

123 Commits