llvm-project

Commit Graph

Author	SHA1	Message	Date
Kostya Serebryany	4f8f0c5aa2	[asan] experimental tracing for indirect calls, llvm part. llvm-svn: 220699	2014-10-27 18:13:56 +00:00
David Majnemer	c8bdd23acf	InstCombine: Fix a combine assuming that icmp operands were integers An icmp may have pointer arguments, it isn't limited to integers or vectors of integers. This fixes PR21388. llvm-svn: 220664	2014-10-27 05:47:49 +00:00
Arnold Schwaighofer	eb1a38fa73	Add an option to the LTO code generator to disable vectorization during LTO We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline. r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and 'SLPVectorize' out of the desire to be able to disable vectorization using the cl::opt flags 'vectorize-loops'/'slp-vectorize' which the before mentioned fields default to. Unfortunately, this turns off vectorization because those fields default to false. This commit adds flags to the LTO library to disable lto vectorization which reconciles the desire to optionally disable vectorization during LTO and the desired behavior of defaulting to enabled vectorization. We really want tools to set PassManager flags directly to enable/disable vectorization and not go the route via cl::opt flags in PassManagerBuilder.cpp. llvm-svn: 220652	2014-10-26 21:50:58 +00:00
Andrew Trick	dd925ad218	LSR: Minor cleanup after Daniel's patch. Combine the Inserted and Done sets into a Visited set. llvm-svn: 220623	2014-10-25 19:59:30 +00:00
Andrew Trick	9ccbed5a12	Fix LSR compile time. This is a simple fix that brings the compilation time from 5min to 5s on a specific real-world example. It's a large chain of computation in a crypto routine (always a problem for SCEV). A unit test is not feasible and there would be no way to check it. The fix is just basic good practice for dealing with SCEVs, there's no risk of regression. Patch by Daniel Reynaud! llvm-svn: 220622	2014-10-25 19:42:07 +00:00
Jingyue Wu	fe72fcebf6	[SeparateConstOffsetFromGEP] Fixed a bug related to unsigned modulo The dividend in "signed % unsigned" is treated as unsigned instead of signed, causing unexpected behavior such as -64 % (uint64_t)24 == 0. Added a regression test in split-gep.ll Patched by Hao Liu. llvm-svn: 220618	2014-10-25 18:34:03 +00:00
Benjamin Kramer	63207bc9c3	Clean up assume intrinsic pattern matching, no need to check that the argument is a value. Also make it const safe and remove superfluous casting. NFC. llvm-svn: 220616	2014-10-25 18:09:01 +00:00
Jingyue Wu	b723152379	[SeparateConstOffsetFromGEP] Fixed a bug in rebuilding OR expressions The two operands of the new OR expression should be NextInChain and TheOther instead of the two original operands. Added a regression test in split-gep.ll. Hao Liu reported this bug, and provided the test case and an initial patch. Thanks! llvm-svn: 220615	2014-10-25 17:36:21 +00:00
David Majnemer	2abb8183b5	InstCombine: Remove overzealous asserts These asserts can trigger if the worklist iteration order is sufficiently unlucky. Instead of adding special case logic to handle these edge conditions, just bail out on trying to transform them: InstSimplify will get them when it reaches them on the worklist. This fixes PR21378. N.B. No test case is included because any test would rely on the fragile worklist iteration order. llvm-svn: 220612	2014-10-25 07:13:13 +00:00
Evgeniy Stepanov	d337a59db5	[msan] Make -msan-check-constant-shadow a bit stronger. Allow (under the experimental flag) non-Instructions to participate in MSan checks. llvm-svn: 220601	2014-10-24 23:34:15 +00:00
Nick Lewycky	592d84974c	If requested, apply function merging at -O0 too. It's useful there to reduce the time to compile. llvm-svn: 220537	2014-10-23 23:49:31 +00:00
Timur Iskhodzhanov	eb229ca928	Make getDISubprogram(const Function *F) available in LLVM Reviewed at http://reviews.llvm.org/D5950 llvm-svn: 220536	2014-10-23 23:46:28 +00:00
Sanjay Patel	848309da7c	Handle sqrt() shrinking in SimplifyLibCalls like any other call This patch removes a chunk of special case logic for folding (float)sqrt((double)x) -> sqrtf(x) in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls. No functional change intended, but I loosened the restriction on the existing sqrt testcases to allow for this optimization even without unsafe-fp-math because that's the existing behavior. I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic in case the result is used as a double. Differential Revision: http://reviews.llvm.org/D5919 llvm-svn: 220514	2014-10-23 21:52:45 +00:00
Frederic Riss	c1892e2d48	Assert that ValueHandleBase::ValueIsRAUWd doesn't change the tracked Value type. This invariant is enforced in Value::replaceAllUsesWith, thus it seems logical to apply it also to ValueHandles. This commit fixes InstCombine to not trigger the assertion during the removal of constant bitcasts in call instructions. Differential Revision: http://reviews.llvm.org/D5828 llvm-svn: 220468	2014-10-23 04:08:42 +00:00
Evgeniy Stepanov	7db296eba5	[msan] Emit checks for constant shadow values under an experimental flag. Does not change the default behavior. llvm-svn: 220457	2014-10-23 01:05:46 +00:00
Benjamin Kramer	26ce8ff637	LoopVectorize: Simplify code. No functionality change. llvm-svn: 220405	2014-10-22 19:13:54 +00:00
Diego Novillo	19e7b7e27c	Shorten auto iterators for function basic blocks. Use consistent naming for basic block instances. No functional changes. llvm-svn: 220404	2014-10-22 18:39:50 +00:00
Diego Novillo	b368b7d558	Use auto iteration in lib/Transforms/Scalar/SampleProfile.cpp. No functional changes. llvm-svn: 220394	2014-10-22 16:51:50 +00:00
Philip Reames	d92c2a7592	Preserving 'nonnull' metadata in SimplifyCFG When we hoist two loads above an if, we can preserve the nonnull metadata. We could also do the same for sinking them, but we appear to not handle metadata at all in that case. Thanks to Hal for the review. Differential Revision: http://reviews.llvm.org/D5910 llvm-svn: 220392	2014-10-22 16:37:13 +00:00
Sanjay Patel	a92fa44740	Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850) When a call to a double-precision libm function has fast-math semantics (via function attribute for now because there is no IR-level FMF on calls), we can avoid fpext/fptrunc operations and use the float version of the call if the input and output are both float. We already do this optimization using a command-line option; this patch just adds the ability for fast-math to use the existing functionality. I moved the cl::opt from InstructionCombining into SimplifyLibCalls because it's only ever used internally to that class. Modified the existing test cases to use the unsafe-fp-math attribute rather than repeating all tests. This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850 Differential Revision: http://reviews.llvm.org/D5893 llvm-svn: 220390	2014-10-22 15:29:23 +00:00
Diego Novillo	a67c0b43e1	Change error to warning when a profile cannot be found. When the profile for a function cannot be applied, we used to emit an error. This seems extreme. The compiler can continue, it's just that the optimization opportunities won't include profile information. llvm-svn: 220386	2014-10-22 13:36:35 +00:00
Diego Novillo	8027b80b41	Support using sample profiles with partial debug info. Summary: When using a profile, we used to require the use of -gmlt so that we could get access to the line locations. This is used to match line numbers in the input profile to the line numbers in the function's IR. But this is actually not necessary. The driver can provide source location tracking without the emission of debug information. In these cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the actual line location annotations are still present. This patch adds a new way of looking for the start of the current function. Instead of looking through the compile units in llvm.dbg.cu, we can walk up the scope for the first instruction in the function with a debug loc. If that describes the function, we use it. Otherwise, we keep looking until we find one. If no such instruction is found, we then give up and produce an error. Reviewers: echristo, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5887 llvm-svn: 220382	2014-10-22 12:59:00 +00:00
Evgeniy Stepanov	35eb265421	[msan] Handle param-tls overflow. ParamTLS (shadow for function arguments) is of limited size. This change makes all arguments that do not fit unpoisoned, and avoids writing past the end of a TLS buffer. llvm-svn: 220351	2014-10-22 00:12:40 +00:00
Hans Wennborg	0b39fc0d16	Revert "Teach the load analysis to allow finding available values which require" (r220277) This seems to have caused PR21330. llvm-svn: 220349	2014-10-21 23:49:52 +00:00
JF Bastien	f42a6ea5ac	LTO: respect command-line options that disable vectorization. Summary: Patches 202051 and 208013 added calls to LTO's PassManager which unconditionally add LoopVectorizePass and SLPVectorizerPass instead of following the logic in PassManagerBuilder::populateModulePassManager and honoring the -vectorize-loops -run-slp-after-loop-vectorization flags. Reviewers: nadav, aschwaighofer, yijiang Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5884 llvm-svn: 220345	2014-10-21 23:18:21 +00:00
Matt Arsenault	d6511b49ac	Add minnum / maxnum intrinsics These are named following the IEEE-754 names for these functions, rather than the libm fmin / fmax to avoid possible ambiguities. Some languages may implement something resembling fmin / fmax which return NaN if either operand is to propagate errors. These implement the IEEE-754 semantics of returning the other operand if either is a NaN representing missing data. llvm-svn: 220341	2014-10-21 23:00:20 +00:00
Philip Reames	d7c21364a9	Teach combineMetadata how to merge 'nonnull' metadata. combineMetadata is used when merging two instructions into one. This change teaches it how to merge 'nonnull' - i.e. only preserve it on the new instruction if it's set on both sources. This isn't actually used yet since I haven't adjusted any of the call sites to pass in nonnull as a 'known metadata'. llvm-svn: 220325	2014-10-21 21:02:19 +00:00
Philip Reames	b2d3f035e2	Preserve 'nonnull' when changing type of the load. When changing the type of a load in Chandler's recent InstCombine changes, we can preserve the new 'nonnull' metadata. I considered adding an assert since 'nonnull' is only valid on pointer types, but casting a pointer to a non-pointer would involve more than a bitcast anyways. If someone extends this transform to handle more than bitcasts, the verifier will report the malformed IR, so a separate assertion isn't needed. Also, the fpmath flags would have the same problem. llvm-svn: 220324	2014-10-21 21:00:03 +00:00
David Majnemer	d205602a0b	InstCombine: Simplify FoldICmpCstShrCst This function was complicated by the fact that it tried to perform canonicalizations that were already performed by InstSimplify. Remove this extra code and move the tests over to InstSimplify. Add asserts to make sure our preconditions hold before we make any assumptions. llvm-svn: 220314	2014-10-21 19:51:55 +00:00
Chandler Carruth	aa72a6dd3b	Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available. Eventually, the datalayout can just be required but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 220277	2014-10-21 09:00:40 +00:00
Paul Robinson	f60e0a160f	Do not attribute static allocas to the call site's DebugLoc. When functions are inlined, instructions without debug information are attributed to the call site's DebugLoc. After inlining, inlined static allocas are moved to the caller's entry block, adjacent to the caller's original static alloca instructions. By retaining the call site's DebugLoc, these instructions could cause instructions that were subsequently inserted at the entry block to pick up the same DebugLoc. Patch by Wolfgang Pieb! llvm-svn: 220255	2014-10-21 01:00:55 +00:00
Philip Reames	5a3f5f751b	Introduce enum values for previously defined metadata types. (NFC) Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum value for three existing metadata types: + MD_nontemporal = 9, // "nontemporal" + MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access" + MD_nonnull = 11 // "nonnull" I went through and updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easy to translate. For example, there were several items in LoopInfo.cpp I chose not to update. llvm-svn: 220248	2014-10-21 00:13:20 +00:00
David Majnemer	f3cadce84c	IR: Replace DataLayout::RoundUpAlignment with RoundUpToAlignment No functional change intended, just cleaning up some code. llvm-svn: 220187	2014-10-20 06:13:33 +00:00
Chandler Carruth	6665d62117	Fix a somewhat subtle pair of issues with JumpThreading I introduced in r220178. First, the creation routine doesn't insert prior to the terminator of the basic block provided, but really at the end of the basic block. Instead, get the terminator and insert before that. The next issue was that we need to ensure multiple PHI node entries for a single predecessor re-use the same cast instruction rather than creating new ones. All of the logic here was without tests previously. I've reduced and added a test case from the test suite that crashed without both of these fixes. llvm-svn: 220186	2014-10-20 05:34:36 +00:00
Chandler Carruth	eeec35ae1c	Teach the load analysis driving core instcombine logic and other bits of logic to look through pointer casts, making them trivially stronger in the face of loads and stores with intervening pointer casts. I've included a few test cases that demonstrate the kind of folding instcombine can do without pointer casts and then variations which obfuscate the logic through bitcasts. Without this patch, the variations all fail to optimize fully. This is more important now than it has been in the past as I've started moving the load canonicalization to more closely follow the value type requirements rather than the pointer type requirements and thus this needs to be prepared for more pointer casts. When I made the same change to stores several test cases regressed without logic along these lines so I wanted to systematically improve matters first. llvm-svn: 220178	2014-10-20 00:24:14 +00:00
Chandler Carruth	bc6378defb	Do a better and more complete job of preserving metadata when combining loads. This handles many more cases than just the AA metadata, some of them suggested by Hal in his review of the AA metadata handling patch. I've tried to test this behavior where tractable to do so. I'll point out that I have specifically not included a test for debuginfo because it was going to require 2 or 3 times as much work to craft some input which would survive the "helpful" stripping of debug info metadata that doesn't match the desired schema. This is another good example of why the current state of write-ability for our debug info metadata is unacceptable. I spent over 30 minutes trying to conjure some test case that would survive, even copying from other debug info tests, but it always failed to survive with no explanation of why or how I might fix it. =[ llvm-svn: 220165	2014-10-19 10:46:46 +00:00
David Majnemer	312c3e5f39	InstCombine: (sub (or A B) (xor A B)) --> (and A B) The following implements the transformation: (sub (or A B) (xor A B)) --> (and A B). Patch by Ankur Garg! Differential Revision: http://reviews.llvm.org/D5719 llvm-svn: 220163	2014-10-19 08:32:32 +00:00
David Majnemer	59939acd26	InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1 The following implements the optimization for sequences of the form: icmp eq/ne (shl Const2, A), Const1 Such sequences can be transformed to: icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2)) This handles only the equality operators for now. Other operators need to be handled. Patch by Ankur Garg! llvm-svn: 220162	2014-10-19 08:23:08 +00:00
Chandler Carruth	a801dd5799	Fix a long-standing miscompile in the load analysis that was uncovered by my refactoring of this code. The method isSafeToLoadUnconditionally assumes that the load will proceed with the preferred type alignment. Given that, it has to ensure that the alloca or global is at least that aligned. It has always done this historically when a datalayout is present, but has never checked it when the datalayout is absent. When I refactored the code in r220156, I exposed this path when datalayout was present and that turned the latent bug into a patent bug. This fixes the issue by just removing the special case which allows folding things without datalayout. This isn't worth the complexity of trying to tease apart when it is or isn't safe without actually knowing the preferred alignment. llvm-svn: 220161	2014-10-19 08:17:50 +00:00
Chandler Carruth	be9dccd64d	Preserve AA metadata when combining (cast (load (...))) -> (load (cast (...))). llvm-svn: 220141	2014-10-18 11:00:12 +00:00
Chandler Carruth	2f75fcfef3	[InstCombine] Do an about-face on how LLVM canonicalizes (cast (load ...)) and (load (cast ...)): canonicalize toward the former. Historically, we've tried to load using the type of the pointer, and tried to match that type as closely as possible removing as many pointer casts as we could and trading them for bitcasts of the loaded value. This is deeply and fundamentally wrong. Repeat after me: memory does not have a type! This was a hard lesson for me to learn working on SROA. There is only one thing that should actually drive the type used for a pointer, and that is the type which we need to use to load from that pointer. Matching up pointer types to the loaded value types is very useful because it minimizes the physical size of the IR required for no-op casts. Similarly, the only thing that should drive the type used for a loaded value is how that value is used! Again, this minimizes casts. And in fact, the only thing motivating types in any part of LLVM's IR are the types used by the operations in the IR. We should match them as closely as possible. I've ended up removing some tests here as they were testing bugs or behavior that is no longer present. Mostly though, this is just cleanup to let the tests continue to function as intended. The only fallout I've found so far from this change was SROA and I have fixed it to not be impeded by the different type of load. If you find more places where this change causes optimizations not to fire, those too are likely bugs where we are assuming that the type of pointers is "significant" for optimization purposes. llvm-svn: 220138	2014-10-18 06:36:22 +00:00
Chandler Carruth	2dc9682e59	[SROA] Change how SROA does vector-based promotion of allocas to handle cases where the alloca type, the load types, and the store types used all disagree. Previously, the only way that vector-based promotion occurred was if the alloca type was a vector type. This was one of the very few remaining uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns out it was a bad idea. The alloca type can change very easily based on the mixture of types loaded and stored to that alloca. We shouldn't be relying on it as a signal for very much. Instead, the source of truth should be loads and stores. We should canonicalize the loads and stores as much as possible and then rely on them exclusively in SROA. When looking at loads and stores, we may find many different candidate vector types. This change will let SROA try all of them to find a vector type which is a viable way to promote the entire alloca to a vector register. With this change, it becomes possible to do better canonicalization and optimization of loads and stores without breaking SROA in random ways, and that should allow fixing a core source of performance loss in hot numerical loops such as those in Eigen. llvm-svn: 220116	2014-10-18 00:44:02 +00:00
Evgeniy Stepanov	e08633e900	[msan] Fix handling of byval arguments with large alignment. MSan param-tls slots are 8-byte aligned. This change clips alignment of memcpy into param-tls to 8. llvm-svn: 220101	2014-10-17 23:29:44 +00:00
Rafael Espindola	7da1ea83a9	Revert "TRE: make TRE a bit more aggressive" This reverts commit r219899. This also updates byval-tail-call.ll to make it clear what was breaking. Adding r219899 again will cause the load/store to disappear. llvm-svn: 220093	2014-10-17 21:25:48 +00:00
Hal Finkel	dd38c0b876	[DSE] Remove no-data-layout-only type-based overlap checking DSE's overlap checking contained special logic, used only when no DataLayout was available, which inferred a complete overwrite when the pointee types were equal. This logic seems fine for regular loads/stores, but does not work for memcpy and friends. Instead of fixing this, I'm just removing it. Philosophically, transformations should not contain enhanced behavior used only when data layout is lacking (data layout should be strictly additive), and maintaining these rarely-tested code paths seems not worthwhile at this stage. Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case (slightly reduced from that provided by Aliaksei) replaces the original contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few other tests have been updated to have a data layout. llvm-svn: 220035	2014-10-17 11:56:00 +00:00
Chandler Carruth	8393406f05	[SROA] Switch the common variable name for the 'AllocaSlices' class to 'AS'. Using 'S' as this was a terrible idea. Arguably, 'AS' is not much better, but it at least follows the idea of using initialisms and removes active confusion about the AllocaSlices variable and a Slice variable. llvm-svn: 219963	2014-10-16 21:11:55 +00:00
Chandler Carruth	61747042c1	[SROA] More range-based cleanups to SROA, these brought to you by clang-modernize. I did have to clean up the variable types and whitespace a bit because the use of auto made the code much less readable here. llvm-svn: 219962	2014-10-16 21:05:14 +00:00
Chandler Carruth	57d4cae202	[SROA] Switch a couple of overly complex iterator accessors to just be ArrayRef accessors. I think this even came up in review that this was over-engineered, and indeed it was. Time to un-build it. llvm-svn: 219958	2014-10-16 20:42:08 +00:00
Chandler Carruth	c659df9389	[SROA] Start more deeply moving SROA to use ranges rather than just iterators. There are a ton of places where it essentially wants ranges rather than just iterators. This is just the first step that adds the core slice range typedefs and uses them in a couple of places. I still have to explicitly construct them because they've not been punched throughout the entire set of code. More range-based cleanups incoming. llvm-svn: 219955	2014-10-16 20:24:07 +00:00
Bjorn Steinbrink	d20816fde9	Allow call-slot optimization for destinations with a suitable dereferenceable attribute Summary: Currently, call slot optimization requires that if the destination is an argument, the argument has the sret attribute. This is to ensure that the memory access won't trap. In addition to sret, we can also allow the optimization to happen for arguments that have the new dereferenceable attribute, which gives the same guarantee. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5832 llvm-svn: 219950	2014-10-16 19:43:08 +00:00
Sanjay Patel	c699a6117b	fold: sqrt(x * x * y) -> fabs(x) * sqrt(y) If a square root call has an FP multiplication argument that can be reassociated, then we can hoist a repeated factor out of the square root call and into a fabs(). In the simplest case, this: y = sqrt(x * x); becomes this: y = fabs(x); This patch relies on an earlier optimization in instcombine or reassociate to put the multiplication tree into a canonical form, so we don't have to search over every permutation of the multiplication tree. Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to use function-level attributes to do this optimization. This needs to be fixed for both the intrinsics and in the backend. Differential Revision: http://reviews.llvm.org/D5787 llvm-svn: 219944	2014-10-16 18:48:17 +00:00
Akira Hatanaka	5c221ef98f	Reapply r219832 - InstCombine: Narrow switch instructions using known bits. The code committed in r219832 asserted when it attempted to shrink a switch statement whose type was larger than 64-bit. llvm-svn: 219902	2014-10-16 06:00:46 +00:00
Saleem Abdulrasool	7f52921976	TRE: make TRE a bit more aggressive Make tail recursion elimination a bit more aggressive. This allows us to get tail recursion on functions that are just branches to a different function. The fact that the function takes a byval argument does not restrict it from being optimised into just a tail call. llvm-svn: 219899	2014-10-16 03:27:30 +00:00
Akira Hatanaka	40c2cf4afc	Revert r219832. llvm-svn: 219884	2014-10-16 01:17:02 +00:00
Hal Finkel	68dc3c7ab2	Preserve non-byval pointer alignment attributes using @llvm.assume when inlining For pointer-typed function arguments, enhanced alignment can be asserted using the 'align' attribute. When inlining, if this enhanced alignment information is not otherwise available, preserve it using @llvm.assume-based alignment assumptions. llvm-svn: 219876	2014-10-15 23:44:41 +00:00
Chris Bieneman	5c4e9551c9	Fixing the build failure due to compiler warnings and unnecessary disambiguation. llvm-svn: 219861	2014-10-15 23:11:35 +00:00
Chris Bieneman	732e0aa9fb	Defining a new API for debug options that doesn't rely on static global cl::opts. Summary: This is based on the discussions from the LLVMDev thread: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075886.html Reviewers: chandlerc Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5389 llvm-svn: 219854	2014-10-15 21:54:35 +00:00
Akira Hatanaka	5bb9346a45	InstCombine: Narrow switch instructions using known bits. Truncate the operands of a switch instruction to a narrower type if the upper bits are known to be all ones or zeros. rdar://problem/17720004 llvm-svn: 219832	2014-10-15 19:05:50 +00:00
Hal Finkel	3b7fc86677	[SLPVectorize] Basic ephemeral-value awareness The SLP vectorizer should not vectorize ephemeral values. These are used to express information to the optimizer, and vectorizing them does not lead to faster code (because the ephemeral values are dropped prior to code generation, vectorized or not), and obscures the information the instructions are attempting to communicate (the logic that interprets the arguments to @llvm.assume generically does not understand vectorized conditions). Also, uses by ephemeral values are free (because they, and the necessary extractelement instructions, will be dropped prior to code generation). llvm-svn: 219816	2014-10-15 17:35:01 +00:00
Eric Christopher	611f0488ff	No need to cache this unused variable. Patch by Ehsan Akhgari. llvm-svn: 219749	2014-10-14 23:58:51 +00:00
Hal Finkel	1a600faba0	[LoopVectorize] Ignore @llvm.assume for cost estimates and legality A few minor changes to prevent @llvm.assume from interfering with loop vectorization. First, treat @llvm.assume like the lifetime intrinsics, which are scalarized (but don't otherwise interfere with the legality checking). Second, ignore the cost of ephemeral instructions in the loop (these will go away anyway during CodeGen). Alignment assumptions and other uses of @llvm.assume can often end up inside of loops that should be vectorized (this is not uncommon for assumptions generated by __attribute__((align_value(n))), for example). llvm-svn: 219741	2014-10-14 22:59:49 +00:00
Sanjay Patel	0ca42bb5a8	Optimize away fabs() calls when input is squared (known positive). Eliminate library calls and intrinsic calls to fabs when the input is a squared value. Note that no unsafe-math / fast-math assumptions are needed for this optimization. Differential Revision: http://reviews.llvm.org/D5777 llvm-svn: 219717	2014-10-14 20:43:11 +00:00
David Majnemer	dad2103801	InstCombine: Don't miscompile X % ((Pow2 << A) >>u B) We assumed that A must be greater than B because the right hand side of a remainder operator must be nonzero. However, it is possible for A to be less than B if Pow2 is a power of two greater than 1. Take for example: i32 %A = 0 i32 %B = 31 i32 Pow2 = 2147483648 ((Pow2 << 0) >>u 31) is non-zero but A is less than B. This fixes PR21274. llvm-svn: 219713	2014-10-14 20:28:40 +00:00
Marcello Maggioni	5bbe3df63f	Switch to select optimization for two-case switches This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block and a test to check that this condition is handled. llvm-svn: 219656	2014-10-14 01:58:26 +00:00
Sanjay Patel	17045f7fac	fix formatting; NFC llvm-svn: 219645	2014-10-14 00:33:23 +00:00
Chandler Carruth	7b8297a61e	Add some optional passes around the vectorizer to both better prepare the IR going into it and to clean up the IR produced by the vectorizers. Note that these are off by default right now while folks collect data on whether the performance tradeoff is reasonable. In a build of the 'opt' binary, I see about 2% compile time regression due to this change on average. This is in my mind essentially the worst expected case: very little of the opt binary is going to benefit from these extra passes. I've seen several benchmarks improve in performance by small amounts due to running these passes, and there are certain (rare) cases where these passes make a huge difference by either enabling the vectorizer at all or by hoisting runtime checks out of the outer loop. My primary motivation is to prevent people from seeing runtime check overhead in benchmarks where the existing passes and optimizers would be able to eliminate that. I've chosen the sequence of passes based on the kinds of things that seem likely to be relevant for the code at each stage: rotating loops for the vectorizer, finding correlated values, loop invariants, and unswitching opportunities from any runtime checks, and cleaning up commonalities exposed by the SLP vectorizer. I'll be pinging existing threads where some of these issues have come up and will start new threads to get folks to benchmark and collect data on whether this is the right tradeoff or we should do something else. llvm-svn: 219644	2014-10-14 00:31:29 +00:00
David Majnemer	db0773089f	InstCombine: Fix miscompile in X % -Y -> X % Y transform We assumed that negation operations of the form (0 - %Z) resulted in a negative number. This isn't true if %Z was originally negative. Substituting the negative number into the remainder operation may result in undefined behavior because the dividend might be INT_MIN. This fixes PR21256. llvm-svn: 219639	2014-10-13 22:37:51 +00:00
David Majnemer	a252138942	InstCombine: Don't miscompile (x lshr C1) udiv C2 We have a transform that changes: (x lshr C1) udiv C2 into: x udiv (C2 << C1) However, it is unsafe to do so if C2 << C1 discards any of C2's bits. This fixes PR21255. llvm-svn: 219634	2014-10-13 21:48:30 +00:00
Joerg Sonnenberger	5ca10d0edb	Revert r219223, it creates invalid PHI nodes. llvm-svn: 219587	2014-10-12 17:16:04 +00:00
Benjamin Kramer	240b85eec5	InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 <u C-1) llvm-svn: 219585	2014-10-12 14:02:34 +00:00
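The canonical range check from 240b85eec5 can be checked exhaustively in a small model. The sketch below assumes an 8-bit width and nonzero C; the function names are hypothetical. For C = 0 the original condition is always false, and the two forms do not agree in this model, so the sketch leaves that case out.

```python
BITS = 8                      # assumed width for exhaustive checking
MASK = (1 << BITS) - 1

def two_compares(x, c):
    """x != 0 & x <u C"""
    return x != 0 and x < c

def range_check(x, c):
    """x-1 <u C-1, with wrapping subtraction at BITS bits."""
    return ((x - 1) & MASK) < ((c - 1) & MASK)

def forms_agree(c):
    """True if both forms give the same answer for every x."""
    return all(two_compares(x, c) == range_check(x, c)
               for x in range(MASK + 1))
```

The wrapping subtraction is what makes x = 0 fail the single compare: 0 - 1 wraps to the largest unsigned value, which is never below C - 1.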
David Majnemer	27adb1240f	InstCombine: Simplify commonIDivTransforms A helper routine, MultiplyOverflows, was a less efficient reimplementation of APInt's smul_ov and umul_ov. While we are here, clean up the code so it's more uniform. No functionality change intended. llvm-svn: 219583	2014-10-12 08:34:24 +00:00
David Majnemer	fe7fccff11	InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X Consider the case where X is 2. (2 <<s 31)/s-2147483648 is zero but we would fold to X. Note that this is valid when we are in the unsigned domain because we require NUW: 2 <<u 31 results in poison. This fixes PR21245. llvm-svn: 219568	2014-10-11 10:20:04 +00:00
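The counterexample in fe7fccff11 can be reproduced with a small Python model of 32-bit signed arithmetic. The helpers below are hypothetical and only illustrate the arithmetic; they are not LLVM code.

```python
BITS = 32
INT_MIN = -(1 << (BITS - 1))

def to_signed(value):
    """Wrap an integer to a signed BITS-wide value."""
    value &= (1 << BITS) - 1
    return value - (1 << BITS) if value >> (BITS - 1) else value

def sdiv(a, b):
    """Truncating signed division; callers avoid b == 0 and INT_MIN / -1."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def shl31_then_sdiv_int_min(x):
    """(X << 31) /s INT_MIN, with the shift wrapping at 32 bits."""
    return sdiv(to_signed(x << 31), INT_MIN)
```

With X = 2 the shift wraps to zero, so `shl31_then_sdiv_int_min(2)` is 0 rather than 2, which is why folding the expression to X is not valid in the signed case.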
David Majnemer	cb9d596655	InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow consider: C1 = INT_MIN C2 = -1 C1 * C2 overflows without a doubt but consider the following: %x = i32 INT_MIN This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1. N. B. Move the unsigned version of this transform to InstSimplify, it doesn't create any new instructions. This fixes PR21243. llvm-svn: 219567	2014-10-11 10:20:01 +00:00
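The example in cb9d596655 can be worked through in Python. This is a sketch under the assumption of 32-bit signed integers; the helper names are made up for illustration.

```python
INT_MIN = -(1 << 31)
INT_MAX = (1 << 31) - 1

def sdiv(a, b):
    """Truncating signed division; callers avoid b == 0 and INT_MIN / -1."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

def mul_overflows(c1, c2):
    """True if C1 * C2 does not fit in a signed 32-bit integer."""
    return not (INT_MIN <= c1 * c2 <= INT_MAX)

def nested_sdiv(x, c1, c2):
    """(X /s C1) /s C2"""
    return sdiv(sdiv(x, c1), c2)
```

Here `mul_overflows(INT_MIN, -1)` is true, yet `nested_sdiv(INT_MIN, INT_MIN, -1)` is -1, so an overflowing C1 * C2 does not by itself make the nested division zero.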
David Majnemer	3cac85e071	InstCombine: mul to shl shouldn't preserve nsw consider: mul i32 nsw %x, -2147483648 this instruction will not result in poison if %x is 1 however, if we transform this into: shl i32 nsw %x, 31 then we will be generating poison because we just shifted into the sign bit. This fixes PR21242. llvm-svn: 219566	2014-10-11 10:19:52 +00:00
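A rough model of the nsw difference described in 3cac85e071, in Python. It assumes 32-bit integers and treats "nsw is violated" as "the mathematical result does not fit in the signed range"; that simplification and the function names are assumptions of this sketch.

```python
INT_MIN = -(1 << 31)
INT_MAX = (1 << 31) - 1

def fits_signed(value):
    return INT_MIN <= value <= INT_MAX

def mul_nsw_is_poison(x, c):
    """mul nsw x, c: poison if the signed product overflows."""
    return not fits_signed(x * c)

def shl_nsw_is_poison(x, amount):
    """shl nsw x, amount: modelled as poison if x * 2**amount overflows."""
    return not fits_signed(x * (1 << amount))
```

For x = 1, multiplying by -2147483648 gives INT_MIN, which fits, while shifting left by 31 gives +2147483648, which does not. So the shl form cannot keep nsw.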
Chandler Carruth	bff0ae772c	[SCEV] Fix one more caller blindly passing the latch to SCEV's getSmallConstantTripCount even when it isn't the exiting block. I missed this in my first audit, very sorry. This was found in LNT and elsewhere. I don't have a test case, but it was completely obvious from inspection that this was the problem. I'll see if I can reduce a test case, but I'm not really hopeful, and the value seems quite low. llvm-svn: 219562	2014-10-11 05:28:30 +00:00
Chandler Carruth	6666c27e99	[SCEV] Add some asserts to the recently improved trip count computation routines and fix all of the bugs they expose. I hit a test case that crashed even without these asserts due to passing a non-exiting latch to the ExitingBlock parameter of the trip count computation machinery. However, when I add the nice asserts, it turns out we have plenty of coverage of these bugs, they just didn't manifest in crashers. The core problem seems to stem from an assumption that the latch is the exiting block. While this is often true, and somewhat the "normal" way to think about loops, it isn't necessarily true. The correct way to call the trip count routines in a generic fashion (that is, without a particular exit in mind) is to just use the loop's single exiting block if it has one. The trip count can't be computed generically unless it does. This works great for the loop vectorizer. The loop unroller actually wants to select the latch when it has to choose between multiple exits because for unrolling it is the latch trips that matter. But if this is the desire, it needs to explicitly guard for non-exiting latches and check for the generic trip count in that case. I've added the asserts, and added convenience APIs for querying the trip count generically that check for a single exit block. I've kept the APIs consistent between computing trip count and trip multiples. Thanks to Mark for the help debugging and tracking down the right fix here! llvm-svn: 219550	2014-10-11 00:12:11 +00:00
Arnold Schwaighofer	d7d010eb2a	SimplifyCFG: Don't convert phis into selects if we could remove undef behavior instead We used to transform this: define void @test6(i1 %cond, i8* %ptr) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br label %bb2 bb2: %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ] store i8 2, i8* %ptr.2, align 8 ret void } into this: define void @test6(i1 %cond, i8* %ptr) { %ptr.2 = select i1 %cond, i8* null, i8* %ptr store i8 2, i8* %ptr.2, align 8 ret void } because the simplifycfg transformation into selects would happen to happen before the simplifycfg transformation that removes unreachable control flow (We have 'unreachable control flow' due to the store to null which is undefined behavior). The existing transformation that removes unreachable control flow in simplifycfg is: /// If BB has an incoming value that will always trigger undefined behavior /// (eg. null pointer dereference), remove the branch leading here. static bool removeUndefIntroducingPredecessor(BasicBlock *BB) Now we generate: define void @test6(i1 %cond, i8* %ptr) { store i8 2, i8* %ptr.2, align 8 ret void } I did not see any impact on the test-suite + externals. rdar://18596215 llvm-svn: 219462	2014-10-10 01:27:02 +00:00
Chad Rosier	bd64d46188	[Reassociate] Don't canonicalize X - undef to X + (-undef). Phabricator Revision: http://reviews.llvm.org/D5674 PR21205 llvm-svn: 219434	2014-10-09 20:06:29 +00:00
Andrea Di Biagio	458a669f49	[InstCombine] Fix wrong folding of constant comparisons involving ashr and negative values. This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we wrongly computed the distance between the highest bits set of two negative values. This fixes PR21222. Differential Revision: http://reviews.llvm.org/D5700 llvm-svn: 219406	2014-10-09 12:41:49 +00:00
Bob Wilson	9868d71ffe	Use triple's isiOS() and isOSDarwin() methods. These methods are already used in lots of places. This makes things more consistent. NFC. llvm-svn: 219386	2014-10-09 05:43:30 +00:00
David Majnemer	ac07703842	Inliner: Non-local functions in COMDATs shouldn't be dropped A function with discardable linkage cannot be discarded if it's a member of a COMDAT group without considering all the other COMDAT members as well. This sort of thing is already handled by GlobalOpt/GlobalDCE. This fixes PR21206. llvm-svn: 219335	2014-10-08 19:32:32 +00:00
Justin Bogner	894eff7a9f	Revert "[InstCombine] re-commit r218721 with fix for pr21199" This seems to cause a miscompile when building clang, which causes a bootstrapped clang to fail or crash in several of its tests. See: http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-RA/builds/1184 http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/7813 This reverts commit r219282. llvm-svn: 219317	2014-10-08 16:30:22 +00:00
Suyog Sarda	cba4b1d64d	Format spacing and remove extra lines to comply with standards. NFC. Differential Revision: http://reviews.llvm.org/D5649 llvm-svn: 219286	2014-10-08 08:37:49 +00:00
David Majnemer	1b3b70e371	GlobalOpt: Don't drop unused members of a Comdat A linkonce_odr member of a COMDAT shouldn't be dropped if we need to keep the entire COMDAT group. This fixes PR21191. llvm-svn: 219283	2014-10-08 07:23:31 +00:00
Gerolf Hoflehner	e2ff5b9223	[InstCombine] re-commit r218721 with fix for pr21199 The icmp-select-icmp optimization targets select-icmp.eq only. This is now ensured by testing the branch predicate explicitly. This commit also includes the test case for pr21199. llvm-svn: 219282	2014-10-08 06:42:19 +00:00
Hans Wennborg	1256198bbc	Revert r219175 - [InstCombine] re-commit r218721 icmp-select-icmp optimization This seems to have caused PR21199. llvm-svn: 219264	2014-10-08 01:05:57 +00:00
David Blaikie	c6c6c7b177	DebugInfo+DFSan: Ensure that debug info references to llvm::Functions remain pointing to the underlying function when wrappers are created This is somewhat the inverse of how similar bugs in DAE and ArgPromo manifested and were addressed. In those passes, individual call sites were visited explicitly, and then the old function was deleted. This left the debug info with a null llvm::Function* that needed to be updated to point to the new function. In the case of DFSan, it RAUWs the old function with the wrapper, which includes debug info. So now the debug info refers to the wrapper, which doesn't actually have any instructions with debug info in it, so it is ignored entirely - resulting in a DW_TAG_subprogram with no high/low pc, etc. Instead, fix up the debug info to refer to the original function after the RAUW messed it up. Reviewed/discussed with Peter Collingbourne on the llvm-dev mailing list. llvm-svn: 219249	2014-10-07 22:59:46 +00:00
Duncan P. N. Exon Smith	c46cfcbbc6	LoopUnroll: Create sub-loops in LoopInfo `LoopUnrollPass` says that it preserves `LoopInfo` -- make it so. In particular, tell `LoopInfo` about copies of inner loops when unrolling the outer loop. Conservatively, also tell `ScalarEvolution` to forget about the original versions of these loops, since their inputs may have changed. Fixes PR20987. llvm-svn: 219241	2014-10-07 21:19:00 +00:00
Duncan P. N. Exon Smith	9b4d37e8f5	LoopUnroll: Only check for ScalarEvolution analysis once, NFC A follow-up commit will add use to a tight loop. We might as well just find it once anyway. llvm-svn: 219239	2014-10-07 21:12:44 +00:00
Marcello Maggioni	963bc87dbd	Two case switch to select optimization This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block to a select or a couple of selects (depending if the default block is reachable or not). The typical case this optimization wants to be able to optimize is this one: Example: switch (a) { case 10: return 10; case 20: return 2; default: return 4; } ----> %0 = icmp eq i32 %a, 10 %1 = select i1 %0, i32 10, i32 4 %2 = icmp eq i32 %a, 20 %3 = select i1 %2, i32 2, i32 %1 It also sets the base for further optimizations that are planned and being reviewed. llvm-svn: 219223	2014-10-07 18:16:44 +00:00
David Blaikie	17364d4e05	DebugInfo+DeadArgElimination: Ensure llvm::Function*s from debug info are updated even when DAE removes both varargs and non-varargs arguments on the same function. After some stellar (& inspired) help from Reid Kleckner providing a test case for some rather unstable undefined behavior showing up as assertions produced by r214761, I was able to fix this issue in DAE involving the application of both varargs removal, followed by normal argument removal. Indeed I introduced this same bug into ArgumentPromotion (r212128) by copying the code from DAE, and when I fixed the bug in ArgPromo (r213805) and commented in that patch that I didn't need to address the same issue in DAE because it was a single pass. Turns out it's two pass, one for the varargs and one for the normal arguments, so the same fix is needed (at least during varargs removal). So here it is. (the observable/net effect of this bug, even when it didn't result in assertion failure, is that debug info would describe the DAE'd function in the abstract, but wouldn't provide high/low_pc, variable locations, line table, etc (it would appear as though the function had been entirely optimized away), see the original PR14016 for details of the general problem) I'm not recommitting the assertion just yet, as there's been another regression of it since I last tried. It might just be a few test cases weren't adequately updated after Adrian or Duncan's recent schema changes. llvm-svn: 219210	2014-10-07 15:10:23 +00:00
Suyog Sarda	65f5ae997c	Reformat if statement to comply with LLVM standards. NFC. Differential Revision: http://reviews.llvm.org/D5644 llvm-svn: 219203	2014-10-07 12:04:07 +00:00
Suyog Sarda	ea205517a9	Reformat to comply with LLVM coding standards using clang-format. NFC. Differential Revision: http://reviews.llvm.org/D5645 llvm-svn: 219202	2014-10-07 11:56:06 +00:00
Tilmann Scheller	2bc5cb687b	[InstCombine] Reformat if statements to comply with LLVM Coding Standards. Patch by Sonam Kumari! Differential Revision: http://reviews.llvm.org/D5643 llvm-svn: 219198	2014-10-07 10:19:34 +00:00
David Majnemer	e025321d36	GlobalDCE: Don't drop any COMDAT members If we require a single member of a comdat, require all of the other members as well. This fixes PR20981. llvm-svn: 219191	2014-10-07 07:07:19 +00:00
Gerolf Hoflehner	c0b4c20e5e	[InstCombine] re-commit r218721 icmp-select-icmp optimization Takes care of the assert that caused build fails. Rather than asserting the code checks now that the definition and use are in the same block, and does not attempt to optimize when that is not the case. llvm-svn: 219175	2014-10-07 00:16:12 +00:00
David Blaikie	e44ee92a3f	range-for some loops in DAE llvm-svn: 219167	2014-10-06 22:59:29 +00:00
Duncan P. N. Exon Smith	e5d7d9797b	LoopUnroll: Change code order of changes to new basic blocks Add new basic blocks to `LoopInfo` earlier. No functionality change intended (simplifies upcoming bugfix patch). llvm-svn: 219150	2014-10-06 22:05:02 +00:00
Duncan P. N. Exon Smith	0bbf5418c6	Sink comment, NFC llvm-svn: 219149	2014-10-06 22:04:59 +00:00
Owen Anderson	8373d338f6	Give the Reassociate pass a bit more flexibility and autonomy when optimizing expressions. Particularly, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical. Patch by Dmitri Shtilman. llvm-svn: 219092	2014-10-05 23:41:26 +00:00
Hal Finkel	4564688806	[InstCombine] Simplify the logic from r219067 using ValueTracking Joerg suggested on IRC that I look at generalizing the logic from r219067 to handle more general redundancies (like removing an assume(x > 3) dominated by an assume(x > 5)). The way to do this would be to ask ValueTracking to determine the value of the i1 argument. It turns out that ValueTracking is not very good at this right now (although it does get the trivial redundancy case) because it does not understand ICmps. Nevertheless, the resulting code in InstCombine is simpler than r219067, so we might as well do it now. llvm-svn: 219070	2014-10-05 00:53:02 +00:00
Hal Finkel	04a156139e	[InstCombine] Remove redundant @llvm.assume intrinsics For any @llvm.assume intrinsic, if there is another which dominates it and uses the same condition, then it is redundant and can be removed. While this does not alter the semantics of the @llvm.assume intrinsics, it makes subsequent handling more efficient (and the resulting IR easier to read). llvm-svn: 219067	2014-10-04 21:27:06 +00:00
Benjamin Kramer	c6cc58e703	Remove unnecessary copying or replace it with moves in a bunch of places. NFC. llvm-svn: 219061	2014-10-04 16:55:56 +00:00
Duncan P. N. Exon Smith	176b691d32	Revert "Revert "DI: Fold constant arguments into a single MDString"" This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010	2014-10-03 20:01:09 +00:00
Benjamin Kramer	e12a6bac32	Eliminate some deep std::vector copies. NFC. llvm-svn: 218999	2014-10-03 18:33:16 +00:00
Duncan P. N. Exon Smith	786cd049fc	Revert "DI: Fold constant arguments into a single MDString" This reverts commit r218914 while I investigate some bots. llvm-svn: 218918	2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith	571f97bd90	DI: Fold constant arguments into a single MDString This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 218914	2014-10-02 21:56:57 +00:00
Sanjay Patel	12d1ce5408	Optimize square root squared (PR21126). When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X. This can happen in the real world when calculating x ** 3/2. This occurs in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c. Differential Revision: http://reviews.llvm.org/D5584 llvm-svn: 218906	2014-10-02 21:10:54 +00:00
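Why 12d1ce5408 requires unsafe-fp-math can be seen with ordinary double-precision arithmetic. The Python sketch below uses hypothetical helper names and only illustrates the rounding difference; it says nothing about how the LLVM transform is implemented.

```python
import math

def sqrt_squared(x):
    """sqrt(X) * sqrt(X), evaluated with IEEE double rounding at each step."""
    root = math.sqrt(x)
    return root * root

def folded(x):
    """The result after folding sqrt(X) * sqrt(X) to X."""
    return x

def fold_changes_result(x):
    """True if the folded form differs from the step-by-step evaluation."""
    return sqrt_squared(x) != folded(x)
```

For X = 2.0 the step-by-step product is not exactly 2.0, so the fold changes the computed value; for X = 4.0 the two agree. That difference is why the fold is restricted to relaxed floating-point modes.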
Sanjay Patel	b41d46118a	Use the local variable that other clauses around here are already using. llvm-svn: 218876	2014-10-02 15:20:45 +00:00
Zinovy Nis	ccc3e3733b	[BUG][INDVAR] Fix for PR21014: wrong SCEV operands commuting for non-commutative instructions My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]' thus inverting index expression. This patch fixes it. Thanks to Jörg Sonnenberger for pointing. Differential Revision: http://reviews.llvm.org/D5576 llvm-svn: 218867	2014-10-02 13:01:15 +00:00
Duncan P. N. Exon Smith	611afb229c	DIBuilder: Encapsulate DIExpression's element type `DIExpression`'s elements are 64-bit integers that are stored as `ConstantInt`. The accessors already encapsulate the storage. This commit updates the `DIBuilder` API to also encapsulate that. llvm-svn: 218797	2014-10-01 20:26:08 +00:00
Adrian Prantl	87b7eb9d0f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Adrian Prantl	b458dc2eee	Revert r218778 while investigating buildbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	25a7174e7a	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Tom Stellard	0a4e9a3b25	C API: Add LLVMCloneModule() llvm-svn: 218775	2014-10-01 17:14:57 +00:00
Evgeniy Stepanov	815f2869ad	Revert r218721, r218735. Failing bootstrap on Linux (arm, x86). http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470 http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518 llvm-svn: 218752	2014-10-01 10:07:28 +00:00
Gerolf Hoflehner	19fc3dafc8	[InstCombine] Fix for assert build failures caused by r218721 The icmp-select-icmp optimization made the implicit assumption that the select-icmp instructions are in the same block and asserted on it. The fix explicitly checks for that condition and conservatively suppresses the optimization when it is violated. llvm-svn: 218735	2014-10-01 03:24:39 +00:00
Gerolf Hoflehner	08cc4b950c	[InstCombine] Optimize icmp-select-icmp In special cases select instructions can be eliminated by replacing them with a cheaper bitwise operation even when the select result is used outside its home block. The instances implemented are patterns like %x=icmp.eq %y=select %x,%r, null %z=icmp.eq|neq %y, null br %z,true, false ==> %x=icmp.ne %y=icmp.eq %r,null %z=or %x,%y br %z,true,false The optimization is integrated into the instruction combiner and performed only when all uses of the select result can be replaced by the select operand proper. For this dominator information is used and dominance is now a required analysis pass in the combiner. The optimization itself is iterative. The critical step is to replace the select result with the non-constant select operand. So the select becomes local and the combiner iteratively works out simpler code pattern and eventually eliminates the select. rdar://17853760 llvm-svn: 218721	2014-10-01 00:13:22 +00:00
Jingyue Wu	fc0296704c	[SimplifyCFG] threshold for folding branches with common destination Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold. The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example: __global__ void foo(int a, int b, int c, int d, int e, int n, const int *input, int *output) { int sum = 0; for (int i = 0; i < n; ++i) sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i]; *output = sum; } The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i | c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like: sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i]; Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%.
Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%. We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks! Test Plan: Added one test case to check the threshold is in effect Reviewers: nadav, eliben, meheff, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D5529 llvm-svn: 218711	2014-09-30 22:23:38 +00:00
Lorenzo Martignoni	40d3deeb7d	Introduce support for custom wrappers for vararg functions. Differential Revision: http://reviews.llvm.org/D5412 llvm-svn: 218671	2014-09-30 12:33:16 +00:00
Chad Rosier	aab5d7bd33	[IndVarSimplify] Widen loop unsigned compares. This patch extends r217953 to handle unsigned comparison. Phabricator revision: http://reviews.llvm.org/D5526 llvm-svn: 218659	2014-09-30 03:17:42 +00:00
Kevin Qin	fc02e3c363	Use a loop to simplify the runtime unrolling prologue. Runtime unrolling will create a prologue to execute the extra iterations that are left over when the trip count is not divisible by the unroll factor. It generates an if-then-else sequence to jump into a loop body unrolled factor-1 times, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: if (extraiters == loopfactor) jump L1 if (extraiters == loopfactor-1) jump L2 ... L1: LoopBody; L2: LoopBody; ... if tripcount < loopfactor jump End Loop: ... End: It means if the unroll factor is 4, the loop body will be unrolled 7 times: 3 copies are in the loop prologue, and 4 are in the loop. This commit uses a loop to execute the extra iterations in the prologue, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: else jump Prol Prol: LoopBody; extraiters -= 1 // Omitted if unroll factor is 2. if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2. if (tripcount < loopfactor) jump End Loop: ... End: Then when the unroll factor is 4, the loop body will be copied only 5 times: 1 in the prologue loop, 4 in the original loop. And if the unroll factor is 2, a new loop won't be created, just as in the original solution. llvm-svn: 218604	2014-09-29 11:15:00 +00:00
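The prologue-loop scheme from fc02e3c363 can be simulated in Python to confirm that the body still runs once per iteration. This is a sketch of the control flow only; the function name and the counting approach are assumptions, not the pass's implementation.

```python
def run_with_prologue_loop(tripcount, factor):
    """Count body executions for a prologue loop followed by an unrolled loop."""
    executed = 0

    # Prologue loop: run the leftover iterations one at a time.
    extraiters = tripcount % factor
    while extraiters != 0:
        executed += 1           # one copy of LoopBody
        extraiters -= 1

    # Main loop: each trip runs `factor` copies of LoopBody.
    remaining = tripcount - tripcount % factor
    while remaining > 0:
        executed += factor
        remaining -= factor

    return executed

def body_copies(factor):
    """Static copies of LoopBody: one in the prologue loop plus the unrolled copies."""
    return 1 + factor
```

For an unroll factor of 4, `body_copies(4)` is 5, matching the count in the commit message, and `run_with_prologue_loop(n, 4)` returns n for any non-negative n.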
Chad Rosier	7b974b73ae	[IndVar] Don't widen loop compare unless IV user is sign extended. PR21030 llvm-svn: 218539	2014-09-26 20:05:35 +00:00
Kostya Serebryany	34ddf8725c	[asan] don't instrument module CTORs that may be run before asan.module_ctor. This fixes asan running together with -coverage llvm-svn: 218421	2014-09-24 22:41:55 +00:00
David Peixotto	0d4d5e64ec	Fix assertion in LICM doFinalization() The doFinalization method checks that the LoopToAliasSetMap is empty. LICM populates that map as it runs through the loop nest, deleting the entries for child loops as it goes. However, if a child loop is deleted by another pass (e.g. unrolling) then the loop will never be deleted from the map because LICM walks the loop nest to find entries it can delete. The fix is to delete the loop from the map and free the alias set when the loop is deleted from the loop nest. Differential Revision: http://reviews.llvm.org/D5305 llvm-svn: 218387	2014-09-24 16:48:31 +00:00
Michael Liao	d120916ca7	Allow BB duplication threshold to be adjusted through JumpThreading's ctor - BB duplication may not be desired on targets where there is no or small branch penalty and code duplication needs restrict control. llvm-svn: 218375	2014-09-24 04:59:06 +00:00
Reid Kleckner	78927e884b	GlobalOpt: Preserve comdats of unoptimized initializers Rather than slurping in and splatting out the whole ctor list, preserve the existing array entries without trying to understand them. Only remove the entries that we know we can optimize away. This way we don't need to wire through priority and comdats or anything else we might add. Fixes a linker issue where the .init_array or .ctors entry would point to discarded initialization code if the comdat group from the TU with the faulty global_ctors entry was dropped. llvm-svn: 218337	2014-09-23 22:33:01 +00:00
Lenny Maiorani	9eefc81219	Using a deque to manage the stack of nodes is faster here. Vector is slow due to many reallocations as the size regularly changes in unpredictable ways. See the investigation provided on the mailing list for more information: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120116/135228.html llvm-svn: 218182	2014-09-20 13:29:20 +00:00
Eric Christopher	d85ffb1fc0	Add a new pass FunctionTargetTransformInfo. This pass serves as a shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004	2014-09-18 00:34:14 +00:00
David Blaikie	dba94ec3c7	Reapply fix in r217988 (reverted in r217989) and remove the alternative fix committed in r217987. This type isn't owned polymorphically (as demonstrated by making the dtor protected and everything still compiling) so just address the warning by protecting the base dtor and making the derived class final. llvm-svn: 217990	2014-09-17 22:27:36 +00:00
David Blaikie	d8978ec085	Revert "Fix -Wnon-virtual-dtor warning introduced in r217982." An alternative fix was already committed. This reverts commit r217988. llvm-svn: 217989	2014-09-17 22:17:59 +00:00
David Blaikie	20dd05ccfd	Fix -Wnon-virtual-dtor warning introduced in r217982. llvm-svn: 217988	2014-09-17 22:15:40 +00:00
Chris Bieneman	cf93cbb7a4	Fixing a build error. llvm-svn: 217983	2014-09-17 21:06:59 +00:00
Chris Bieneman	ad070d0588	Refactoring SimplifyLibCalls to remove static initializers and generally cleaning up the code. Summary: This eliminates ~200 lines of code mostly file scoped struct definitions that were unnecessary. Reviewers: chandlerc, resistor Reviewed By: resistor Subscribers: morisset, resistor, llvm-commits Differential Revision: http://reviews.llvm.org/D5364 llvm-svn: 217982	2014-09-17 20:55:46 +00:00
Chad Rosier	307b50b0f6	[IndVarSimplify] Partially revert r217953 to see if this fixes the bots. Specifically, disable widening of unsigned compare instructions. llvm-svn: 217962	2014-09-17 16:35:09 +00:00
Chad Rosier	bb99f40530	[IndVarSimplify] Widen loop compare instructions. This improves other optimizations such as LSR. A sext may be added to the compare's other operand, but this can often be hoisted outside of the loop. llvm-svn: 217953	2014-09-17 14:10:33 +00:00
Andrea Di Biagio	5b92b4971a	[InstCombine] Fix wrong folding of constant comparison involving ashr and negative quantities (PR20945). Example: define i1 @foo(i32 %a) { %shr = ashr i32 -9, %a %cmp = icmp ne i32 %shr, -5 ret i1 %cmp } Before this fix, the instruction combiner wrongly thought that %shr could have never been equal to -5. Therefore, %cmp was always folded to 'true'. However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore, in this example, it is not valid to fold %cmp to 'true'. The problem was only affecting the case where the comparison was between negative quantities where one of the quantities was obtained from arithmetic shift of a negative constant. This patch fixes the problem with the wrong folding (fixes PR20945). With this patch, the 'icmp' from the example is now simplified to a comparison between %a and 1. This still allows us to get rid of the arithmetic shift (%shr). llvm-svn: 217950	2014-09-17 11:32:31 +00:00
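The example from 5b92b4971a can be evaluated directly in Python, whose `>>` on negative integers is an arithmetic shift. The function names are hypothetical, and the sketch only covers shift amounts 0 through 31.

```python
def ashr(value, amount):
    """Arithmetic shift right; Python's >> rounds toward negative infinity."""
    return value >> amount

def cmp_ne(a):
    """%cmp = icmp ne (ashr i32 -9, %a), -5"""
    return ashr(-9, a) != -5

def simplified(a):
    """The simplified form: a comparison between %a and 1."""
    return a != 1
```

Because `ashr(-9, 1)` is -5, `cmp_ne(1)` is false, so folding the compare to 'true' was wrong; over the shift amounts 0 through 31, `cmp_ne` agrees with `simplified`.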
Jingyue Wu	b67140b812	Remove dead code in SimplifyCFG Summary: UsedByBranch is always true according to how BonusInst is defined. Test Plan: Passes check-all, and also verified if (BonusInst && !UsedByBranch) { ... } is never entered during check-all. Reviewers: resistor, nadav, jingyue Reviewed By: jingyue Subscribers: llvm-commits, eliben, meheff Differential Revision: http://reviews.llvm.org/D5324 llvm-svn: 217824	2014-09-15 20:48:13 +00:00
Nick Lewycky	9e6d184803	Add control of function merging to the PMBuilder. llvm-svn: 217731	2014-09-13 21:46:00 +00:00
Benjamin Kramer	0bd147da17	Simplify code. No functionality change. llvm-svn: 217726	2014-09-13 12:38:49 +00:00
Juergen Ributzka	14ae60407d	[C API] Make the 'lower switch' pass available via the C API. llvm-svn: 217630	2014-09-11 21:32:32 +00:00
Hal Finkel	f83e1f7f66	[AlignmentFromAssumptions] Don't crash just because the target is 32-bit We used to crash processing any relevant @llvm.assume on a 32-bit target (because we'd ask SE to subtract expressions of differing types). I've copied our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get some meaningful test coverage here. llvm-svn: 217574	2014-09-11 08:40:17 +00:00
Rafael Espindola	c435adcde0	Add doInitialization/doFinalization to DataLayoutPass. With this a DataLayoutPass can be reused for multiple modules. Once we have doInitialization/doFinalization, it doesn't seem necessary to pass a Module to the constructor. Overall this change seems in line with the idea of making DataLayout a required part of Module. With it the only way of having a DataLayout used is to add it to the Module. llvm-svn: 217548	2014-09-10 21:27:43 +00:00
Hal Finkel	71b7084112	[AlignmentFromAssumptions] Don't divide by zero for unknown starting alignment The routine that determines an alignment given some SCEV returns zero if the answer is unknown. In a case where we could determine the increment of an AddRec but not the starting alignment, we would compute the integer modulus by zero (which is illegal and traps). Prevent this by returning early if either the start or increment alignment is unknown (zero). llvm-svn: 217544	2014-09-10 21:05:52 +00:00
Gerolf Hoflehner	008e5cdcba	[PassManager] Adding Hidden attribute to EnableMLSM option llvm-svn: 217539	2014-09-10 20:24:03 +00:00
Gerolf Hoflehner	24815d9b8f	[MergedLoadStoreMotion] Move pass enabling option to PassManagerBuilder llvm-svn: 217538	2014-09-10 19:55:29 +00:00
Sanjay Patel	b653de1ada	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Gerolf Hoflehner	e4f6684d1b	Removed misleading comment. llvm-svn: 217527	2014-09-10 17:54:50 +00:00
Stepan Dyatkovskiy	fe134cdfa7	MergeFunctions: FunctionPtr has been renamed to FunctionNode. It's supposed to store additional pass information for current function here. That was the reason for name change. llvm-svn: 217483	2014-09-10 10:08:25 +00:00
NAKAMURA Takumi	1ab0cf0e28	SampleProfile.cpp: Prune a stray \param added in r217437. [-Wdocumentation] llvm-svn: 217465	2014-09-09 22:44:30 +00:00
NAKAMURA Takumi	bb4fac9050	ScalarOpts/LLVMBuild.txt: Prune unused dependency to IPA. llvm-svn: 217448	2014-09-09 15:00:38 +00:00
NAKAMURA Takumi	37ffecf06b	ScalarOpts/LLVMBuild.txt: Reorder. llvm-svn: 217447	2014-09-09 15:00:26 +00:00
Diego Novillo	de1ab26f52	Re-factor sample profile reader into lib/ProfileData. Summary: This patch moves the profile reading logic out of the Sample Profile transformation into a generic profile reader facility in lib/ProfileData. The intent is to use this new reader to implement a sample profile reader/writer that can be used to convert sample profiles from external sources into LLVM. This first patch introduces no functional changes. It moves the profile reading code from lib/Transforms/SampleProfile.cpp into lib/ProfileData/SampleProfReader.cpp. In subsequent patches I will: - Add a bitcode format for sample profiles to allow for more efficient encoding of the profile. - Add a writer for both text and bitcode format profiles. - Add a 'convert' command to llvm-profdata to be able to convert between the two (and serve as entry point for other sample profile formats). Reviewers: bogner, echristo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5250 llvm-svn: 217437	2014-09-09 12:40:50 +00:00
Andrew Trick	8fc3c6c093	Add a comment to getNewAlignmentDiff. llvm-svn: 217350	2014-09-07 23:16:24 +00:00
Hal Finkel	93873cc10e	Check for all known bits on ret in InstCombine From a combination of @llvm.assume calls (and perhaps through other means, such as range metadata), it is possible that all bits of a return value might be known. Previously, InstCombine did not check for this (which is understandable given assumptions of constant propagation), but means that we'd miss simple cases where assumptions are involved. llvm-svn: 217346	2014-09-07 21:28:34 +00:00
Hal Finkel	7e1844940e	Make use of @llvm.assume from LazyValueInfo This change teaches LazyValueInfo to use the @llvm.assume intrinsic. Like with the known-bits change (r217342), this requires feeding a "context" instruction pointer through many functions. Aside from a little refactoring to reuse the logic that turns predicates into constant ranges in LVI, the only new code is that which can 'merge' the range from an assumption into that otherwise computed. There is also a small addition to JumpThreading so that it can have LVI use assumptions in the same block as the comparison feeding a conditional branch. With this patch, we can now simplify this as expected: int foo(int a) { __builtin_assume(a > 5); if (a > 3) { bar(); return 1; } return 0; } llvm-svn: 217345	2014-09-07 20:29:59 +00:00
Hal Finkel	d67e463901	Add an AlignmentFromAssumptions Pass This adds a ScalarEvolution-powered transformation that updates load, store and memory intrinsic pointer alignments based on invariant((a+q) & b == 0) expressions. Many of the simple cases we can get with ValueTracking, but we still need something like this for the more complicated cases (such as those with an offset) that require some algebra. Note that gcc's __builtin_assume_aligned's optional third argument provides exactly for this kind of 'misalignment' offset for which this kind of logic is necessary. The primary motivation is to fixup alignments for vector loads/stores after vectorization (and unrolling). This pass is added to the optimization pipeline just after the SLP vectorizer runs (which, admittedly, does not preserve SE, although I imagine it could). Regardless, I actually don't think that the preservation matters too much in this case: SE computes lazily, and this pass won't issue any SE queries unless there are any assume intrinsics, so there should be no real additional cost in the common case (SLP does preserve DT and LoopInfo). llvm-svn: 217344	2014-09-07 20:05:11 +00:00
Hal Finkel	15aeaaf24a	Add additional patterns for @llvm.assume in ValueTracking This builds on r217342, which added the infrastructure to compute known bits using assumptions (@llvm.assume calls). That original commit added only a few patterns (to catch common cases related to determining pointer alignment); this change adds several other patterns for simple cases. r217342 contained that, for assume(v & b = a), bits in the mask that are known to be one, we can propagate known bits from the a to v. It also had a known-bits transfer for assume(a = b). This patch adds: assume(~(v & b) = a) : For those bits in the mask that are known to be one, we can propagate inverted known bits from the a to v. assume(v | b = a) : For those bits in b that are known to be zero, we can propagate known bits from the a to v. assume(~(v | b) = a): For those bits in b that are known to be zero, we can propagate inverted known bits from the a to v. assume(v ^ b = a) : For those bits in b that are known to be zero, we can propagate known bits from the a to v. For those bits in b that are known to be one, we can propagate inverted known bits from the a to v. assume(~(v ^ b) = a) : For those bits in b that are known to be zero, we can propagate inverted known bits from the a to v. For those bits in b that are known to be one, we can propagate known bits from the a to v. assume(v << c = a) : For those bits in a that are known, we can propagate them to known bits in v shifted to the right by c. assume(~(v << c) = a) : For those bits in a that are known, we can propagate them inverted to known bits in v shifted to the right by c. assume(v >> c = a) : For those bits in a that are known, we can propagate them to known bits in v shifted to the left by c. assume(~(v >> c) = a) : For those bits in a that are known, we can propagate them inverted to known bits in v shifted to the left by c.
assume(v >=_s c) where c is non-negative: The sign bit of v is zero assume(v >_s c) where c is at least -1: The sign bit of v is zero assume(v <=_s c) where c is negative: The sign bit of v is one assume(v <_s c) where c is non-positive: The sign bit of v is one assume(v <=_u c): Transfer the known high zero bits assume(v <_u c): Transfer the known high zero bits (if c is known to be a power of 2, transfer one more) A small addition to InstCombine was necessary for some of the test cases. The problem is that when InstCombine was simplifying and, or, etc. it would fail to check the 'do I know all of the bits' condition before checking less specific conditions and would not fully constant-fold the result. I'm not sure how to trigger this aside from using assumptions, so I've just included the change here. llvm-svn: 217343	2014-09-07 19:21:07 +00:00
Hal Finkel	60db05896a	Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.) This change, which allows @llvm.assume to be used from within computeKnownBits (and other associated functions in ValueTracking), adds some (optional) parameters to computeKnownBits and friends. These functions now (optionally) take a "context" instruction pointer, an AssumptionTracker pointer, and also a DomTree pointer, and most of the changes are just to pass this new information when it is easily available from InstSimplify, InstCombine, etc. As explained below, the significant conceptual change is that known properties of a value might depend on the control-flow location of the use (because we care that the @llvm.assume dominates the use because assumptions have control-flow dependencies). This means that, when we ask if bits are known in a value, we might get different answers for different uses. The significant changes are all in ValueTracking. Two main changes: First, as with the rest of the code, new parameters need to be passed around. To make this easier, I grouped them into a structure, and I made internal static versions of the relevant functions that take this structure as a parameter. The new code does as you might expect, it looks for @llvm.assume calls that make use of the value we're trying to learn something about (often indirectly), attempts to pattern match that expression, and uses the result if successful. By making use of the AssumptionTracker, the process of finding @llvm.assume calls is not expensive. Part of the structure being passed around inside ValueTracking is a set of already-considered @llvm.assume calls. This is to prevent a query using, for example, the assume(a == b), to recurse on itself. The context and DT params are used to find applicable assumptions. An assumption needs to dominate the context instruction, or come after it deterministically. 
In this latter case we only handle the specific case where both the assumption and the context instruction are in the same block, and we need to exclude assumptions from being used to simplify their own ephemeral values (those which contribute only to the assumption) because otherwise the assumption would prove its feeding comparison trivial and would be removed. This commit adds the plumbing and the logic for a simple masked-bit propagation (just enough to write a regression test). Future commits add more patterns (and, correspondingly, more regression tests). llvm-svn: 217342	2014-09-07 18:57:58 +00:00
Hal Finkel	57f03dda49	Add functions for finding ephemeral values This adds a set of utility functions for collecting 'ephemeral' values. These are LLVM IR values that are used only by @llvm.assume intrinsics (directly or indirectly), and thus will be removed prior to code generation, implying that they should be considered free for certain purposes (like inlining). The inliner's cost analysis, and a few other passes, have been updated to account for ephemeral values using the provided functionality. This functionality is important for the usability of @llvm.assume, because it limits the "non-local" side-effects of adding llvm.assume on inlining, loop unrolling, etc. (these are hints, and do not generate code, so they should not directly contribute to estimates of execution cost). llvm-svn: 217335	2014-09-07 13:49:57 +00:00
Hal Finkel	74c2f355d2	Add an Assumption-Tracking Pass This adds an immutable pass, AssumptionTracker, which keeps a cache of @llvm.assume call instructions within a module. It uses callback value handles to keep stale functions and intrinsics out of the map, and it relies on any code that creates new @llvm.assume calls to notify it of the new instructions. The benefit is that code needing to find @llvm.assume intrinsics can do so directly, without scanning the function, thus allowing the cost of @llvm.assume handling to be negligible when none are present. The current design is intended to be lightweight. We don't keep track of anything until we need a list of assumptions in some function. The first time this happens, we scan the function. After that, we add/remove @llvm.assume calls from the cache in response to registration calls and ValueHandle callbacks. There are no new direct test cases for this pass, but because it calls its validation function upon module finalization, we'll pick up detectable inconsistencies from the other tests that touch @llvm.assume calls. This pass will be used by follow-up commits that make use of @llvm.assume. llvm-svn: 217334	2014-09-07 12:44:26 +00:00
David Majnemer	6fe6ea740c	InstCombine: Remove a special case pattern The special case did not work when run under -reassociate and can easily be expressed by a further generalization of an existing pattern. llvm-svn: 217227	2014-09-05 06:09:24 +00:00
James Molloy	6b95d8ed36	Enable noalias metadata by default and swap the order of the SLP and Loop vectorizers by default. After some time maturing, hopefully the flags themselves will be removed. llvm-svn: 217144	2014-09-04 13:23:08 +00:00
Tilmann Scheller	faabbb5fb6	[GVN] Format variable name. Local variables need to start with an upper case letter. llvm-svn: 217133	2014-09-04 06:38:00 +00:00
David Majnemer	13046deef3	IndVarSimplify: Address review comments for r217102 No functional change intended, just some cleanups and comments added. llvm-svn: 217115	2014-09-04 00:23:13 +00:00
Kostya Serebryany	3175521844	[asan] fix debug info produced for asan-coverage=2 llvm-svn: 217106	2014-09-03 23:24:18 +00:00
David Majnemer	c6ab01ecca	IndVarSimplify: Don't let LFTR compare against a poison value LinearFunctionTestReplace tries to use the next indvar to compare against when possible. However, it may be the case that the calculation for the next indvar has NUW/NSW flags and that it may only be safely used inside the loop. Using it in a comparison to calculate the exit condition could result in observing poison. This fixes PR20680. Differential Revision: http://reviews.llvm.org/D5174 llvm-svn: 217102	2014-09-03 23:03:18 +00:00
Kostya Serebryany	351b078b6d	[asan] add -asan-coverage=3: instrument all blocks and critical edges. llvm-svn: 217098	2014-09-03 22:37:37 +00:00
Benjamin Kramer	89854ebe8e	Make some helpers static or move into the llvm namespace. llvm-svn: 217077	2014-09-03 21:04:12 +00:00
Sanjay Patel	9433a28845	Preserve IR flags (nsw, nuw, exact, fast-math) in SLP vectorizer (PR20802). The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math) when converting scalar instructions into vectors. But this isn't a simple copy - we need to take the intersection (the logical 'and') of the sets of flags on the scalars. The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after: http://reviews.llvm.org/D4015 http://llvm.org/viewvc/llvm-project?view=revision&revision=211339 The vast majority of changed files are existing tests that were not propagating IR flags, but I've also added a new test file for focused testing of IR flag possibilities. Differential Revision: http://reviews.llvm.org/D5172 llvm-svn: 217051	2014-09-03 17:40:30 +00:00
Sanjay Patel	a982d992f0	Change name of copyFlags() to copyIRFlags(). Add convenience method for logical 'and' of all flags. NFC. Adding 'IR' to the names in an attempt to be less ambiguous about the flags we're dealing with here. The 'and' method is needed by the SLPVectorizer (PR20802) and possibly other passes. llvm-svn: 217004	2014-09-03 01:06:50 +00:00
Hal Finkel	445dda5c4a	Add pass-manager flags to use CFL AA Add -use-cfl-aa (and -use-cfl-aa-in-codegen) to add CFL AA in the default pass managers (for easy testing). llvm-svn: 216978	2014-09-02 22:12:54 +00:00
Kostya Serebryany	ad23852ac3	[asan] Assign a low branch weight to ASan's slow path, patch by Jonas Wagner. This speeds up asan (at least on SPEC) by 1%-5% or more. Also fix lint in dfsan. llvm-svn: 216972	2014-09-02 21:46:51 +00:00
Yi Jiang	77a609b556	Generate extract for in-tree uses if the use is scalar operand in vectorized instruction. radar://18144665 llvm-svn: 216946	2014-09-02 21:00:39 +00:00
David Blaikie	15913f46b2	unique_ptrify the result of SpecialCaseList::create llvm-svn: 216925	2014-09-02 18:13:54 +00:00
David Majnemer	49428105aa	LICM: Don't crash when an instruction is used by an unreachable BB Summary: BBs might contain non-LCSSA'd values after the LCSSA pass is run if they are unreachable from the entry block. Normally, the users of the instruction would be PHIs but the unreachable BBs have normal users; rewrite their uses to be undef values. An alternative fix could involve fixing this at LCSSA but that would require this invariant to hold after subsequent transforms. If a BB created an unreachable block, they would be in violation of this. This fixes PR19798. Differential Revision: http://reviews.llvm.org/D5146 llvm-svn: 216911	2014-09-02 16:22:00 +00:00
David Majnemer	d4cffcf073	SROA: Don't insert instructions before a PHI SROA may decide that it needs to insert a bitcast and would set its insertion point before a PHI. This will create an invalid module right quick. Instead, choose the first insertion point in the basic block that holds our PHI. This fixes PR20822. Differential Revision: http://reviews.llvm.org/D5141 llvm-svn: 216891	2014-09-01 21:20:14 +00:00
David Majnemer	d2df50196f	Revert "Revert two GEP-related InstCombine commits" This reverts commit r216698 which reverted r216523 and r216598. We would attempt to perform the transformation even if the match() failed because, as a side effect, it would set V. This would trick us into believing that we correctly found a place to correctly apply the transform. An additional test case was added to getelementptr.ll so that we might not regress in the future. llvm-svn: 216890	2014-09-01 21:10:02 +00:00
Sanjay Patel	5ad239e15a	Add a convenience method to copy wrapping, exact, and fast-math flags (NFC). The loop vectorizer preserves wrapping, exact, and fast-math properties of scalar instructions. This patch adds a convenience method to make that operation easier because we need to do this in the loop vectorizer, SLP vectorizer, and possibly other places. Although this is a 'no functional change' patch, I've added a testcase to verify that the exact flag is preserved by the loop vectorizer. The wrapping and fast-math flags are already checked in existing testcases. Differential Revision: http://reviews.llvm.org/D5138 llvm-svn: 216886	2014-09-01 18:44:57 +00:00
Chandler Carruth	18cee1defc	Fix a really bad miscompile introduced in r216865 - the else-if logic chain became completely broken here as all intrinsic users ended up being skipped, and the ones that seemed to be singled out were actually the exact wrong set. This is a great example of why long else-if chains can be easily confusing. Switch the entire code to use early exits and early continues to have simpler (and more importantly, correct) logic here, as well as fixing the reversed logic for detecting and continuing on lifetime intrinsics. I've also significantly cleaned up the test case and added another test case demonstrating an example where the optimization is not (trivially) safe to perform. llvm-svn: 216871	2014-09-01 10:09:18 +00:00
Renato Golin	86a6c3f269	Small refactor on VectorizerHint for deduplication Previously, the hint mechanism relied on clean up passes to remove redundant metadata, which still showed up if running opt at low levels of optimization. That also has shown that multiple nodes of the same type, but with different values could still coexist, even if temporary, and cause confusion if the next pass got the wrong value. This patch makes sure that, if metadata already exists in a loop, the hint mechanism will never append a new node, but always replace the existing one. It also enhances the algorithm to cope with more metadata types in the future by just adding a new type, not a lot of code. Re-applying again due to MSVC 2013 being minimum requirement, and this patch having C++11 that MSVC 2012 didn't support. Fixes PR20655. llvm-svn: 216870	2014-09-01 10:00:17 +00:00
Hal Finkel	0c083024f0	Feed AA to the inliner and use AA->getModRefBehavior in AddAliasScopeMetadata This feeds AA through the IFI structure into the inliner so that AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which functions only access their arguments (instead of just hard-coding some knowledge of memory intrinsics). Most of the information is only available from BasicAA; this is important for preserving alias scoping information for target-specific intrinsics when doing the noalias parameter attribute to metadata conversion. llvm-svn: 216866	2014-09-01 09:01:39 +00:00
Nick Lewycky	fc243d54d2	Ignore lifetime intrinsics in use list for MemCpyOptimizer. Patch by Luqman Aden, review by Hal Finkel. llvm-svn: 216865	2014-09-01 06:03:11 +00:00
Hal Finkel	cbb85f249e	Fix AddAliasScopeMetadata again - alias.scope must be a complete description I thought that I had fixed this problem in r216818, but I did not do a very good job. The underlying issue is that when we add alias.scope metadata we are asserting that this metadata completely describes the aliasing relationships within the current aliasing scope domain, and so in the context of translating noalias argument attributes, the pointers must all be based on noalias arguments (as underlying objects) and have no other kind of underlying object. In r216818 excluding appropriate accesses from getting alias.scope metadata is done by looking for underlying objects that are not identified function-local objects -- but that's wrong because allocas, etc. are also function-local objects and we need to explicitly check that all underlying objects are the noalias arguments for which we're adding metadata aliasing scopes. This fixes the underlying-object check for adding alias.scope metadata, and does some refactoring of the related capture-checking eligibility logic (and adds more comments; hopefully making everything a bit clearer). Fixes self-hosting on x86_64 with -mllvm -enable-noalias-to-md-conversion (the feature is still disabled by default). llvm-svn: 216863	2014-09-01 04:26:40 +00:00
Craig Topper	6dc4a8bc2c	Fix some cases where StringRef was being passed by const reference. Remove const from some other StringRefs since it's implicitly const already. llvm-svn: 216820	2014-08-30 16:48:02 +00:00
Hal Finkel	a3708df41a	Fix AddAliasScopeMetadata to not add scopes when deriving from unknown pointers The previous implementation of AddAliasScopeMetadata, which adds noalias metadata to preserve noalias parameter attribute information when inlining had a flaw: it would add alias.scope metadata to accesses which might have been derived from pointers other than noalias function parameters. This was incorrect because even some access known not to alias with all noalias function parameters could easily alias with an access derived from some other pointer. Instead, when deriving from some unknown pointer, we cannot add alias.scope metadata at all. This fixes a miscompile of the test-suite's tramp3d-v4. Furthermore, we cannot add alias.scope to functions unless we know they access only argument-derived pointers (currently, we know this only for memory intrinsics). Also, we fix a theoretical problem with using the NoCapture attribute to skip the capture check. This is incorrect (as explained in the comment added), but would not matter in any code generated by Clang because we get only inferred nocapture attributes in Clang-generated IR. This functionality is not yet enabled by default. llvm-svn: 216818	2014-08-30 12:48:33 +00:00
David Majnemer	492e612e01	InstCombine: Respect recursion depth in visitUDivOperand llvm-svn: 216817	2014-08-30 09:19:05 +00:00
David Majnemer	5e96f1b4c8	InstCombine: Try harder to combine icmp instructions consider: (and (icmp X, Y), (and Z, (icmp A, B))) It may be possible to combine (icmp X, Y) with (icmp A, B). If we successfully combine, create an 'and' instruction with Z. This fixes PR20814. N.B. There is room for improvement after this change but I'm not convinced it's worth chasing yet. llvm-svn: 216814	2014-08-30 06:18:20 +00:00
Hal Finkel	2d3d6da44b	Fix a typo in AddAliasScopeMetadata llvm-svn: 216741	2014-08-29 16:33:41 +00:00
David Majnemer	400e725bde	Revert two GEP-related InstCombine commits This reverts commit r216523 and r216598; people have reported regressions. llvm-svn: 216698	2014-08-29 00:06:43 +00:00
Reid Kleckner	febb279c9c	Don't promote byval pointer arguments when padding matters Don't promote byval pointer arguments when their size in bits is not equal to their alloc size in bits. This can happen for x86_fp80, where the size in bits is 80 but the alloca size in bits is 128. Promoting these types can break passing unions of x86_fp80s and other types. Patch by Thomas Jablin! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D5057 llvm-svn: 216693	2014-08-28 22:42:00 +00:00
David Majnemer	074052b623	InstCombine: Remove redundant combines InstSimplify already handles icmp (X+Y), X (and things like it) appropriately. The first thing that InstCombine does is run InstSimplify on the instruction. llvm-svn: 216659	2014-08-28 10:08:37 +00:00
Erik Eckstein	8354cfaf95	Fix: SLPVectorizer tried to move an instruction which was replaced by a vector instruction. For a detailed description of the problem see the comment in the test file. The problematic moveBefore() calls are not required anymore because the new scheduling algorithm ensures a correct ordering anyway. llvm-svn: 216656	2014-08-28 07:04:02 +00:00
David Majnemer	76d06bc613	InstSimplify: Move a transform from InstCombine to InstSimplify Several combines involving icmp (shl C2, %X) C1 can be simplified without introducing any new instructions. Move them to InstSimplify; while we are at it, make them more powerful. llvm-svn: 216642	2014-08-28 03:34:28 +00:00
David Majnemer	22ccfc4484	InstCombine: Combine gep X, (Y-X) to Y We try to perform this transform in InstSimplify but we aren't always able to. Sometimes, we need to insert a bitcast if X and Y don't have the same type. llvm-svn: 216598	2014-08-27 20:08:37 +00:00
Michael Zolotukhin	5dc466b863	[SLP] Re-enable vectorization of GEP expressions (re-apply r210342 with a fix). llvm-svn: 216549	2014-08-27 15:01:18 +00:00
Craig Topper	e1d1294853	Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created. llvm-svn: 216525	2014-08-27 05:25:25 +00:00
Craig Topper	3af9722529	Fix some cases where ArrayRefs were being passed by reference. Also remove 'const' from some other ArrayRef uses since it's implicitly const already. llvm-svn: 216524	2014-08-27 05:25:00 +00:00
David Majnemer	54e97d5dc0	InstCombine: Optimize GEP's involving ptrtoint better We supported transforming: (gep i8* X, -(ptrtoint Y)) to: (inttoptr (sub (ptrtoint X), (ptrtoint Y))) However, this only fired if 'X' had type i8*. Generalize this to support various types of different sizes. This results in much better CodeGen, especially for pointers to packed structs. llvm-svn: 216523	2014-08-27 05:16:04 +00:00
Joerg Sonnenberger	cb5674b9c2	Revert r210342 and r210343, add test case for the crasher. PR 20642. llvm-svn: 216475	2014-08-26 19:06:41 +00:00
Dinesh Dwivedi	4919bbe29d	This patch enables SimplifyUsingDistributiveLaws() to handle the following patterns: (X >> Z) & (Y >> Z) -> (X&Y) >> Z for all shifts. (X >> Z) | (Y >> Z) -> (X|Y) >> Z for all shifts. (X >> Z) ^ (Y >> Z) -> (X^Y) >> Z for all shifts. These patterns were previously handled separately in visitAnd()/visitOr()/visitXor(). Differential Revision: http://reviews.llvm.org/D4951 llvm-svn: 216443	2014-08-26 08:53:32 +00:00
Reid Kleckner	3715461b48	musttail: Don't eliminate varargs packs if there is a forwarding call Also clean up and beef up this grep test for the feature. llvm-svn: 216425	2014-08-26 00:59:51 +00:00
Sanjay Patel	4e31cdabd1	fix typos in comments llvm-svn: 216424	2014-08-26 00:59:15 +00:00
Reid Kleckner	e6e88f99b3	ArgPromotion: Don't touch variadic functions Adding, removing, or changing non-pack parameters can change the ABI classification of pack parameters. Clang and other frontends encode the classification in the IR of the call site, but the callee side determines it dynamically based on the number of registers consumed so far. Changing the prototype affects the number of registers consumed and would break such code. Dead argument elimination performs a similar task and already has a similar check to avoid this problem. Patch by Thomas Jablin! llvm-svn: 216421	2014-08-25 23:58:48 +00:00
Rafael Espindola	3fd1e9933f	Modernize raw_fd_ostream's constructor a bit. Take a StringRef instead of a "const char *". Take a "std::error_code &" instead of a "std::string &" for error. A create static method would be even better, but this patch is already a bit too big. llvm-svn: 216393	2014-08-25 18:16:47 +00:00
Bruno Cardoso Lopes	e2a1fa35df	Remove dangling initializers in GlobalDCE GlobalDCE deletes global vars and updates their initializers to nullptr while leaving underlying constants to be cleaned up later by its uses. The clean up may never happen, fix this by forcing it every time it's safe to destroy constants. Final patch by Rafael Espindola http://reviews.llvm.org/D4931 <rdar://problem/17523868> llvm-svn: 216390	2014-08-25 17:51:14 +00:00
Stepan Dyatkovskiy	c90308bf83	MergeFunctions, tiny refactoring: cmpAPFloat has been renamed to cmpAPFloats (multiple form). llvm-svn: 216376	2014-08-25 08:22:46 +00:00
Stepan Dyatkovskiy	7f895c1184	MergeFunctions, tiny refactoring: cmpAPInt has been renamed to cmpAPInts (multiple form). llvm-svn: 216375	2014-08-25 08:19:50 +00:00
Stepan Dyatkovskiy	0b765dee6e	MergeFunctions, tiny refactoring: cmpType has been renamed to cmpTypes (multiple form). llvm-svn: 216374	2014-08-25 08:16:39 +00:00
Stepan Dyatkovskiy	016daddc52	MergeFunctions, tiny refactoring: cmpGEP has been renamed to cmpGEPs (multiple form). llvm-svn: 216373	2014-08-25 08:12:45 +00:00
Karthik Bhat	7f33ff7dea	Allow vectorization of division by uniform power of 2. This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates the cost model for the Loop and SLP Vectorizers. The cost table is currently only updated for the X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371	2014-08-25 04:56:54 +00:00
Craig Topper	4627679cec	Use range based for loops to avoid needing to re-mention SmallPtrSet size. llvm-svn: 216351	2014-08-24 23:23:06 +00:00
David Majnemer	0ffccf7fb5	InstCombine: Properly optimize or'ing bittests together CFE, with -O3, would turn: bool f(unsigned x) { bool a = x & 1; bool b = x & 2; return a \| b; } into: %1 = lshr i32 %x, 1 %2 = or i32 %1, %x %3 = and i32 %2, 1 %4 = icmp ne i32 %3, 0 This sort of thing exposes a nasty pathology in GCC, ICC and LLVM. Instead, we would rather want: %1 = and i32 %x, 3 %2 = icmp ne i32 %1, 0 Things get a bit more interesting in the following case: %1 = lshr i32 %x, %y %2 = or i32 %1, %x %3 = and i32 %2, 1 %4 = icmp ne i32 %3, 0 Replacing it with the following sequence is better: %1 = shl nuw i32 1, %y %2 = or i32 %1, 1 %3 = and i32 %2, %x %4 = icmp ne i32 %3, 0 This sequence is preferable because %1 doesn't involve %x and could potentially be hoisted out of loops if it is invariant; only perform this transform in the non-constant case if we know we won't increase register pressure. llvm-svn: 216343	2014-08-24 09:10:57 +00:00
Jingyue Wu	ec33fa9aca	[SROA] Fold a PHI node if all its incoming values are the same Summary: Fixes PR20425. During slice building, if all of the incoming values of a PHI node are the same, replace the PHI node with the common value. This simplification makes alloca's used by PHI nodes easier to promote. Test Plan: Added three more tests in phi-and-select.ll Reviewers: nlewycky, eliben, meheff, chandlerc Reviewed By: chandlerc Subscribers: zinovy.nis, hfinkel, baldrick, llvm-commits Differential Revision: http://reviews.llvm.org/D4659 llvm-svn: 216299	2014-08-22 22:45:57 +00:00
David Majnemer	49775e0173	InstCombine: Don't unconditionally preserve 'nuw' when shrinking constants Consider: %add = add nuw i32 %a, -16777216 %and = and i32 %add, 255 Regardless of whether or not we demand the sign bit of %add, we cannot replace -16777216 with 2130706432 without also removing 'nuw' from the instruction. llvm-svn: 216273	2014-08-22 17:11:04 +00:00
David Majnemer	0e6c986696	InstCombine: sub nsw %x, C -> add nsw %x, -C if C isn't INT_MIN We can preserve nsw during this transform if -C won't overflow. llvm-svn: 216269	2014-08-22 16:41:23 +00:00
David Majnemer	42b83a5e36	InstCombine: Don't unconditionally preserve 'nsw' when shrinking constants Consider: %add = add nsw i32 %a, -16777216 %and = and i32 %add, 255 Regardless of whether or not we demand the sign bit of %add, we cannot replace -16777216 with 2130706432 without also removing 'nsw' from the instruction. This fixes PR20377. llvm-svn: 216261	2014-08-22 07:56:32 +00:00
Erik Eckstein	b49d7abb7b	fix: SLPVectorizer crashes for unreachable blocks containing non-schedulable instructions. In unreachable blocks it's legal to have instructions like "%x = op %x". Such instructions are not schedulable. Therefore the SLPVectorizer has to check for unreachable blocks and ignore them. Fixes bug 20646. llvm-svn: 216256	2014-08-22 01:18:39 +00:00
Peter Collingbourne	fab565a56b	[dfsan] Fix non-determinism bug in non-zero label check annotator. We now use a std::vector instead of a DenseSet to store the list of label checks so that we can iterate over it deterministically. llvm-svn: 216255	2014-08-22 01:18:18 +00:00
Reid Kleckner	c36f48f08a	SROA: Handle a case of store size being smaller than allocation size In this case, we are creating an x86_fp80 slice for a union from C where the padding bytes may contain real data. An x86_fp80 alloca is 16 bytes, and that's just fine. We can't, however, use regular loads and stores to access the slice, because the store size is only 10 bytes / 80 bits. Instead, use memcpy and memset. Fixes PR18726. Reviewed By: chandlerc Differential Revision: http://reviews.llvm.org/D5012 llvm-svn: 216248	2014-08-22 00:09:56 +00:00
David Blaikie	2f3f76fdb1	Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks. Somewhat unnoticed in the original implementation of discriminators, but it could cause instructions to end up in new, small, DW_TAG_lexical_blocks due to the use of DILexicalBlock to track discriminator changes. Instead, use DILexicalBlockFile which we already use to track file changes without introducing new scopes, so it works well to track discriminator changes in the same way. llvm-svn: 216239	2014-08-21 22:45:21 +00:00
Rafael Espindola	7cebf36a95	Move some logic to populateLTOPassManager. This will avoid code duplication in the next commit which calls it directly from the gold plugin. llvm-svn: 216211	2014-08-21 20:03:44 +00:00
Rafael Espindola	216e0c0617	Respect LibraryInfo in populateLTOPassManager and use it. NFC. llvm-svn: 216203	2014-08-21 18:49:52 +00:00
Rafael Espindola	e07caad9e7	Handle inlining in populateLTOPassManager like in populateModulePassManager. No functionality change. llvm-svn: 216178	2014-08-21 13:35:30 +00:00
Zinovy Nis	33406da5f4	[CLNUP] Remove return after llvm_unreachable. Thanks to Hal Finkel for pointing this out. llvm-svn: 216176	2014-08-21 13:30:05 +00:00
Rafael Espindola	208bc533cd	Move DisableGVNLoadPRE from populateLTOPassManager to PassManagerBuilder. llvm-svn: 216174	2014-08-21 13:13:17 +00:00
Erik Verbruggen	2b98bd2a80	Reassociate x + -0.1234 * y into x - 0.1234 * y This does not require -ffast-math, and it gives CSE/GVN more options to eliminate duplicate expressions in, e.g.: return ((x + 0.1234 * y) * (x - 0.1234 * y)); Differential Revision: http://reviews.llvm.org/D4904 llvm-svn: 216169	2014-08-21 10:45:30 +00:00
Zinovy Nis	0a36cba29d	[INDVARS] Extend widening of induction variables to the cases of "sub nsw" and "mul nsw" instructions. Currently only "add nsw" instructions are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like: int N = 100; float **A; void foo(int x0, int x1) { float * A_cur = &A[0][0]; float * A_next = &A[1][0]; for(int x = x0; x < x1; ++x) { // Currently only [x+N] case is widened. The other 2 cases lead to sext. // This patch fixes it, so all 3 cases do not need sext. const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N]; A_next[x] = div; } } ... > clang++ test.cpp -march=core-avx2 -Ofast -fno-unroll-loops -fno-tree-vectorize -S -o - Differential Revision: http://reviews.llvm.org/D4695 llvm-svn: 216160	2014-08-21 08:25:45 +00:00
Craig Topper	71b7b68b74	Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 216158	2014-08-21 05:55:13 +00:00
David Majnemer	5d1aeba2ea	InstCombine: Fold ((A \| B) & C1) ^ (B & C2) -> (A & C1) ^ B if C1^C2=-1 Adapted from a patch by Richard Smith, test-case written by me. llvm-svn: 216157	2014-08-21 05:14:48 +00:00
James Molloy	82c995d450	[LoopVectorizer] Limit unroll factor in the presence of nested reductions. If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit, by default to 2, so the critical path only gets increased by one reduction operation. llvm-svn: 216140	2014-08-20 23:53:52 +00:00
Yi Jiang	1a4e73d7bf	New InstCombine pattern: (icmp ult/ule (A + C1), C3) \| (icmp ult/ule (A + C2), C3) to (icmp ult/ule ((A & ~(C1 ^ C2)) + max(C1, C2)), C3) under certain condition llvm-svn: 216135	2014-08-20 22:55:40 +00:00
David Majnemer	42158f3eea	InstCombine: Annotate sub with nuw when we prove it's safe We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is negative and the right-hand side is non-negative. llvm-svn: 216045	2014-08-20 07:17:31 +00:00
Peter Collingbourne	f39430bd4a	[dfsan] Treat vararg custom functions like unimplemented functions. Because declarations of these functions can appear in places like autoconf checks, they have to be handled somehow, even though we do not support vararg custom functions. We do so by printing a warning and calling the uninstrumented function, as we do for unimplemented functions. llvm-svn: 216042	2014-08-20 01:40:23 +00:00
David Majnemer	57d5bc8849	InstCombine: Annotate sub with nsw when we prove it's safe We can prove that a 'sub' can be a 'sub nsw' under certain conditions: - The sign bits of the operands are the same. - Both operands have more than 1 sign bit. The subtraction cannot be a signed overflow in either case. llvm-svn: 216037	2014-08-19 23:36:30 +00:00
Renato Golin	06d601fb3e	Revert "Small refactor on VectorizerHint for deduplication" This reverts commit r215994 because MSVC 2012 can't cope with its C++11 goodness. llvm-svn: 215999	2014-08-19 18:08:50 +00:00
Renato Golin	dd6394d833	Small refactor on VectorizerHint for deduplication Previously, the hint mechanism relied on clean up passes to remove redundant metadata, which still showed up if running opt at low levels of optimization. That also has shown that multiple nodes of the same type, but with different values could still coexist, even if temporary, and cause confusion if the next pass got the wrong value. This patch makes sure that, if metadata already exists in a loop, the hint mechanism will never append a new node, but always replace the existing one. It also enhances the algorithm to cope with more metadata types in the future by just adding a new type, not a lot of code. llvm-svn: 215994	2014-08-19 17:30:43 +00:00
Mayur Pandey	960507beb4	InstCombine: ((A & ~B) ^ (~A & B)) to A ^ B Proof using CVC3 follows: $ cat t.cvc A, B : BITVECTOR(32); QUERY BVXOR((A & ~B),(~A & B)) = BVXOR(A,B); $ cvc3 t.cvc Valid. Differential Revision: http://reviews.llvm.org/D4898 llvm-svn: 215974	2014-08-19 08:19:19 +00:00
Craig Topper	97ebe53032	Const-correct and prevent a copy of a SmallPtrSet. llvm-svn: 215973	2014-08-19 07:44:27 +00:00
Mayur Pandey	75b76c6a92	test commit (spelling correction) llvm-svn: 215970	2014-08-19 06:41:55 +00:00
Craig Topper	6230691c91	Revert "Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size." Getting a weird buildbot failure that I need to investigate. llvm-svn: 215870	2014-08-18 00:24:38 +00:00
Craig Topper	5229cfd163	Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 215868	2014-08-17 23:47:00 +00:00
Owen Anderson	a4428aa484	Remove an InstCombine that transformed patterns like (x * uitofp i1 y) to (select y, x, 0.0) when the multiply has fast math flags set. While this might seem like an obvious canonicalization, there is one subtle problem with it. The result of the original expression is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN. This means that the new expression is strictly more defined than the original one. One unfortunate consequence of this is that the transform is not reversible! It's always legal to increase the defined-ness of an expression, but it's not legal to reduce it. Thus, targets that prefer the original form of the expression cannot reverse the transform to recover it. Another way to think of it is that the transform has lost source-level information (the fast math flags), which is undesirable. llvm-svn: 215825	2014-08-17 03:51:29 +00:00
David Majnemer	1a0bbc8a5c	InstCombine: Fix a potential bug in 0 - (X sdiv C) -> (X sdiv -C) While most (X sdiv 1) operations will get caught by InstSimplify, it is still possible for a sdiv to appear in the worklist which hasn't been simplified yet. This means that it is possible for 0 - (X sdiv 1) to get transformed into (X sdiv -1); dividing by -1 can make the transform produce undef values instead of the proper result. Sorry for the lack of testcase, it's a bit problematic because it relies on the exact order of operations in the worklist. llvm-svn: 215818	2014-08-16 09:23:42 +00:00
David Majnemer	f9a095d606	InstCombine: Combine mul with div. We can combine a mul with a div if one of the operands is a multiple of the other: %mul = mul nsw nuw %a, C1 %ret = udiv %mul, C2 => %ret = mul nsw %a, (C1 / C2) This can expose further optimization opportunities if we end up multiplying or dividing by a power of 2. Consider this small example: define i32 @f(i32 %a) { %mul = mul nuw i32 %a, 14 %div = udiv exact i32 %mul, 7 ret i32 %div } which gets CodeGen'd to: imull $14, %edi, %eax imulq $613566757, %rax, %rcx shrq $32, %rcx subl %ecx, %eax shrl %eax addl %ecx, %eax shrl $2, %eax retq We can now transform this into: define i32 @f(i32 %a) { %shl = shl nuw i32 %a, 1 ret i32 %shl } which gets CodeGen'd to: leal (%rdi,%rdi), %eax retq This fixes PR20681. llvm-svn: 215815	2014-08-16 08:55:06 +00:00
Rafael Espindola	ea46c32f81	Introduce a helper to combine instruction metadata. Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use it. Patch by Björn Steinbrink! llvm-svn: 215723	2014-08-15 15:46:38 +00:00
Hal Finkel	61c386126b	Copy noalias metadata from call sites to inlined instructions When a call site with noalias metadata is inlined, that metadata can be propagated directly to the inlined instructions (only those that might access memory because it is not useful on the others). Prior to inlining, the noalias metadata could express that a call would not alias with some other memory access, which implies that no instruction within that called function would alias. By propagating the metadata to the inlined instructions, we preserve that knowledge. This should complete the enhancements requested in PR20500. llvm-svn: 215676	2014-08-14 21:09:37 +00:00
Hal Finkel	d2dee16c27	Add noalias metadata for general calls (not just memory intrinsics) during inlining When preserving noalias function parameter attributes by adding noalias metadata in the inliner, we should do this for general function calls (not just memory intrinsics). The logic is very similar to what already existed (except that we want to add this metadata even for functions taking no relevant parameters). This metadata can be used by ModRef queries in the caller after inlining. This addresses the first part of PR20500. Adding noalias metadata during inlining is still turned off by default. llvm-svn: 215657	2014-08-14 16:44:03 +00:00
Chad Rosier	11ab941644	[Reassociation] Add support for reassociation with unsafe algebra. Vector instructions are (still) not supported for either integer or floating point. Hopefully, that work will be landed shortly. llvm-svn: 215647	2014-08-14 15:23:01 +00:00
David Majnemer	698dca0b95	InstCombine: ((A \| ~B) ^ (~A \| B)) to A ^ B Proof using CVC3 follows: $ cat t.cvc A, B : BITVECTOR(32); QUERY BVXOR((A \| ~B),(~A \|B)) = BVXOR(A,B); $ cvc3 t.cvc Valid. Patch by Mayur Pandey! Differential Revision: http://reviews.llvm.org/D4883 llvm-svn: 215621	2014-08-14 06:46:25 +00:00
David Majnemer	f1eda23514	Added InstCombine Transform for ((B \| C) & A) \| B -> B \| (A & C) Transform ((B \| C) & A) \| B --> B \| (A & C) Z3 Link: http://rise4fun.com/Z3/hP6p Patch by Sonam Kumari! Differential Revision: http://reviews.llvm.org/D4865 llvm-svn: 215619	2014-08-14 06:41:38 +00:00
Jan Vesely	0cd3ec6cfa	utils: Fix segfault in flattencfg v2: continue iterating through the rest of the bb use for loop v3: initialize FlattenCFG pass in ScalarOps add test v4: split off initializing flattencfg to a separate patch add comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215574	2014-08-13 20:31:53 +00:00
Jan Vesely	5a956d49f7	Initialize FlattenCFG pass Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 215573	2014-08-13 20:31:52 +00:00
Benjamin Kramer	a7c40ef022	Canonicalize header guards into a common format. Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558	2014-08-13 16:26:38 +00:00
Chandler Carruth	0fb998110a	[optnone] Make the optnone attribute effective at suppressing function attribute and function argument attribute synthesizing and propagating. As with the other uses of this attribute, the goal remains a best-effort (no guarantees) attempt to not optimize the function or assume things about the function when optimizing. This is particularly useful for compiler testing, bisecting miscompiles, triaging things, etc. I was hitting specific issues using optnone to isolate test code from a test driver for my fuzz testing, and this is one step of fixing that. llvm-svn: 215538	2014-08-13 10:49:33 +00:00
Chandler Carruth	3f92ecc2a0	Revert r215415 which causes MSan to crash on a great deal of C++ code. I've followed up on the original commit as well. llvm-svn: 215532	2014-08-13 09:19:39 +00:00
Karthik Bhat	a4a4db91be	InstCombine: Combine (xor (or %a, %b) (xor %a, %b)) to (and %a, %b) Correctness proof of the transform using CVC3- $ cat t.cvc A, B : BITVECTOR(32); QUERY BVXOR(A \| B, BVXOR(A,B) ) = A & B; $ cvc3 t.cvc Valid. llvm-svn: 215524	2014-08-13 05:13:14 +00:00
Matt Arsenault	4815f09bbe	Allow bitcast + struct GEP transform to work with addrspacecast llvm-svn: 215467	2014-08-12 19:46:13 +00:00
Reid Kleckner	3ae6e1528a	msan: Handle musttail calls First, avoid calling setTailCall(false) on musttail calls. The function prototypes should be "congruent", so the shadow layout should be exactly the same. Second, avoid inserting instrumentation after a musttail call to propagate the return value shadow. We don't need to propagate the result of a tail call, it should already be in the right place. Reviewed By: eugenis Differential Revision: http://reviews.llvm.org/D4331 llvm-svn: 215415	2014-08-12 00:12:43 +00:00
Reid Kleckner	e31acf239a	Move helper for getting a terminating musttail call to BasicBlock No functional change. To be used in future commits that need to look for such instructions. Reviewed By: rafael Differential Revision: http://reviews.llvm.org/D4504 llvm-svn: 215413	2014-08-12 00:05:15 +00:00
David Majnemer	ab07f00c64	InstCombine: Combine (add (and %a, %b) (or %a, %b)) to (add %a, %b) What follows below is a correctness proof of the transform using CVC3. $ < t.cvc A, B : BITVECTOR(32); QUERY BVPLUS(32, A & B, A \| B) = BVPLUS(32, A, B); $ cvc3 < t.cvc Valid. llvm-svn: 215400	2014-08-11 22:32:02 +00:00
James Molloy	65b08f5e46	[LoopVectorizer] Enable support for floating-point subtraction reductions llvm-svn: 215200	2014-08-08 12:41:08 +00:00
David Majnemer	fe8c7540b0	GlobalOpt: Optimize in the face of insertvalue/extractvalue GlobalOpt didn't know how to simulate InsertValueInst or ExtractValueInst. Optimizing these is pretty straightforward. N.B. This came up when looking at clang's IRGen for MS ABI member pointers; they are represented as aggregates. llvm-svn: 215184	2014-08-08 05:50:43 +00:00
Gerolf Hoflehner	ea96a3d336	Fix for multi-line comment warning llvm-svn: 215169	2014-08-07 23:19:55 +00:00
Arnold Schwaighofer	4fb3c47456	SLPVectorizer: Use the type of the value loaded/stored to get the ABI alignment We were using the pointer type which is incorrect. llvm-svn: 215162	2014-08-07 22:47:27 +00:00
Owen Anderson	6c19ab1b5d	Fix a case in SROA where lifetime intrinsics could inhibit alloca promotion. In this case, the code path dealing with vector promotion was missing the explicit checks for lifetime intrinsics that were present on the corresponding integer promotion path. llvm-svn: 215148	2014-08-07 21:07:35 +00:00
Rui Ueyama	c487f7728e	Revert "r214897 - Remove dead zero store to calloc initialized memory" It broke msan. llvm-svn: 214989	2014-08-06 19:30:38 +00:00
James Molloy	568da0990e	Add a new option -run-slp-after-loop-vectorization. This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is disabled by default so we can do performance testing - ideally we want to change to having the loop vectorizer running first, and the SLP vectorizer using its leftovers instead of the other way around. llvm-svn: 214963	2014-08-06 12:56:19 +00:00
Peter Collingbourne	df240b252a	[dfsan] Try not to create too many additional basic blocks in functions which already have a large number of blocks. Works around a performance issue with the greedy register allocator. llvm-svn: 214944	2014-08-06 00:33:40 +00:00
JF Bastien	ac8b66b32c	Fix typos in comments and doc Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com) llvm-svn: 214934	2014-08-05 23:27:34 +00:00
Rafael Espindola	f9e52cf015	Don't internalize all but main by default. This is mostly a cleanup, but it changes a fairly old behavior. Every "real" LTO user was already disabling the silly internalize pass and creating the internalize pass itself. The difference with this patch is for "opt -std-link-opts" and the C api. Now to get a usable behavior out of opt one doesn't need the funny looking command line: opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts llvm-svn: 214919	2014-08-05 20:10:38 +00:00
Philip Reames	00c9b6461f	Remove dead zero store to calloc initialized memory Optimize the following IR: %1 = tail call noalias i8* @calloc(i64 1, i64 4) %2 = bitcast i8* %1 to i32* ; This store is dead and should be removed store i32 0, i32* %2, align 4 Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store. If the store is to an out of bounds address, it is undefined and thus also removable. Reviewed By: nicholas Differential Revision: http://reviews.llvm.org/D3942 llvm-svn: 214897	2014-08-05 17:48:20 +00:00
James Molloy	2b8933c354	Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost. Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account. llvm-svn: 214859	2014-08-05 12:30:34 +00:00
Manman Ren	062f58d550	[SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion. When we have a covered lookup table, make sure we don't delete PHINodes that are cached in PHIs. rdar://17887153 llvm-svn: 214642	2014-08-02 23:41:54 +00:00
Erik Eckstein	26a1bf7d84	fix bug 20513 - Crash in SLP Vectorizer llvm-svn: 214638	2014-08-02 19:39:42 +00:00
Alexey Samsonov	d9ad5cec0c	[ASan] Use metadata to pass source-level information from Clang to ASan. Instead of creating global variables for source locations and global names, just create metadata nodes and strings. They will be transformed into actual globals in the instrumentation pass (if necessary). This approach is more flexible: 1) we don't have to ensure that our custom globals survive all the optimizations 2) if globals are discarded for some reason, we will simply ignore metadata for them and won't have to erase corresponding globals 3) metadata for source locations can be reused for other purposes: e.g. we may attach source location metadata to alloca instructions and provide better descriptions for stack variables in ASan error reports. No functionality change. llvm-svn: 214604	2014-08-02 00:35:50 +00:00
Tyler Nowicki	064896bbc5	Add diagnostics to the vectorizer cost model. When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision. Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive. Reviewed by Arnold Schwaighofer llvm-svn: 214599	2014-08-02 00:14:03 +00:00
Peter Collingbourne	e52646cd80	PartiallyInlineLibCalls: Check sqrt result type before transforming it. Some configure scripts declare this with the wrong prototype, which can lead to an assertion failure. llvm-svn: 214593	2014-08-01 23:21:21 +00:00
Peter Collingbourne	142fdff0d5	[dfsan] Correctly handle loads and stores of zero size. llvm-svn: 214561	2014-08-01 21:18:18 +00:00
Rafael Espindola	3f6481d0d3	Remove some calls to std::move. Instead of moving out the data in a ErrorOr<std::unique_ptr<Foo>>, get a reference to it. Thanks to David Blaikie for the suggestion. llvm-svn: 214516	2014-08-01 14:31:55 +00:00
Erik Eckstein	690dd037d9	SLPVectorizer: fix build problem in Release configuration llvm-svn: 214496	2014-08-01 09:47:38 +00:00
Erik Eckstein	c80e1dc081	SLPVectorizer: improved scheduling algorithm. llvm-svn: 214494	2014-08-01 09:20:42 +00:00
Erik Eckstein	f16a808292	SLP Vectorizer: added statistics counter llvm-svn: 214487	2014-08-01 08:14:28 +00:00
Erik Eckstein	4944b2ff94	SLP Vectorizer: improve canonicalize tree operands of commutative binary operands. This reverts r214338 (except the test file) and replaces it with a more general algorithm. llvm-svn: 214485	2014-08-01 08:05:55 +00:00
Suyog Sarda	56c9a87035	This patch implements transform for pattern "(A & ~B) ^ (~A) -> ~(A & B)". Differential Revision: http://reviews.llvm.org/D4653 llvm-svn: 214479	2014-08-01 05:07:20 +00:00
Suyog Sarda	1c6c2f69f7	This patch implements transform for pattern "(A \| B) & ((~A) ^ B) -> (A & B)". Differential Revision: http://reviews.llvm.org/D4628 llvm-svn: 214478	2014-08-01 04:59:26 +00:00
Suyog Sarda	52324c82cc	This patch implements transform for pattern "( A & (~B)) \| (A ^ B) -> (A ^ B)" Differential Revision: http://reviews.llvm.org/D4652 llvm-svn: 214477	2014-08-01 04:50:31 +00:00
Suyog Sarda	16d646594e	This patch implements transform for pattern "(A & B) \| ((~A) ^ B) -> (~A ^ B)". Patch Credit to Ankit Jain ! Differential Revision: http://reviews.llvm.org/D4655 llvm-svn: 214476	2014-08-01 04:41:43 +00:00
Tyler Nowicki	b5a65395cc	Improve the remark generated for -Rpass-missed. The current remark is ambiguous and makes it sound like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss. Reviewed by Arnold Schwaighofer llvm-svn: 214445	2014-07-31 21:22:22 +00:00
Tyler Nowicki	9fe497fcac	Improve the remark generated when a variable that is used outside the loop is not a reduction or induction variable. Reviewed by Arnold Schwaighofer llvm-svn: 214440	2014-07-31 21:02:40 +00:00
Evgeniy Stepanov	5997feb7dc	[msan] Fix handling of array types. Switch array type shadow from a single integer to an array of integers (i.e. make it per-element). This simplifies instrumentation of extractvalue and fixes PR20493. llvm-svn: 214398	2014-07-31 11:02:27 +00:00
Stepan Dyatkovskiy	87c046189d	MergeFunctions, tiny refactoring: cmpOperation has been renamed to cmpOperations (plural form). llvm-svn: 214392	2014-07-31 07:16:59 +00:00
David Majnemer	a92687d636	InstCombine: Correctly propagate NSW/NUW for x-(-A) -> x+A We can only propagate the nsw bits if both subtraction instructions are marked with the appropriate bit. N.B. We only propagate the nsw bit in InstCombine because the nuw case is already handled in InstSimplify. This fixes PR20189. llvm-svn: 214385	2014-07-31 04:49:29 +00:00
David Majnemer	42af3601c2	InstCombine: Simplify (A ^ B) or/and (A ^ B ^ C) While we can already transform A \| (A ^ B) into A \| B, things get bad once we have (A ^ B) \| (A ^ B ^ Cst) because reassociation will morph this into (A ^ B) \| ((A ^ Cst) ^ B). Our existing patterns fail once this happens. To fix this, we add a new pattern which looks through the tree of xor binary operators to see that, in fact, there exists a redundant xor operation. What follows below is a correctness proof of the transform using CVC3. $ cat t.cvc A, B, C : BITVECTOR(64); QUERY BVXOR(A, B) \| BVXOR(BVXOR(B, C), A) = BVXOR(A, B) \| C; QUERY BVXOR(BVXOR(A, C), B) \| BVXOR(A, B) = BVXOR(A, B) \| C; QUERY BVXOR(A, B) & BVXOR(BVXOR(B, C), A) = BVXOR(A, B) & ~C; QUERY BVXOR(BVXOR(A, C), B) & BVXOR(A, B) = BVXOR(A, B) & ~C; $ cvc3 < t.cvc Valid. Valid. Valid. Valid. llvm-svn: 214342	2014-07-30 21:26:37 +00:00
Chad Rosier	78f41b3ca7	SLP Vectorizer: Canonicalize tree operands of commutative binary operands. llvm-svn: 214338	2014-07-30 21:07:56 +00:00
Rafael Espindola	d07cf400ab	SimplifyCFG: Avoid miscompilations due to removed lifetime intrinsics. The lifetime intrinsics need some work in order to make it clear which optimizations are or are not valid. For now dropping this optimization avoids a miscompilation. Patch by Björn Steinbrink. llvm-svn: 214336	2014-07-30 21:04:00 +00:00
Aaron Ballman	573f3b5313	Fixing a few -Woverloaded-virtual warnings by exposing the hidden virtual function as well. No functional changes intended. llvm-svn: 214325	2014-07-30 19:23:59 +00:00
Rafael Espindola	3cf4af11d5	Add the missing hasLinkOnceODRLinkage predicate. llvm-svn: 214312	2014-07-30 15:57:51 +00:00
Manman Ren	f8a1967c8c	[Debug Info] add DISubroutineType and its creation takes DITypeArray. DITypeArray is an array of DITypeRef, at its creation, we will create DITypeRef (i.e use the identifier if the type node has an identifier). This is the last patch to unique the type array of a subroutine type. rdar://17628609 llvm-svn: 214132	2014-07-28 22:24:06 +00:00
Manman Ren	ab8ffbaaee	[Debug Info] rename getTypeArray to getElements, setTypeArray to setArrays. This is the second of a series of patches to handle type uniqueing of the type array for a subroutine type. For vector and array types, getElements returns the array of subranges, so it is a better name than getTypeArray. Even for class, struct and enum types, getElements returns the members, which can be subprograms. setArrays can set up to two arrays, the second is the templates. This commit should have no functionality change. llvm-svn: 214112	2014-07-28 19:14:13 +00:00
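
Several of the InstCombine commits listed above justify a bitwise rewrite with a CVC3 query. As a rough sanity check (not a substitute for those proofs), the same identities can be brute-forced over a small bit width. The sketch below is illustrative only: the names `WIDTH`, `inv`, `IDENTITIES` and `check_all` are hypothetical and are not part of LLVM, and the 4-bit width is an assumption chosen to keep the exhaustive search fast.

```python
from itertools import product

WIDTH = 4                     # assumed small width so the search stays exhaustive
MASK = (1 << WIDTH) - 1       # keeps every value inside WIDTH bits


def inv(x):
    """Bitwise NOT truncated to WIDTH bits."""
    return ~x & MASK


# Each entry maps a description to a function returning (lhs, rhs).
# The identities are taken from the commit messages in the log above.
IDENTITIES = {
    "(A & ~B) ^ (~A & B) == A ^ B":
        lambda a, b, c: ((a & inv(b)) ^ (inv(a) & b), a ^ b),
    "(A | ~B) ^ (~A | B) == A ^ B":
        lambda a, b, c: ((a | inv(b)) ^ (inv(a) | b), a ^ b),
    "((B | C) & A) | B == B | (A & C)":
        lambda a, b, c: (((b | c) & a) | b, b | (a & c)),
    "(A | B) ^ (A ^ B) == A & B":
        lambda a, b, c: ((a | b) ^ (a ^ b), a & b),
    "(A & B) + (A | B) == A + B":
        lambda a, b, c: ((a & b) + (a | b), a + b),
    "(A & ~B) ^ ~A == ~(A & B)":
        lambda a, b, c: ((a & inv(b)) ^ inv(a), inv(a & b)),
    "(A | B) & (~A ^ B) == A & B":
        lambda a, b, c: ((a | b) & (inv(a) ^ b), a & b),
    "(A & ~B) | (A ^ B) == A ^ B":
        lambda a, b, c: ((a & inv(b)) | (a ^ b), a ^ b),
    "(A & B) | (~A ^ B) == ~A ^ B":
        lambda a, b, c: ((a & b) | (inv(a) ^ b), inv(a) ^ b),
    "(A ^ B) | (A ^ B ^ C) == (A ^ B) | C":
        lambda a, b, c: ((a ^ b) | (a ^ b ^ c), (a ^ b) | c),
    "(A ^ B) & (A ^ B ^ C) == (A ^ B) & ~C":
        lambda a, b, c: ((a ^ b) & (a ^ b ^ c), (a ^ b) & inv(c)),
}


def check_all():
    """Return a list of (identity, a, b, c) counterexamples; empty if none."""
    failures = []
    for name, sides in IDENTITIES.items():
        for a, b, c in product(range(MASK + 1), repeat=3):
            lhs, rhs = sides(a, b, c)
            # Compare modulo 2**WIDTH so the addition wraps like a fixed-width add.
            if (lhs & MASK) != (rhs & MASK):
                failures.append((name, a, b, c))
                break
    return failures


if __name__ == "__main__":
    bad = check_all()
    print("all identities hold" if not bad else bad)
```

An exhaustive pass at 4 bits says nothing formal about 32-bit or 64-bit values, which is why the commits rely on a solver; the script is only meant to catch a mistyped pattern quickly.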

12258 Commits