llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	fcc7c1a0ba	[LibCallSimplifier] don't get fooled by a fake fmin() This is similar to the bug/fix: https://llvm.org/bugs/show_bug.cgi?id=26211 http://reviews.llvm.org/rL258325 The fmin() test case reveals another bug caused by sloppy code duplication. It will crash without this patch because fp128 is a valid floating-point type, but we would think that we had matched a function that used doubles. The new helper function can be used to replace similar checks that are used in several other places in this file. llvm-svn: 258428	2016-01-21 20:19:54 +00:00
David Majnemer	3af5bf30e3	[InstCombine] Simplify (x >> y) <= x This commit extends the patterns recognised by InstSimplify to also handle (x >> y) <= x in the same way as (x /u y) <= x. The missing optimisation was found investigating why LLVM did not optimise away bound checks in a binary search: https://github.com/rust-lang/rust/pull/30917 Patch by Andrea Canciani! Differential Revision: http://reviews.llvm.org/D16402 llvm-svn: 258422	2016-01-21 18:55:54 +00:00
Rong Xu	ed9fec7365	[PGO] IR level instrumentation of indirect call value profiling This patch adds the instrumentation for indirect call value profiling. It finds all the indirect call-sites and generates instrprof_value_profile intrinsic calls. A new opt level option -disable-vp is introduced to disable this instrumentation. Reviewers: davidxl, betulb, vsk Differential Revision: http://reviews.llvm.org/D16016 llvm-svn: 258417	2016-01-21 18:11:44 +00:00
Matthew Simpson	486bace5cc	Revert "[SLP] Truncate expressions to minimum required bit width" This reverts commit r258404. llvm-svn: 258408	2016-01-21 17:17:20 +00:00
Vedant Kumar	61035fa3cb	[GCOV] Avoid emitting profile arcs for module and skeleton CUs Do not emit profile arc files and note files for module and skeleton CU's. Our users report seeing unexpected .gcda and .gcno files in their projects when using gcov-style profiling with modules or frameworks. The unwanted files come from these modules. This is not very helpful for end-users. Further, we've seen reports of instrumented programs crashing while writing these files out (due to I/O failures). rdar://problem/22838296 Reviewed-by: aprantl Differential Revision: http://reviews.llvm.org/D15997 llvm-svn: 258406	2016-01-21 17:04:42 +00:00
Matthew Simpson	cb17d72170	[SLP] Truncate expressions to minimum required bit width This change attempts to produce vectorized integer expressions in bit widths that are narrower than their scalar counterparts. The need for demotion arises especially on architectures in which the small integer types (e.g., i8 and i16) are not legal for scalar operations but can still be used in vectors. Like similar work done within the loop vectorizer, we rely on InstCombine to perform the actual type-shrinking. We use the DemandedBits analysis and ComputeNumSignBits from ValueTracking to determine the minimum required bit width of an expression. Differential revision: http://reviews.llvm.org/D15815 llvm-svn: 258404	2016-01-21 16:31:55 +00:00
Sanjoy Das	a34ce95b60	Add a "gc-transition" operand bundle Summary: This adds a new kind of operand bundle to LLVM denoted by the `"gc-transition"` tag. Inputs to `"gc-transition"` operand bundle are lowered into the "transition args" section of `gc.statepoint` by `RewriteStatepointsForGC`. This removes the last bit of functionality that was unsupported in the deopt bundle based code path in `RewriteStatepointsForGC`. Reviewers: pgavlin, JosephTremoulet, reames Subscribers: sanjoy, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D16342 llvm-svn: 258338	2016-01-20 19:50:25 +00:00
Sanjay Patel	bd2dc67142	[LibCallSimplifier] don't get fooled by a fake sqrt() The test case will crash without this patch because the subsequent call to hasUnsafeAlgebra() assumes that the call instruction is an FPMathOperator (ie, returns an FP type). This part of the function signature check was omitted for the sqrt() case, but seems to be in place for all other transforms. Before: http://reviews.llvm.org/rL257400 ...we would have needlessly continued execution in optimizeSqrt(), but the bug was harmless because we'd eventually fail some other check and return without damage. This should fix: https://llvm.org/bugs/show_bug.cgi?id=26211 Differential Revision: http://reviews.llvm.org/D16198 llvm-svn: 258325	2016-01-20 17:41:14 +00:00
Joseph Tremoulet	b41632bf0f	[Inliner/WinEH] Honor implicit nounwinds Summary: Funclet EH tables require that a given funclet have only one unwind destination for exceptional exits. The verifier will therefore reject e.g. two cleanuprets with different unwind dests for the same cleanup, or two invokes exiting the same funclet but to different unwind dests. Because catchswitch has no 'nounwind' variant, and because IR producers are not required to annotate calls which will not unwind as 'nounwind', it is legal to nest a call or an "unwind to caller" catchswitch within a funclet pad that has an unwind destination other than caller; it is undefined behavior for such a call or catchswitch to unwind. Normally when inlining an invoke, calls in the inlined sequence are rewritten to invokes that unwind to the callsite invoke's unwind destination, and "unwind to caller" catchswitches in the inlined sequence are rewritten to unwind to the callsite invoke's unwind destination. However, if such a call or "unwind to caller" catchswitch is located in a callee funclet that has another exceptional exit with an unwind destination within the callee, applying the normal transformation would give that callee funclet multiple unwind destinations for its exceptional exits. There would be no way for EH table generation to determine which is the "true" exit, and the verifier would reject the function accordingly. Add logic to the inliner to detect these cases and leave such calls and "unwind to caller" catchswitches as calls and "unwind to caller" catchswitches in the inlined sequence. This fixes PR26147. Reviewers: rnk, andrew.w.kaylor, majnemer Subscribers: alexcrichton, llvm-commits Differential Revision: http://reviews.llvm.org/D16319 llvm-svn: 258273	2016-01-20 02:15:15 +00:00
Sanjay Patel	582857c95c	add tests to show missing memset/malloc optimizations (PR25892) llvm-svn: 258218	2016-01-19 23:07:10 +00:00
Sanjoy Das	29a4b5dc0d	[SCEV] Fix PR26207 In some cases, the max backedge taken count can be more conservative than the exact backedge taken count (for instance, because ScalarEvolution::getRange is not control-flow sensitive whereas computeExitLimitFromICmp can be). In these cases, computeExitLimitFromCond (specifically the bit that deals with `and` and `or` instructions) can create an ExitLimit instance with a `SCEVCouldNotCompute` max backedge count expression, but a computable exact backedge count expression. This violates an implicit SCEV assumption: a computable exact BE count should imply a computable max BE count. This change - Makes the above implicit invariant explicit by adding an assert to ExitLimit's constructor - Changes `computeExitLimitFromCond` to be more robust around conservative max backedge counts llvm-svn: 258184	2016-01-19 20:53:51 +00:00
Sanjay Patel	d1f4f03f5e	[LibCallSimplifier] use instruction-level fast-math-flags to shrink calls This is a continuation of adding FMF to call instructions: http://reviews.llvm.org/rL255555 llvm-svn: 258158	2016-01-19 18:38:52 +00:00
Sanjay Patel	81a63cd11f	[LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, [small integer]) calls This is a continuation of adding FMF to call instructions: http://reviews.llvm.org/rL255555 As with D15937, the intent of the patch is to preserve the current behavior of the transform except that we use the pow call's 'fast' attribute as a trigger rather than a function-level attribute. The TODO comment notes a potential follow-on patch that would propagate FMF to the new instructions. Differential Revision: http://reviews.llvm.org/D16122 llvm-svn: 258153	2016-01-19 18:15:12 +00:00
Sanjoy Das	de47590589	[IndVars] Fix PR25576 `LCSSASafePhiForRAUW` as computed was incorrect -- in cases like these (this exact example does not actually trigger the bug): define i32 @f(i32 %n, i1* %c) { entry: br label %outer.loop outer.loop: br label %inner.loop inner.loop: %iv = phi i32 [ 0, %outer.loop ], [ %iv.inc, %inner.loop ] %iv.inc = add nuw nsw i32 %iv, 1 %tc = udiv i32 %n, 13 %be.cond = icmp ult i32 %iv, %tc br i1 %be.cond, label %inner.loop, label %inner.exit inner.exit: %iv.lcssa = phi i32 [ %iv, %inner.loop ] %outer.be.cond = load volatile i1, i1* %c br i1 %outer.be.cond, label %outer.loop, label %leave leave: %iv.lcssa.lcssa = phi i32 [ %iv.lcssa, %inner.exit ] ret i32 %iv.lcssa.lcssa } `LCSSASafePhiForRAUW` is true for `%iv.lcssa` when re-rewriting the exit value of `%iv` for `%inner.loop` to `%tc` (this can happen due to `SCEVExpander::findExistingExpansion`), but the RAUW breaks LCSSA. To fix this, instead of computing `SafePhi` with special logic, decide the safety of RAUW directly via `replacementPreservesLCSSAForm`. llvm-svn: 258016	2016-01-17 18:12:52 +00:00
Artur Pilipenko	f84dc06e5b	Push isDereferenceableAndAlignedPointer down into isSafeToLoadUnconditionally Reviewed By: reames Differential Revision: http://reviews.llvm.org/D16226 llvm-svn: 258010	2016-01-17 12:35:29 +00:00
Igor Laevsky	28eeb3f66c	[BasicAliasAnalysis] Take into account operand bundles in the getModRefInfo function Differential Revision: http://reviews.llvm.org/D16225 llvm-svn: 257991	2016-01-16 12:15:53 +00:00
Matthew Simpson	57fe1b10db	Reapply r257800 with fix The fix uniques the bundle of getelementptr indices we are about to vectorize since it's possible for the same index to be used by multiple instructions. The original commit message is below. [SLP] Vectorize the index computations of getelementptr instructions. This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. llvm-svn: 257918	2016-01-15 18:51:51 +00:00
Silviu Baranga	f29dfd36bb	Re-commit r257064, after it was reverted in r257340. This contains a fix for the issue that caused the revert: we no longer assume that we can insert instructions after the instruction that produces the base pointer. We previously assumed that this would be ok, because the instruction produces a value and therefore is not a terminator. This is false for invoke instructions. We will now insert these new instruction directly at the location of the users. Original commit message: [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use for the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparisson entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc) the value of the original base pointer will be hidden to the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257897	2016-01-15 15:52:05 +00:00
Matthew Simpson	9258e013a2	Revert "[SLP] Vectorize the index computations of getelementptr instructions." This reverts commit r257800. llvm-svn: 257888	2016-01-15 13:10:46 +00:00
James Molloy	f01488e2bc	[InstCombine] Rewrite bswap/bitreverse handling completely. There are several requirements that ended up with this design; 1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early. 2. Bitreversals and byteswaps are very related in their matching logic. 3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses. 4. Bswaps are best matched early in InstCombine. The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals. We can then extend the matching logic in one place only. llvm-svn: 257875	2016-01-15 09:20:19 +00:00
Keno Fischer	81e2e9ef86	Reapply r257105 "[Verifier] Check that debug values have proper size" I originally reapplied this in 257550, but had to revert again due to bot breakage. The only change in this version is to allow either the TypeSize or the TypeAllocSize of the variable to be the one represented in debug info (hopefully in the future we can figure out how to encode the difference). Additionally, several bot failures following r257550, were due to optimizer bugs now fixed in r257787 and r257795. r257550 commit message was: ``` The follow extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap should no longer happen after r257534. The original commit message was: `` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declared size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref `` ``` llvm-svn: 257850	2016-01-15 00:46:17 +00:00
Matthew Simpson	791fd160c3	[SLP] Vectorize the index computations of getelementptr instructions. This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. Differential Revision: http://reviews.llvm.org/D14829 llvm-svn: 257800	2016-01-14 20:46:27 +00:00
Keno Fischer	d5354fdddb	[SROA] Also insert a bit piece expression if only one piece is needed Summary: If SROA creates only one piece (e.g. because the other is not needed), it still needs to create a bit_piece expression if that bit piece is smaller than the original size of the alloca. Reviewers: aprantl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16187 llvm-svn: 257795	2016-01-14 20:06:34 +00:00
Keno Fischer	1dd319f3b6	[Utils] Fix incorrect dbg.declare store conversion Summary: The dbg.declare -> dbg.value conversion did not check which operand of the store instruction the alloca was passed to. As a result code that stored the address of an alloca, rather than storing to the alloca, would still trigger the conversion routine, leading to the insertion of an incorrect dbg.value intrinsic. Reviewers: aprantl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16169 llvm-svn: 257787	2016-01-14 19:12:27 +00:00
Akira Hatanaka	b45b1ea86f	[Inliner] Merge the attributes of the caller and callee functions This patch turns off the fast-math optimization attribute on the caller if the callee's fast-math attribute is not turned on. For example, - before inlining caller: "less-precise-fpmad"="true" callee: "less-precise-fpmad"="false" - after inlining caller: "less-precise-fpmad"="false" Alternatively, it's possible to block inlining if the caller's and callee's attributes don't match. If this approach is preferable to the one in this patch, we can discuss post-commit. rdar://problem/19836465 Differential Revision: http://reviews.llvm.org/D7802 llvm-svn: 257575	2016-01-13 06:02:45 +00:00
Keno Fischer	78e5c9e6e2	Re-Revert r257105 (Verifier debug info changes) While I investigate some new buildbot failures. This was originally reapplied as r257550 and r257558. llvm-svn: 257563	2016-01-13 02:31:14 +00:00
Keno Fischer	25916079ff	Reapply r257105 "[Verifier] Check that debug values have proper size" The follow extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap should no longer happen after r257534. The original commit message was: ``` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declared size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref ``` llvm-svn: 257550	2016-01-13 00:31:44 +00:00
Fiona Glaser	db7824f0c1	CannotBeOrderedLessThanZero: add some missing cases llvm-svn: 257542	2016-01-12 23:37:30 +00:00
Keno Fischer	9aae445e09	[Utils] Insert DW_OP_bit_piece when only describing part of the variable Summary: The dbg.declare -> dbg.value conversion looks through any zext/sext to find a value to describe the variable (in the expectation that those zext/sext instruction will go away later). However, those values do not cover the entire variable and thus need a DW_OP_bit_piece. Reviewers: aprantl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16061 llvm-svn: 257534	2016-01-12 22:46:09 +00:00
Sanjay Patel	53ba88dbb0	[LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, 0.5) calls Also, propagate the FMF to the newly created sqrt() call. llvm-svn: 257503	2016-01-12 19:06:35 +00:00
Teresa Johnson	1e26e39579	Fix bot failure from r257493: remove extraneous temp file read This was left from an earlier version of the test. llvm-svn: 257494	2016-01-12 17:53:59 +00:00
Teresa Johnson	388497e8be	[ThinLTO] Handle an external call from an import to an alias in dest The findExternalCalls routine ignores calls to functions already defined in the dest module. This was not handling the case where the definition in the current module is actually an alias to a function call. llvm-svn: 257493	2016-01-12 17:48:44 +00:00
Sanjay Patel	6002e78a06	[LibCallSimplifier] use instruction-level fast-math-flags to transform pow(exp(x)) calls See also: http://reviews.llvm.org/rL255555 http://reviews.llvm.org/rL256871 http://reviews.llvm.org/rL256964 http://reviews.llvm.org/rL257400 http://reviews.llvm.org/rL257404 http://reviews.llvm.org/rL257414 llvm-svn: 257491	2016-01-12 17:30:37 +00:00
Sanjay Patel	dc19b8de76	consolidate exp/exp2 tests The transform is identical, so keep the tests together and save some overhead. llvm-svn: 257484	2016-01-12 17:00:38 +00:00
Sanjay Patel	71e550fefb	Add/edit tests to include instruction-level FMF on calls Prepatory patch before changing LibCallSimplifier to use the FMF. Also, tighten the CHECK lines and give the tests more meaningful names. Similar changes to: http://reviews.llvm.org/rL257414 llvm-svn: 257481	2016-01-12 16:50:17 +00:00
Teresa Johnson	5fe40050bd	[IRMover] Don't copy personality, etc unless creating def Function::copyAttributesFrom will copy the personality function, prefix data and prolog data from the source function to the new function, and is invoked when the IRMover copies the function prototype. This puts a reference to a constant in the source module on a function in the dest module, which causes an error when deleting the source module after importing, since the personality function in the source module still has uses (this would presumably also be an issue for the prologue and prefix data). Remove the copies added to the dest copy when creating the new prototype, as they are mapped properly when/if we link the function body. llvm-svn: 257420	2016-01-12 00:24:24 +00:00
Sanjay Patel	e896ede7f1	[LibCallSimplifier] use instruction-level fast-math-flags to transform log calls Also, add tests to verify that we're checking 'fast' on both calls of each transform pair, tighten the CHECK lines, and give the tests more meaningful names. This is a continuation of: http://reviews.llvm.org/rL255555 http://reviews.llvm.org/rL256871 http://reviews.llvm.org/rL256964 http://reviews.llvm.org/rL257400 http://reviews.llvm.org/rL257404 llvm-svn: 257414	2016-01-11 23:31:48 +00:00
Sanjay Patel	6c1ddbb7b6	[LibCallSimplifier] don't allow sqrt transform unless all ops are unsafe Fix the FIXME added with: http://reviews.llvm.org/rL257400 llvm-svn: 257404	2016-01-11 22:50:36 +00:00
Justin Bogner	0fb7ed5726	LoopUnroll: Use the optsize threshold for minsize as well Currently we're unrolling loops more in minsize than in optsize, which means -Oz will have a larger code size than -Os. That doesn't make any sense. This resolves the FIXME about this in LoopUnrollPass and extends the optsize test to make sure we use the smaller threshold for minsize as well. llvm-svn: 257402	2016-01-11 22:39:43 +00:00
Sanjay Patel	683f29735f	[LibCallSimplifier] use instruction-level fast-math-flags to transform sqrt calls This is a continuation of adding FMF to call instructions: http://reviews.llvm.org/rL255555 The intent of the patch is to preserve the current behavior of the transform except that we use the sqrt instruction's 'fast' attribute as a trigger rather than the function-level attribute. But this raises a bug noted by the new FIXME comment. In order to do this transform: sqrt((x * x) * y) ---> fabs(x) * sqrt(y) ...we need all of the sqrt, the first fmul, and the second fmul to be 'fast'. If any of those ops is strict, we should bail out. Differential Revision: http://reviews.llvm.org/D15937 llvm-svn: 257400	2016-01-11 22:34:19 +00:00
Silviu Baranga	603954ef0e	Revert r257164 - it has caused spec2k6 failures in LTO mode llvm-svn: 257340	2016-01-11 16:19:38 +00:00
David Majnemer	7a99dd4d3f	Add test for r257279. llvm-svn: 257280	2016-01-10 07:13:33 +00:00
Chen Li	1689c2f54b	[SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad. Summary: This is a fix of D13718. D13718 was committed but then reverted because of the following bug: https://llvm.org/bugs/show_bug.cgi?id=25299 This patch fixes the issue shown in the bug. Reviewers: majnemer, reames Subscribers: jevinskie, llvm-commits Differential Revision: http://reviews.llvm.org/D14308 llvm-svn: 257277	2016-01-10 05:48:01 +00:00
Manuel Jacob	734e73342d	[RS4GC] Update and simplify handling of Constants in findBaseDefiningValueOfVector(). Summary: This is analogous to r256079, which removed an overly strong assertion, and r256812, which simplified the code by replacing three conditionals by one. Reviewers: reames Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D16019 llvm-svn: 257250	2016-01-09 04:02:16 +00:00
Philip Reames	5715f576ea	[rs4gc] Optionally directly relocated vector of pointers This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates. The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order to visit instructions in in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications. Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR. At the moment, the new functionality is hidden behind a flag. To use this code, you need to pass "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested. Differential Revision: http://reviews.llvm.org/D15982 llvm-svn: 257244	2016-01-09 01:31:13 +00:00
Haicheng Wu	a6a3279bd3	[JumpThreading] Split select that has constant conditions coming from the PHI node Look for PHI/Select in the same BB of the form bb: %p = phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4], ... %s = select p, trueval, falseval And expand the select into a branch structure. This later enables jump-threading over bb in this pass. Using the similar approach of SimplifyCFG::FoldCondBranchOnPHI(), unfold select if the associated PHI has at least one constant. If the unfolded select is not jump-threaded, it will be folded again in the later optimizations. llvm-svn: 257198	2016-01-08 19:39:39 +00:00
Justin Bogner	e9fb228d59	LoopInfo: Simplify ownership of Loop objects It's strange that LoopInfo mostly owns the Loop objects, but that it defers deleting them to the loop pass manager. Instead, change the oddly named "updateUnloop" to "markAsRemoved" and have it queue the Loop object for deletion. We can't delete the Loop immediately when we remove it, since we need its pointer identity still, so we'll mark the object as "invalid" so that clients can see what's going on. llvm-svn: 257191	2016-01-08 19:08:53 +00:00
Teresa Johnson	a1080ee6f0	[ThinLTO] Delay metadata materializtion in function importer The function importer was still materializing metadata when modules were loaded for function importing. We only want to materialize it when we are going to invoke the metadata linking postpass. Materializing it before function importing is not only unnecessary, but also causes metadata referenced by imported functions to be mapped in early, and then not connected to the rest of the module level metadata when it is ultimately linked in. Augmented the test case to specifically check for the metadata being properly connected, which it wasn't before this fix. llvm-svn: 257171	2016-01-08 14:17:41 +00:00
Silviu Baranga	9e007efad2	Re-commit r257064, this time with a fixed assert In setInsertionPoint if the value is not a PHI, Instruction or Argument it should be a Constant, not a ConstantExpr. Original commit message: [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use for the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparisson entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc) the value of the original base pointer will be hidden to the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257164	2016-01-08 11:11:04 +00:00
Chandler Carruth	1926b70e37	[attrs] Split the late-revisit pattern for deducing norecurse in a top-down manner into a true top-down or RPO pass over the call graph. There are specific patterns of function attributes, notably the norecurse attribute, which are most effectively propagated top-down because all they us caller information. Walk in RPO over the call graph SCCs takes the form of a module pass run immediately after the CGSCC pass managers postorder walk of the SCCs, trying again to deduce norerucrse for each singular SCC in the call graph. This removes a very legacy pass manager specific trick of using a lazy revisit list traversed during finalization of the CGSCC pass. There is no analogous finalization step in the new pass manager, and a lazy revisit list is just trying to produce an RPO iteration of the call graph. We can do that more directly if more expensively. It seems unlikely that this will be the expensive part of any compilation though as we never examine the function bodies here. Even in an LTO run over a very large module, this should be a reasonable fast set of operations over a reasonably small working set -- the function call graph itself. In the future, if this really is a compile time performance issue, we can look at building support for both post order and RPO traversals directly into a pass manager that builds and maintains the PO list of SCCs. Differential Revision: http://reviews.llvm.org/D15785 llvm-svn: 257163	2016-01-08 10:55:52 +00:00
Sanjay Patel	d72a458d28	[InstCombine] insert a new shuffle in a safe place (PR25999) Limit this transform to a basic block and guard against PHIs. Hopefully, this fixes the remaining failures in PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999 llvm-svn: 257133	2016-01-08 01:39:16 +00:00
Aditya Nandakumar	f94c149f7f	Instructions to be redone only if from the same BB While adding instructions(possible roots) to be redone, make sure they are from the same basic block. llvm-svn: 257112	2016-01-07 23:22:55 +00:00
Keno Fischer	ea33a25816	Temporarily revert r257105 "[Verifier] Check that debug values have proper size" Looks like there's a case where clang generates debug info that triggers the new verifier check. Reverting while investigating. llvm-svn: 257107	2016-01-07 22:39:11 +00:00
Keno Fischer	b3326be6ad	[Verifier] Check that debug values have proper size Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declared size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref Reviewers: aprantl Differential Revision: http://reviews.llvm.org/D14276 llvm-svn: 257105	2016-01-07 22:18:37 +00:00
David Majnemer	f1a9c9e148	[SCCP] Don't violate the lattice invariants We marked values which are 'undef' as constant instead of undefined which violates SCCP's invariants. If we can figure out that a computation results in 'undef', leave it in the undefined state. This fixes PR16052. llvm-svn: 257102	2016-01-07 21:36:16 +00:00
David Majnemer	867bbc775f	Add test for r256912 I forgot to add this with the rest of r256912. llvm-svn: 257088	2016-01-07 19:27:16 +00:00
David Majnemer	bae945735a	[SCCP] Can't go from overdefined to constant The fix for PR23999 made us mark loads of null as producing the constant undef which upsets the lattice. Instead, keep the load as "undefined". This fixes PR26044. llvm-svn: 257087	2016-01-07 19:25:39 +00:00
Silviu Baranga	dd68d46ec1	Revert r257064. It caused failures in some sanitizer tests. llvm-svn: 257069	2016-01-07 15:46:43 +00:00
Silviu Baranga	57b1b90996	[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs Summary: When comparing two GEP instructions which have the same base pointer and one of them has a constant index, it is possible to only compare indices, transforming it to a compare with a constant. This removes one use for the GEP instruction with the constant index, can reduce register pressure and can sometimes lead to removing the comparisson entirely. InstCombine was already doing this when comparing two GEPs if the base pointers were the same. However, in the case where we have complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to or from integers, etc) the value of the original base pointer will be hidden to the optimizer and this transformation will be disabled. This change detects when the two sides of the comparison can be expressed as GEPs with the same base pointer, even if they don't appear as such in the IR. The transformation will convert all the pointer arithmetic to arithmetic done on indices and all the relevant uses of GEPs to GEPs with a common base pointer. The GEP comparison will be converted to a comparison done on indices. Reviewers: majnemer, jmolloy Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits Differential Revision: http://reviews.llvm.org/D15146 llvm-svn: 257064	2016-01-07 14:56:08 +00:00
Mehdi Amini	0535003bef	Fix PR26051: Memcpy optimization should introduce a call to memcpy before the store destination position This is a conservative fix, I expect Amaury to relax this. Follow-up for r256923 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 256999	2016-01-06 23:50:22 +00:00
Chen Li	78bde83003	[SplitLandingPadPredecessors] Create a PHINode for the original landingpad only if it has some uses Summary: This patch adds a check in SplitLandingPadPredecessors to see if the original landingpad instruction has any uses. If not, we don't need to create a PHINode for it in the joint block since it's gonna be a dead code anyway. The motivation for this patch is that we found a bug that SplitLandingPadPredecessors created a PHINode of token type landingpad, which failed the verifier since PHINode can not be token type. However, the created PHINode will never be used in our code pattern. This patch will workaround this bug, and we might add supports in SplitLandingPadPredecessors to handle token type landingpad with uses in the future. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15835 llvm-svn: 256972	2016-01-06 20:32:05 +00:00
Amaury Sechet	3235c08253	Promote aggregate store to memset when possible Summary: As per title. This will allow the optimizer to pick up on it. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15923 llvm-svn: 256969	2016-01-06 19:47:24 +00:00
Sanjay Patel	cddcd7256c	[LibCallSimplifier] use instruction-level fast-math-flags for tan/atan transform llvm-svn: 256964	2016-01-06 19:23:35 +00:00
Amaury Sechet	d3b2c0fd94	Improve load/store to memcpy for aggregate Summary: It turns out that if we don't try to do it at the store location, we can do it before any operation that alias the load, as long as no operation alias the store. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15903 llvm-svn: 256923	2016-01-06 09:30:39 +00:00
Philip Reames	ae050a5703	[BasicAA] Remove special casing of memset_pattern16 in favor of generic attribute inference Most of the properties of memset_pattern16 can be now covered by the generic attributes and inferred by InferFunctionAttrs. The only exceptions are: - We don't yet have a writeonly attribute for the first argument. - We don't have an attribute for modeling the access size facts encoded in MemoryLocation.cpp. Differential Revision: http://reviews.llvm.org/D15879 llvm-svn: 256911	2016-01-06 04:53:16 +00:00
Manuel Jacob	3eedd11329	[Statepoints] Check for the "gc-leaf-function" attribute on call sites as well. Reviewers: sanjoy, reames Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15900 llvm-svn: 256875	2016-01-05 23:59:08 +00:00
Sanjay Patel	29095ea1b0	[LibCallSimplfier] use instruction-level fast-math-flags for fmin/fmax transforms llvm-svn: 256871	2016-01-05 20:46:19 +00:00
Amaury Sechet	a0c242cdfd	Implement load to store => memcpy in MemCpyOpt for aggregates Summary: Most of the tool chain is able to optimize scalar and memcpy like operation effisciently while it isn't that good with aggregates. In order to improve the support of aggregate, we try to change aggregate manipulation into either scalar or memcpy like ones whenever possible without loosing informations. This is one such opportunity. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15894 llvm-svn: 256868	2016-01-05 20:17:48 +00:00
Manuel Jacob	68b753a4fb	Correct my last commit (revision 256860). I forgot to save a small wording improvement before committing. llvm-svn: 256862	2016-01-05 19:45:54 +00:00
Manuel Jacob	b8060cd88a	[PlaceSafepoints] Add a test. Calls of functions with the "gc-leaf-function" attribute shouldn't be turned into a safepoint. llvm-svn: 256860	2016-01-05 19:40:58 +00:00
Sanjay Patel	a1c5347982	[InstCombine] insert a new shuffle before its uses (PR26015) Although this solves the test case in PR26015: https://llvm.org/bugs/show_bug.cgi?id=26015 And may solve PR25999: https://llvm.org/bugs/show_bug.cgi?id=25999 ...I suspect this is not the best solution. I think we want to insert the new shuffle just ahead of the earliest ExtractElementInst that we're replacing, but I don't know how that should be implemented. Differential Revision: http://reviews.llvm.org/D15878 llvm-svn: 256857	2016-01-05 19:09:47 +00:00
David Majnemer	59eb733af1	[SimplifyCFG] Further improve our ability to remove redundant catchpads In r256814, we managed to remove catchpads which were trivially redudant because they were the same SSA value. We can do better using the same algorithm but with a smarter datastructure by hashing the SSA values within the catchpad and comparing them structurally. llvm-svn: 256815	2016-01-05 07:42:17 +00:00
David Majnemer	2fa8651a8f	[SimplifyCFG] Remove redundant catchpads Remove duplicate catchpad handlers from a catchswitch. llvm-svn: 256814	2016-01-05 06:27:50 +00:00
Joseph Tremoulet	0d808888c1	[WinEH] Simplify unreachable catchpads Summary: At least for CoreCLR, a catchpad which immediately executes an `unreachable` instruction indicates that the exception can never have a matching type, and so such catchpads can be removed, and so can their catchswitches if the catchswitch becomes empty. Reviewers: rnk, andrew.w.kaylor, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15846 llvm-svn: 256809	2016-01-05 02:37:41 +00:00
Chen Li	c6021038f6	[InstructionCombining] prepareICWorklistFromFunction halts in infinite loop with instructions of token type Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is token type, and if so it updates EndInst with the current instruction. Reviewers: reames, majnemer Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15859 llvm-svn: 256792	2016-01-04 23:28:57 +00:00
David Majnemer	b33f3a239a	[LICM] Fix a small oversight introduced in r256763 r256763 had promoteLoopAccessesToScalars check for the existence of a catchswitch when the exit blocks were populated but promoteLoopAccessesToScalars may be called with a prepopulated set of exit blocks which would also need to be checked. This fixes PR26019. llvm-svn: 256788	2016-01-04 23:16:22 +00:00
Philip Reames	2466719e44	[MemoryBuiltins] Remove isOperatorNewLike by consolidating non-null inference handling This patch removes the isOperatorNewLike predicate since it was only being used to establish a non-null return value and we have attributes specifically for that purpose with generic handling. To keep approximate the same behaviour for existing frontends, I added the various operator new like (i.e. instances of operator new) to InferFunctionAttrs. It's not really clear to me why this isn't handled in Clang, but I didn't want to break existing code and any subtle assumptions it might have. Once this patch is in, I'm going to start separating the isAllocLike family of predicates. These appear to be being used for a mixture of things which should be more clearly separated and documented. Today, they're being used to indicate (at least) aliasing facts, CSE-ability, and default values from an allocation site. Differential Revision: http://reviews.llvm.org/D15820 llvm-svn: 256787	2016-01-04 22:49:23 +00:00
Aditya Nandakumar	12d060481a	Remove dead instructions before Redoing Before reevaluating instructions, iterate over all instructions to be reevaluated and remove trivially dead instructions and if any of it's operands become trivially dead, mark it for deletion until all trivially dead instructions have been removed llvm-svn: 256773	2016-01-04 19:48:14 +00:00
David Majnemer	219055f9df	[LICM] Don't insert instructions after a catchswitch when performing loop promotion Inserting after a catchswitch results in verifier errors, bail out on promotion if a catchswitch is a loop exit. llvm-svn: 256763	2016-01-04 17:42:19 +00:00
David Majnemer	42a0730c42	[LICM] Make instruction sinking funclet-aware We had two bugs here: - We might try to sink into a catchswitch, causing verifier failures. - We will succeed in sinking into a cleanuppad but we didn't update the funclet operand bundle. This fixes PR26000. llvm-svn: 256728	2016-01-04 03:37:39 +00:00
Dimitry Andric	227b928abc	Fix several accidental DOS line endings in source files Summary: There are a number of files in the tree which have been accidentally checked in with DOS line endings. Convert these to native line endings. There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those. Reviewers: joerg, aaron.ballman Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D15848 llvm-svn: 256707	2016-01-03 17:22:03 +00:00
Sanjay Patel	bee05caa6b	[LibCallSimplifier] propagate FMF when shrinking binary calls llvm-svn: 256682	2015-12-31 23:40:59 +00:00
Sanjay Patel	aa23114cb4	[LibCallSimplifier] propagate FMF when shrinking unary calls llvm-svn: 256679	2015-12-31 21:52:31 +00:00
Sanjay Patel	f6f32bcaa4	change function names to avoid accidentally matching the substring llvm-svn: 256678	2015-12-31 21:25:25 +00:00
Sanjay Patel	4e8b300400	add 'fast' attribute to calls to show that the flag isn't being propagated llvm-svn: 256677	2015-12-31 21:12:19 +00:00
Geoff Berry	43dc285915	[JumpThreading] Fix opcode bonus in getJumpThreadDuplicationCost() The code that was meant to adjust the duplication cost based on the terminator opcode was not being executed in cases where the initial threshold was hit inside the loop. Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D15536 llvm-svn: 256568	2015-12-29 18:10:16 +00:00
Manuel Jacob	9db5b93ffc	[RS4GC] Fix rematerialization of bitcast of bitcast. Summary: Previously, only the outer (last) bitcast was rematerialized, resulting in a use of the unrelocated inner (first) bitcast after the statepoint. See the test case for an example. Reviewers: igor-laevsky, reames Subscribers: reames, alex, llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15789 llvm-svn: 256520	2015-12-28 20:14:05 +00:00
Chandler Carruth	3a040e6d47	[attrs] Extract the pure inference of function attributes into a standalone pass. There is no call graph or even interesting analysis for this part of function attributes -- it is literally inferring attributes based on the target library identification. As such, we can do it using a much simpler module pass that just walks the declarations. This can also happen much earlier in the pass pipeline which has benefits for any number of other passes. In the process, I've cleaned up one particular aspect of the logic which was necessary in order to separate the two passes cleanly. It now counts inferred attributes independently rather than just counting all the inferred attributes as one, and the counts are more clearly explained. The two test cases we had for this code path are both ... woefully inadequate and copies of each other. I've kept the superset test and updated it. We need more testing here, but I had to pick somewhere to stop fixing everything broken I saw here. Differential Revision: http://reviews.llvm.org/D15676 llvm-svn: 256466	2015-12-27 08:41:34 +00:00
Chandler Carruth	f49f1a87ef	[attrs] Split off the forced attributes utility into its own pass that is (by default) run much earlier than FuncitonAttrs proper. This allows forcing optnone or other widely impactful attributes. It is also a bit simpler as the force attribute behavior needs no specific iteration order. I've added the pass into the default module pass pipeline and LTO pass pipeline which mirrors where function attrs itself was being run. Differential Revision: http://reviews.llvm.org/D15668 llvm-svn: 256465	2015-12-27 08:13:45 +00:00
Benjamin Kramer	13dfb7df32	Fix safepoint intrinsic signatures in test. Should bring back the bots after r256443. llvm-svn: 256450	2015-12-26 11:40:48 +00:00
Chen Li	d71999ef1b	[gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob Subscribers: reames, mjacob, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15662 llvm-svn: 256443	2015-12-26 07:54:32 +00:00
Sanjay Patel	ae945e7927	[InstCombine] transform more extract/insert pairs into shuffles (PR2109) This is an extension of the shuffle combining from r203229: http://reviews.llvm.org/rL203229 The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in. The motivation is to finally solve PR2109: https://llvm.org/bugs/show_bug.cgi?id=2109 For that example, the IR becomes: %1 = bitcast <2 x i32>* %P to <2 x float>* %ld1 = load <2 x float>, <2 x float>* %1, align 8 %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5> ret <4 x float> %i2 And x86 SSE output improves from: movq (%rdi), %xmm1 ## xmm1 = mem[0],zero movdqa %xmm1, %xmm2 shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3] shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0] shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2] shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0] shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0] retq To the almost optimal: movhpd (%rdi), %xmm0 Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples. Differential Revision: http://reviews.llvm.org/D15096 llvm-svn: 256394	2015-12-24 21:17:56 +00:00
David Majnemer	63ad9e0543	[OperandBundles] Have TailCallElim play nice with operand bundles A call site's use of a Value might not correspond to an argument operand but to a bundle operand. This fixes PR25928. llvm-svn: 256328	2015-12-23 09:58:43 +00:00
David Majnemer	02f4787e45	[OperandBundles] Have InstCombine play nice with operand bundles Don't assume a call's use corresponds to an argument operand, it might correspond to a bundle operand. llvm-svn: 256327	2015-12-23 09:58:41 +00:00
David Majnemer	464be3724a	[OperandBundles] Have DeadArgElim play nice with operand bundles A call site's use of a Value might not correspond to an argument operand but to a bundle operand. llvm-svn: 256326	2015-12-23 09:58:36 +00:00
Manuel Jacob	a4efd8ac2e	[RS4GC] Fix base pair printing for constants. Previously, "%" + name of the value was printed for each derived and base pointer. This is correct for instructions, but wrong for e.g. globals. llvm-svn: 256305	2015-12-23 00:19:45 +00:00
Rafael Espindola	10d9a033db	Also add unnamed_addr to functions. llvm-svn: 256281	2015-12-22 20:43:30 +00:00
Rafael Espindola	5349d87a69	Delete dead GlobalAliases. llvm-svn: 256276	2015-12-22 19:50:22 +00:00
Cong Hou	e93b8e1539	[BPI] Replace weights by probabilities in BPI. This patch removes all weight-related interfaces from BPI and replace them by probability versions. With this patch, we won't use edge weight anymore in either IR or MC passes. Edge probabilitiy is a better representation in terms of CFG update and validation. Differential revision: http://reviews.llvm.org/D15519 llvm-svn: 256263	2015-12-22 18:56:14 +00:00
Manuel Jacob	4e4f60ded0	Remove deprecated llvm.experimental.gc.result.{int,float,ptr} intrinsics. Summary: These were deprecated 11 months ago when a generic llvm.experimental.gc.result intrinsic, which works for all types, was added. Reviewers: sanjoy, reames Subscribers: sanjoy, chenli, llvm-commits Differential Revision: http://reviews.llvm.org/D15719 llvm-svn: 256262	2015-12-22 18:44:45 +00:00
Manuel Jacob	990dfa6fe5	[RS4GC] Fix crash in the case that a live variable has a constant base. Summary: Previously, RS4GC crashed in CreateGCRelocates() because it assumed that every base is also in the array of live variables, which isn't true if a live variable has a constant base. This change fixes the crash by making sure CreateGCRelocates() won't try to relocate a live variable with a constant base. This would be unnecessary anyway because anything with a constant base won't move. Reviewers: reames Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15556 llvm-svn: 256252	2015-12-22 16:50:44 +00:00
Easwaran Raman	bdb6f1dcc3	Determine callee's hotness and adjust threshold based on that. NFC. This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold callees and uses values of inlinehint-threshold and inlinecold-threshold respectively as the thresholds for such callees. Differential Revision: http://reviews.llvm.org/D15245 llvm-svn: 256222	2015-12-22 00:32:35 +00:00
Evgeniy Stepanov	8827f2db85	[safestack] Add option for non-TLS unsafe stack pointer. This patch adds an option, -safe-stack-no-tls, for using normal storage instead of thread-local storage for the unsafe stack pointer. This can be useful when SafeStack is applied to an operating system kernel. http://reviews.llvm.org/D15673 Patch by Michael LeMay. llvm-svn: 256221	2015-12-22 00:13:11 +00:00
Evgeniy Stepanov	fda72c52a2	[cfi] Fix LowerBitSets on 32-bit targets. This code attempts to truncate IntPtrTy to i32, which may be the same type. llvm-svn: 256205	2015-12-21 22:14:04 +00:00
Sanjoy Das	ab0626e35f	Nonnull elements in OperandBundleCallSites are not all Instructions `CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with `undef` before erasing them (to avoid deleting instructions that still have uses). This changes the `WeakVH` in `OperandBundleCallSites` to hold an `undef`, and we need to guard for this situation in eventuality in `llvm::InlineFunction`. llvm-svn: 256110	2015-12-19 22:40:28 +00:00
Sanjoy Das	b496834f8e	[Deopt bundles] Fix a test case The `CHECK-NOT` line was incorrect, and would not have caught a breakage. llvm-svn: 256109	2015-12-19 22:40:22 +00:00
Manuel Jacob	f4b3577d66	Remove double blanks. NFC. llvm-svn: 256100	2015-12-19 18:26:53 +00:00
Philip Reames	5d54689bca	[RS4GC] Remove an overly strong assertion As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation. The code actually handled this case just fine, we only had the assert to help isolate problems under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manual Jacobs is working on a language with constant references, and b) we found a case where the optimizer does create them in practice. llvm-svn: 256079	2015-12-19 02:38:22 +00:00
Keno Fischer	00cbf9a69a	Clean up the processing of dbg.value in various places Summary: First up is instcombine, where in the dbg.declare -> dbg.value conversion, the llvm.dbg.value needs to be called on the actual loaded value, rather than the address (since the whole point of this transformation is to be able to get rid of the alloca). Further, now that that's cleaned up, we can remove a hack in the backend, that would add an implicit OP_deref if the argument to dbg.value was an alloca. This stems from before the existence of DIExpression and is no longer necessary since the deref can be expressed explicitly. Now, in order to make sure that the tests pass with this change, we need to correct the printing of DEBUG_VALUE comments to take into account the expression, which wasn't taken into account before. Unfortunately, for both these changes, there were a number of incorrect test cases (mostly the wrong number of DW_OP_derefs, but also a couple where the test itself was broken more badly). aprantl and I have gone through and adjusted these test case in order to make them pass with these fixes and in some cases to make sure they're actually testing what they are meant to test. Reviewers: aprantl Subscribers: dsanders Differential Revision: http://reviews.llvm.org/D14186 llvm-svn: 256077	2015-12-19 02:02:44 +00:00
Matt Arsenault	2aed6ca1d3	AMDGPU: Switch barrier intrinsics to using convergent noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075	2015-12-19 01:46:41 +00:00
Jingyue Wu	ba3ca76ed2	[NaryReassociate] allow candidate to have a different type Summary: If Candiadte may have a different type from GEP, we should bitcast or pointer cast it to GEP's type so that the later RAUW doesn't complain. Added a test in nary-gep.ll Reviewers: tra, meheff Subscribers: mcrosier, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D15618 llvm-svn: 256035	2015-12-18 21:36:30 +00:00
Andrew Kaylor	123048d26a	[WinEH] Update LCSSA to handle catchswitch with handlers inside and outside a loop Differential Revision: http://reviews.llvm.org/D15630 llvm-svn: 256005	2015-12-18 18:12:35 +00:00
Philip Reames	d7a6cc859a	[InstCombine] Extend peephole DSE to handle unordered atomics This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine. Key points: * We only remove unordered or simple stores. * The loads producing values consumed by dead stores don't influence whether the store is dead. Differential Revision: http://reviews.llvm.org/D15354 llvm-svn: 255932	2015-12-17 22:19:27 +00:00
Philip Reames	15145fb7b1	[EarlyCSE] DSE of atomic unordered stores The rules for removing trivially dead stores are a lot less complicated than loads. Since we know the later store post dominates the former and the former dominates the later, unless the former has side effects other than the actual store, we can remove it. One slightly surprising thing is that we can freely remove atomic stores, even if the later one isn't atomic. There's no guarantee the atomic one was every visible. For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain of reasoning to them, but the catch is we'd then have to model the ordering effect without a store instruction. Since our fences are a stronger than our operation orderings, simple using a fence isn't an obvious win. This arguable calls for a refinement in our fence specification, but that's (much) later work. Differential Revision: http://reviews.llvm.org/D15352 llvm-svn: 255914	2015-12-17 18:50:50 +00:00
Teresa Johnson	e5a6191732	[ThinLTO] Metadata linking for imported functions Summary: Second patch split out from http://reviews.llvm.org/D14752. Maps metadata as a post-pass from each module when importing complete, suturing up final metadata to the temporary metadata left on the imported instructions. This entails saving the mapping from bitcode value id to temporary metadata in the importing pass, and from bitcode value id to final metadata during the metadata linking postpass. Depends on D14825. Reviewers: dexonsmith, joker.eph Subscribers: davidxl, llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D14838 llvm-svn: 255909	2015-12-17 17:14:09 +00:00
Charlie Turner	b69b92855d	[NFC] Update horizontal reduction test cases. These testcases no longer need to specify -slp-vectorize-hor, since it was enabled by default in r252733. llvm-svn: 255783	2015-12-16 17:22:24 +00:00
James Molloy	3d21dcf3ed	[SimplifyCFG] Don't create unnecessary PHIs In conditional store merging, we were creating PHIs when we didn't need to. If the value to be predicated isn't defined in the block we're predicating, then it doesn't need a PHI at all (because we only deal with triangles and diamonds, any value not in the predicated BB must dominate the predicated BB). This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite. Now with a fix (and fixed tests) for the conformance issue seen in Chromium. llvm-svn: 255767	2015-12-16 14:12:44 +00:00
Philip Reames	ae1f265bf1	[EarlyCSE] DSE of stores which write back loaded values Extend EarlyCSE with an additional style of dead store elimination. If we write back a value just read from that memory location, we can eliminate the store under the assumption that the value hasn't changed. I'm implementing this mostly because I noticed the omission when looking at the code. It seemed strange to have InstCombine have a peephole which was more powerful than EarlyCSE. :) Differential Revision: http://reviews.llvm.org/D15397 llvm-svn: 255739	2015-12-16 01:01:30 +00:00
Philip Reames	61a24ab6cc	[IR] Add support for floating pointer atomic loads and stores This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom. Previously, the only way to specify a atomic float operation was to bitcast the pointer to a i32, load the value as an i32, then bitcast to a float. At it's most basic, this patch simply moves this expansion step to the point we start lowering to the backend. This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form. Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would more sense to add those once at least one backend (x86) was ready to actually opt out. As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material. Differential Revision: http://reviews.llvm.org/D15471 llvm-svn: 255737	2015-12-16 00:49:36 +00:00
Evgeniy Stepanov	67849d56c3	Cross-DSO control flow integrity (LLVM part). An LTO pass that generates a __cfi_check() function that validates a call based on a hash of the call-site-known type and the target pointer. llvm-svn: 255693	2015-12-15 23:00:08 +00:00
Cong Hou	a73ffa2206	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. (This is the third attempt to check in this patch, and the first two are r255454 and r255460. The once failed test file reg-usage.ll is now moved to test/Transform/LoopVectorize/X86 directory with target datalayout and target triple indicated.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255691	2015-12-15 22:45:09 +00:00
Sanjay Patel	38a022623a	[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818) This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare. The intent is to restore the behavior enabled by: http://reviews.llvm.org/rL228826 but prevent bad performance such as: https://llvm.org/bugs/show_bug.cgi?id=24818 Earlier patches in this sequence: D12882 (disable SimplifyCFG speculation for expensive instructions) D13297 (have CGP despeculate expensive ops) D14630 (have CGP despeculate special versions of cttz/ctlz) As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally. Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal. A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow on those targets. Differential Revision: http://reviews.llvm.org/D15213 llvm-svn: 255660	2015-12-15 17:38:29 +00:00
Nicolai Hahnle	78fd4f087b	AMDGPU: mark ldexp LibCalls as unavailable Summary: The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls which do not make sense with the AMDGPU backend. In the long run, we'll want an llvm.ldexp.* intrinsic to properly make use of this optimization, but this works around the problem for now. See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709 Reviewers: arsenm, tstellarAMD Differential Revision: http://reviews.llvm.org/D14990 llvm-svn: 255658	2015-12-15 17:24:15 +00:00
Mehdi Amini	1c131b37ed	Instcombine: destructor loads of structs that do not contains padding For non padded structs, we can just proceed and deaggregate them. We don't want ot do this when there is padding in the struct as to not lose information about this padding (the subsequents passes would then try hard to preserve the padding, which is undesirable). Also update extractvalue.ll and cast.ll so that they use structs with padding. Remove the FIXME in the extractvalue of laod case as the non padded case is handled when processing the load, and we don't want to do it on the padded case. Patch by: Amaury SECHET <deadalnix@gmail.com> Differential Revision: http://reviews.llvm.org/D14483 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255600	2015-12-15 01:44:07 +00:00
Xinliang David Li	c7018a25c6	[PGO] make profile prefix even shorter and more readable llvm-svn: 255586	2015-12-15 00:32:56 +00:00
Xinliang David Li	0812747979	[PGO] Shorten profile symbol prefixes Profile symbols have long prefixes which waste space and creating pressure for linker. This patch shortens the prefixes to minimal length without losing verbosity. Differential Revision: http://reviews.llvm.org/D15503 llvm-svn: 255575	2015-12-14 23:26:27 +00:00
Reid Kleckner	db9a91e324	Revert "Don't create unnecessary PHIs" This reverts commit r255489. It causes test failures in Chromium and does not appear to respect the AlternativeV parameter. llvm-svn: 255562	2015-12-14 22:36:57 +00:00
Sanjay Patel	fa54acedd1	add fast-math-flags to 'call' instructions (PR21290) This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp) to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add support to clang, and extend FMF to the DAG for calls. Motivating example: %y = fmul fast float %x, %x %z = tail call float @sqrtf(float %y) We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide attribute for unsafe-math, but we really want to trigger on the instructions themselves: %z = tail call fast float @sqrtf(float %y) because in an LTO build it's possible that calls with fast semantics have been inlined into a function with non-fast semantics. The code changes and tests are based on the recent commits that added "notail": http://reviews.llvm.org/rL252368 and added FMF to fcmp: http://reviews.llvm.org/rL241901 Differential Revision: http://reviews.llvm.org/D14707 llvm-svn: 255555	2015-12-14 21:59:03 +00:00
David Majnemer	bbfc7219ef	[IR] Remove terminatepad It turns out that terminatepad gives little benefit over a cleanuppad which calls the termination function. This is not sufficient to implement fully generic filters but MSVC doesn't support them which makes terminatepad a little over-designed. Depends on D15478. Differential Revision: http://reviews.llvm.org/D15479 llvm-svn: 255522	2015-12-14 18:34:23 +00:00
Sanjay Patel	f727e387be	[InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543) This is a fix for PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The idea is to take the existing fold of: bitcast ( trunc ( lshr ( bitcast X))) --> extractelement (bitcast X) ( http://reviews.llvm.org/rL112232 ) And break it into less specific transforms so we'll catch more cases such as the example in the bug report: bitcast ( trunc ( lshr ( bitcast X))) --> bitcast ( extractelement (bitcast X)) --> extractelement (bitcast X) Enabling patches for this change: http://reviews.llvm.org/rL255399 (combine bitcasts) http://reviews.llvm.org/rL255433 (canonicalize extractelement(bitcast X)) Differential Revision: http://reviews.llvm.org/D15392 llvm-svn: 255504	2015-12-14 16:16:54 +00:00
James Molloy	2b1e101e99	Don't create unnecessary PHIs In conditional store merging, we were creating PHIs when we didn't need to. If the value to be predicated isn't defined in the block we're predicating, then it doesn't need a PHI at all (because we only deal with triangles and diamonds, any value not in the predicated BB must dominate the predicated BB). This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite. llvm-svn: 255489	2015-12-14 10:57:01 +00:00
Cong Hou	ccec6e4d84	Revert r255460, which still causes test failures on some platforms. Further investigation on the failures is ongoing. llvm-svn: 255463	2015-12-13 17:15:38 +00:00
Cong Hou	e6a210f50b	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. (This is the second attempt to check in this patch: REQUIRES: asserts is added to reg-usage.ll now.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255460	2015-12-13 16:55:46 +00:00
Cong Hou	7c369156eb	Revert r255454 as it leads to several test failers on buildbots. llvm-svn: 255456	2015-12-13 09:28:57 +00:00
Cong Hou	7f8b43d424	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255454	2015-12-13 08:44:08 +00:00
Xinliang David Li	d1bab96045	[PGO] Stop using invalid char in instr variable names. Before the patch, -fprofile-instr-generate compile will fail if no integrated-as is specified when the file contains any static functions (the -S output is also invalid). This is the second try. The fix in this patch is very localized. Only profile symbol names of profile symbols with internal linkage are fixed up while initializer of name syms are not changes. This means there is no format change nor version bump. llvm-svn: 255434	2015-12-12 17:28:03 +00:00
Sanjay Patel	1d49fc9b27	[InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X)) This change was discussed in D15392. It allows us to remove the fold that was added in: http://reviews.llvm.org/r255261 ...and it will allow us to generalize this fold: http://reviews.llvm.org/rL112232 while preserving the order of bitcast + extract that it produces and testing shows is better handled by the backend. Note that the existing check for "isVectorTy()" wasn't strong enough in general and specifically because: x86_mmx. It's not a vector, but it's not vectorizable either. So here we check VectorType::isValidElementType() directly before proceeding with the transform. llvm-svn: 255433	2015-12-12 16:44:48 +00:00
David Majnemer	550654aaf1	Move catchpad-phi-cast.ll to the X86 specific subdirectory It is X86 specific and will not be properly exercised unless LLVM is built with the X86 target. llvm-svn: 255426	2015-12-12 06:21:08 +00:00
David Majnemer	8a1c45d6e8	[IR] Reformulate LLVM's EH funclet IR While we have successfully implemented a funclet-oriented EH scheme on top of LLVM IR, our scheme has some notable deficiencies: - catchendpad and cleanupendpad are necessary in the current design but they are difficult to explain to others, even to seasoned LLVM experts. - catchendpad and cleanupendpad are optimization barriers. They cannot be split and force all potentially throwing call-sites to be invokes. This has a noticable effect on the quality of our code generation. - catchpad, while similar in some aspects to invoke, is fairly awkward. It is unsplittable, starts a funclet, and has control flow to other funclets. - The nesting relationship between funclets is currently a property of control flow edges. Because of this, we are forced to carefully analyze the flow graph to see if there might potentially exist illegal nesting among funclets. While we have logic to clone funclets when they are illegally nested, it would be nicer if we had a representation which forbade them upfront. Let's clean this up a bit by doing the following: - Instead, make catchpad more like cleanuppad and landingpad: no control flow, just a bunch of simple operands; catchpad would be splittable. - Introduce catchswitch, a control flow instruction designed to model the constraints of funclet oriented EH. - Make funclet scoping explicit by having funclet instructions consume the token produced by the funclet which contains them. - Remove catchendpad and cleanupendpad. Their presence can be inferred implicitly using coloring information. N.B. The state numbering code for the CLR has been updated but the veracity of it's output cannot be spoken for. An expert should take a look to make sure the results are reasonable. Reviewers: rnk, JosephTremoulet, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D15139 llvm-svn: 255422	2015-12-12 05:38:55 +00:00
Sanjay Patel	93f55dd36d	[InstCombine] allow any pair of bitcasts to be combined This change is discussed in D15392 and should allow us to effectively revert: http://llvm.org/viewvc/llvm-project?view=revision&revision=255261 if we canonicalize bitcasts ahead of extracts. It should be safe to convert any pair of bitcasts into a single bitcast, however, it was mentioned here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html that we're not allowed to bitcast from an x86_mmx to some other types, but I'm not seeing any failures from that, and we have regression tests in CodeGen/X86 that appear to cover all of those cases. Some day we'll get to remove that MMX wart from LLVM IR completely? Differential Revision: http://reviews.llvm.org/D15468 llvm-svn: 255399	2015-12-12 00:33:36 +00:00
Sanjay Patel	ffde9e14a2	use FileCheck for better checking llvm-svn: 255394	2015-12-12 00:01:10 +00:00
Sanjay Patel	d497ad43da	Add tests for bitcast-bitcast sequences for all scalar/vector permutations As noted in http://reviews.llvm.org/D15392 , we should be able to improve this. llvm-svn: 255370	2015-12-11 20:26:30 +00:00
Xinliang David Li	a86545b0b5	[PGO] Revert r255365: solution incomplete, not handling lambda yet llvm-svn: 255369	2015-12-11 20:23:22 +00:00
Xinliang David Li	c79283ef29	[PGO] Stop using invalid char in instr variable names. Before the patch, -fprofile-instr-generate compile will fail if no integrated-as is specified when the file contains any static functions (the -S output is also invalid). This patch fixed the issue. With the change, the index format version will be bumped up by 1. Backward compatibility is preserved with this change. Differential Revision: http://reviews.llvm.org/D15243 llvm-svn: 255365	2015-12-11 19:53:19 +00:00
Chad Rosier	d7634fc91d	Revert r255247, r255265, and r255286 due to serious compile-time regressions. Revert "[DSE] Disable non-local DSE to see if the bots go green." Revert "[DeadStoreElimination] Use range-based loops. NFC." Revert "[DeadStoreElimination] Add support for non-local DSE." llvm-svn: 255354	2015-12-11 18:39:41 +00:00
James Molloy	1bb6ea5e2d	[Mem2Reg] Respect optnone Mem2Reg shouldn't be optimizing a function that is marked optnone. There is a test checking this that fails when mem2reg is explicitly added to the standard pass pipeline. llvm-svn: 255336	2015-12-11 13:36:59 +00:00
James Molloy	37b82e79b2	[InstCombine] Make MatchBSwap also match bit reversals MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse. llvm-svn: 255334	2015-12-11 10:04:51 +00:00
JF Bastien	82bf85ffed	EarlyCSE: add tests Summary: As a follow-up to rL255054 I wasn't able to convince myself that the code did what I thought, so I wrote more tests. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15371 llvm-svn: 255295	2015-12-10 20:24:34 +00:00
Chad Rosier	843c7b4309	[DSE] Disable non-local DSE to see if the bots go green. I see a few bots timing out, so I'm speculatively disabling r255247. llvm-svn: 255286	2015-12-10 19:23:02 +00:00
Rong Xu	2611ff8a27	[PGO] Use %t as the temporary profdata filename in the test cases. Using %t rather %T/<specific_name> as the temporary profdata filename. llvm-svn: 255271	2015-12-10 18:24:44 +00:00
Sanjay Patel	c83fd9554a	[InstCombine] fold bitcasts around an extractelement (3rd try) This is a redo of r255137 (reverted at r255227) which was a redo of r255124 (reverted at r255126) with a fixed check for a scalar source type and an added test for the failure that caused the revert. Original commit message: Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255261	2015-12-10 17:09:28 +00:00
Chad Rosier	533bc3fcac	[DeadStoreElimination] Add support for non-local DSE. We extend the search for redundant stores to predecessor blocks that unconditionally lead to the block BB with the current store instruction. That also includes single-block loops that unconditionally lead to BB, and if-then-else blocks where then- and else-blocks unconditionally lead to BB. http://reviews.llvm.org/D13363 Patch by Ivan Baev <ibaev@codeaurora.org>! llvm-svn: 255247	2015-12-10 13:51:43 +00:00
Akira Hatanaka	a3c0e8e1ba	Revert r255137. This commit broke apple's internal bot. llvm-svn: 255227	2015-12-10 08:00:52 +00:00
Rong Xu	7dd9b1ea75	[PGO] Rename the profdata filename to avoid the conflict b/w tests. Two tests diag_mismatch.ll and diag_no_funcprofdata.ll generates the same profdata filename which can conflict in current test runs. This patch renames them to have different names. llvm-svn: 255158	2015-12-09 21:27:59 +00:00
Justin Bogner	b7389d6714	IR: Make ConstantDataArray::getFP actually return a ConstantDataArray The ConstantDataArray::getFP(LLVMContext &, ArrayRef<uint16_t>) overload has had a typo in it since it was written, where it will create a Vector instead of an Array. This obviously doesn't work at all, but it turns out that until r254991 there weren't actually any callers of this overload. Fix the typo and add some test coverage. llvm-svn: 255157	2015-12-09 21:21:07 +00:00
Reid Kleckner	54ade23504	[Float2Int] Don't operate on vector instructions This fixes a crash bug. It's also not clear if we'd want to do this transform for vectors. llvm-svn: 255155	2015-12-09 21:08:18 +00:00
Sanjoy Das	9abfb0b429	Use WeakVH to keep track of calls with operand bundles in CloneCodeInfo `CloneAndPruneIntoFromInst` can DCE instructions after cloning them into the new function, and so an AssertingVH is too strong. This change switches CloneCodeInfo to use a std::vector<WeakVH>. llvm-svn: 255148	2015-12-09 20:33:52 +00:00
Sanjay Patel	b67e6b6044	[InstCombine] fold bitcasts around an extractelement (2nd try) This is a redo of r255124 (reverted at r255126) with an added check for a scalar destination type and an added test for the failure seen in Clang's test/CodeGen/vector.c. The extra test shows a different missing optimization. Original commit message: Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255137	2015-12-09 18:57:16 +00:00
Rong Xu	f430ae40cf	[PGO] Resubmit "MST based PGO instrumentation infrastructure" (r254021) This new patch fixes a few bugs that exposed in last submit. It also improves the test cases. --Original Commit Message-- This patch implements a minimum spanning tree (MST) based instrumentation for PGO. The use of MST guarantees minimum number of CFG edges getting instrumented. An addition optimization is to instrument the less executed edges to further reduce the instrumentation overhead. The patch contains both the instrumentation and the use of the profile to set the branch weights. Differential Revision: http://reviews.llvm.org/D12781 llvm-svn: 255132	2015-12-09 18:08:16 +00:00
Mehdi Amini	4e2b7c454c	Revert "[InstCombine] fold bitcasts around an extractelement" This reverts commit r255124. Broke http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/4193/steps/test/logs/stdio From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255126	2015-12-09 16:31:39 +00:00
Sanjay Patel	07410ed234	[InstCombine] fold bitcasts around an extractelement Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255124	2015-12-09 16:17:20 +00:00
Mehdi Amini	b000bbdec2	Change hasUniqueInitializer() to call isStrongDefinitionForLinker() instead of !isWeakForLinker() Summary: Available_externally global variable with initializer were considered "hasInitializer()", while obviously it can't match the description: Whether the global variable has an initializer, and any changes made to the initializer will turn up in the final executable. since modifying the initializer of an externally available variable does not make sense. Reviewers: pcc, rafael Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15351 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255123	2015-12-09 16:17:07 +00:00
Sanjoy Das	b8dced5dfa	Don't drop attributes when inlining through "deopt" operand bundles Test case attached (test case also checks that we don't drop the calling convention, but that functionality was correct before this patch). llvm-svn: 255088	2015-12-09 01:01:28 +00:00
Sanjoy Das	48945cdc15	[OperandBundles] Have PruneEH work correct with operand bundles. For an invoke with operand bundles, the [op_begin(), op_end()-3] range can contain things other than invoke arguments. This change teaches PruneEH to use arg_begin() and arg_end() explicitly. llvm-svn: 255073	2015-12-08 23:16:52 +00:00
Reid Kleckner	8de1fe23ed	[CGP] Reimplement r255055 a different way llvm-svn: 255070	2015-12-08 23:00:03 +00:00
Reid Kleckner	e18f92bfe9	Revert "[CGP] Check that we have an insert point before moving llvm.dbg.value around" This reverts commit r255055. Breakage has been reported. llvm-svn: 255063	2015-12-08 22:33:23 +00:00
Sanjoy Das	8a954a0553	[OperandBundles] Fix a transform in simplifycfg Reviewers: pcc, majnemer, reames Subscribers: reames, llvm-commits Differential Revision: http://reviews.llvm.org/D15345 llvm-svn: 255062	2015-12-08 22:26:08 +00:00
Reid Kleckner	7c005324d5	[CGP] Check that we have an insert point before moving llvm.dbg.value around llvm-svn: 255055	2015-12-08 21:50:52 +00:00
Philip Reames	8fc2cbf933	[EarlyCSE] Value forwarding for unordered atomics This patch teaches the fully redundant load part of EarlyCSE how to forward from atomic and volatile loads and stores, and how to eliminate unordered atomics (only). This patch does not include dead store elimination support for unordered atomics, that will follow in the near future. The basic idea is that we allow all loads and stores to be tracked by the AvailableLoad table. We store a bit in the table which tracks whether load/store was atomic, and then only replace atomic loads with ones which were also atomic. No attempt is made to refine our handling of ordered loads or stores. Those are still treated as full fences. We could pretty easily extend the release fence handling to release stores, but that should be a separate patch. Differential Revision: http://reviews.llvm.org/D15337 llvm-svn: 255054	2015-12-08 21:45:41 +00:00
Mehdi Amini	bddfbeaf59	Revert "Add Available Externally linkage type to isWeakForLinker()" This reverts r255043, as per post-review concern were raised on the correctness. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255045	2015-12-08 19:13:31 +00:00
Mehdi Amini	69e3ae8d4b	Cleanup test: remove useless alignment From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255044	2015-12-08 19:02:55 +00:00
Mehdi Amini	37c25fa1d1	Add Available Externally linkage type to isWeakForLinker() Per LangRef: "Globals with available_externally linkage are allowed to be discarded at will, and are otherwise the same as linkonce_odr", since linkonce_odr is in this list it makes sense to have available_externally there as well. Reviewers: rafael Differential Revision: http://reviews.llvm.org/D15323 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 255043	2015-12-08 19:01:29 +00:00
Sanjoy Das	683bf070ef	[IndVars] Have getInsertPointForUses preserve LCSSA Summary: Also add a stricter post-condition for IndVarSimplify. Fixes PR25578. Test case by Michael Zolotukhin. Reviewers: hfinkel, atrick, mzolotukhin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15059 llvm-svn: 254977	2015-12-08 00:13:21 +00:00
Sanjoy Das	b771eb6d69	[SCEVExpander] Have hoistIVInc preserve LCSSA Summary: (Note: the problematic invocation of hoistIVInc that caused PR24804 came from IndVarSimplify, not from SCEVExpander itself) Fixes PR24804. Test case by David Majnemer. Reviewers: hfinkel, majnemer, atrick, mzolotukhin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15058 llvm-svn: 254976	2015-12-08 00:13:17 +00:00
Sanjoy Das	9fe86d90ab	[InstCombine] Call getCmpPredicateForMinMax only with a valid SPF Summary: There are `SelectPatternFlavor`s that don't represent min or max idioms, and we should not be passing those to `getCmpPredicateForMinMax`. Fixes PR25745. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15249 llvm-svn: 254869	2015-12-05 23:44:22 +00:00
Weiming Zhao	8213072a45	[SimplifyLibCalls] Optimization for pow(x, n) where n is some constant Summary: In order to avoid calling pow function we generate repeated fmul when n is a positive or negative whole number. For each exponent we pre-compute Addition Chains in order to minimize the no. of fmuls. Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html We pre-compute addition chains for exponents upto 32 (which results in a max of 7 fmuls). For eg: 4 = 2+2 5 = 2+3 6 = 3+3 and so on Hence, pow(x, 4.0) ==> y = fmul x, x x = fmul y, y ret x For negative exponents, we simply compute the reciprocal of the final result. Note: This transformation is only enabled under fast-math. Patch by Mandeep Singh Grang <mgrang@codeaurora.org> Reviewers: weimingz, majnemer, escha, davide, scanon, joerg Subscribers: probinson, escha, llvm-commits Differential Revision: http://reviews.llvm.org/D13994 llvm-svn: 254776	2015-12-04 22:00:47 +00:00
David Majnemer	f6665f65b7	[Analysis] Become aware of MSVC's new/delete functions The compiler can take advantage of the allocation/deallocation function's properties. We knew how to do this for Itanium but had no support for MSVC-style functions. llvm-svn: 254656	2015-12-03 22:45:19 +00:00
Andrew Kaylor	92b3b16ba3	Move branch folding test to a better location. llvm-svn: 254640	2015-12-03 19:41:25 +00:00
Andrew Kaylor	412eabdeb2	Fix buildbot failures llvm-svn: 254636	2015-12-03 19:30:38 +00:00
Andrew Kaylor	9efb2332e2	[WinEH] Avoid infinite loop in BranchFolding for multiple single block funclets Differential Revision: http://reviews.llvm.org/D14996 llvm-svn: 254629	2015-12-03 18:55:28 +00:00
Rafael Espindola	8c04472edf	Delete what is now duplicated code. Having to import an alias as declaration is not thinlto specific. The test difference are because when we already have a decl and we are not importing it, we just leave the decl alone. llvm-svn: 254556	2015-12-02 22:22:24 +00:00
David Majnemer	942003acc6	Do (A == C1 \|\| A == C2) -> (A & ~(C1 ^ C2)) == C1 rather than (A == C1 \|\| A == C2) -> (A \| (C1 ^ C2)) == C2 when C1 ^ C2 is a power of 2. Differential Revision: http://reviews.llvm.org/D14223 Patch by Amaury SECHET! llvm-svn: 254518	2015-12-02 16:15:07 +00:00
Evgeniy Stepanov	42f3b12274	[safestack] Protect byval function arguments. Detect unsafe byval function arguments and move them to the unsafe stack. llvm-svn: 254353	2015-12-01 00:40:05 +00:00
Evgeniy Stepanov	a4ac3f4bdf	[safestack] Fix handling of array allocas. The current code does not take alloca array size into account and, as a result, considers any access past the first array element to be unsafe. llvm-svn: 254350	2015-12-01 00:06:13 +00:00
Sanjay Patel	8b1fb3daba	[InstCombine] add tests to show potential vector IR shuffle transforms llvm-svn: 254342	2015-11-30 22:39:36 +00:00
Davide Italiano	9c26161b2e	[SimplifyLibCalls] Remove useless bits of this tests. llvm-svn: 254318	2015-11-30 19:38:35 +00:00
Davide Italiano	1aeed6a955	[SimplifyLibCalls] Transform log(exp2(y)) to y*log(2) under fast-math. llvm-svn: 254317	2015-11-30 19:36:35 +00:00
Davide Italiano	0b14f29285	[SimplifyLibCalls] Don't crash if the function doesn't have a name. llvm-svn: 254265	2015-11-29 21:58:56 +00:00
Davide Italiano	b8b7133c94	[SimplifyLibCalls] Tranform log(pow(x, y)) -> ylog(x). This one is enabled only under -ffast-math. There are cases where the difference between the value computed and the correct value is huge even for ffast-math, e.g. as Steven pointed out: x = -1, y = -4 log(pow(-1), 4) = 0 4log(-1) = NaN I checked what GCC does and apparently they do the same optimization (which result in the dramatic difference). Future work might try to make this (slightly) less worse. Differential Revision: http://reviews.llvm.org/D14400 llvm-svn: 254263	2015-11-29 20:58:04 +00:00
Diego Novillo	84f06cc835	SamplePGO - Add initial support for inliner annotations. This adds two thresholds to the sample profiler to affect inlining decisions: the concept of global hotness and coldness. Functions that have accumulated more than a certain fraction of samples at runtime, are annotated with the InlineHint attribute. Conversely, functions that accumulate less than a certain fraction of samples, are annotated with the Cold attribute. This is very similar to the hints emitted by Clang when using instrumentation profiles. Notice that this is a very blunt instrument. A function may have globally collected a significant fraction of samples, but that does not necessarily mean that every callsite for that function is hot. Ideally, we would annotate each callsite with the samples collected at that callsite. This way, the inliner can incorporate all these weights into its cost model. Once the inliner offers this functionality, we can change the hints emitted here to a more precise per-callsite annotation. For now, this is providing some measure of speedups with our internal benchmarks. I've observed speedups of up to 23% (though the geo mean is about 3%). I expect these numbers to improve as the inliner gets better annotations. llvm-svn: 254212	2015-11-27 23:14:51 +00:00
Charlie Turner	54336a5a4e	[LoopVectorize] Use MapVector rather than DenseMap for MinBWs. The order in which instructions are truncated in truncateToMinimalBitwidths effects code generation. Switch to a map with a determinisic order, since the iteration order over a DenseMap is not defined. This code is not hot, so the difference in container performance isn't interesting. Many thanks to David Blaikie for making me aware of MapVector! Fixes PR25490. Differential Revision: http://reviews.llvm.org/D14981 llvm-svn: 254179	2015-11-26 20:39:51 +00:00
Rafael Espindola	8934577171	Disallow aliases to available_externally. They are as much trouble as aliases to declarations. They are requiring the code generator to define a symbol with the same value as another symbol, but the second symbol is undefined. If representing this is important for some optimization, we could add support for available_externally aliases. They would be required to point to a declaration (or available_externally definition). llvm-svn: 254170	2015-11-26 19:22:59 +00:00
Benjamin Kramer	fb419e71f4	[SimplifyLibCalls] Don't depend on a called function having a name, it might be an indirect call. Fixes the crasher in PR25651 and related crashers using the same pattern. llvm-svn: 254145	2015-11-26 09:51:17 +00:00
Evgeniy Stepanov	9842d61ca4	[safestack] Fix alignment of dynamic allocas. Fixes PR25588. llvm-svn: 254109	2015-11-25 22:52:30 +00:00
Sanjoy Das	7629346193	[InstCombine] Don't drop operand bundles Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14857 llvm-svn: 254046	2015-11-25 00:42:19 +00:00
Rong Xu	55fa418a90	Revert r254021 llvm-svn: 254042	2015-11-24 23:57:51 +00:00
Rong Xu	25c106b347	[PGO] Revert revision r254021,r254028,r254035 Revert the above revision due to multiple issues. llvm-svn: 254040	2015-11-24 23:49:08 +00:00
Teresa Johnson	3930361969	[ThinLTO] Add option to limit importing based on instruction count Add a simple initial heuristic to control importing based on the number of instructions recorded in the function's summary. Add option to control the limit, and test using option. llvm-svn: 254036	2015-11-24 22:55:46 +00:00
Rong Xu	88cb57aba9	[PGO] Relax test cases in PGO instrumentation Fix buildbot failure for clang-x86_64-linux-selfhost-modules. http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/8866 The failing test cases are newly added from r254021. It seems the IR has a different order in this platform. In this patch, I temporarily relax the test case to make the build green. I'll have a complete fix (more robust way to test) soon. llvm-svn: 254035	2015-11-24 22:50:34 +00:00
Diego Novillo	0b6985a3c6	SamplePGO - Add test for hot/cold inlined functions. When the original binary is executed and sampled, the resulting profile contains information on the original inline stack. We currently follow the original inline plan if we notice that the inlined callsite has more than 0 samples to it. A better way is to determine whether the callsite is actually worth inlining. If the callsite accumulates a small fraction of the samples spent in the parent function, then we don't want to bother inlining it (as it means that the callsite is actually cold). This patch introduces a threshold expressed in percentage of samples in relation to the parent function. If the callsite uses less than N% of the total samples used by its parent, the original inline decision is not re-applied. I've set the threshold to the very arbitrary value of 5%. I'm yet to do any actual experiments to see what's a good value. I wanted to separate the basic mechanism from the tuning. llvm-svn: 254034	2015-11-24 22:38:37 +00:00

... 2 3 4 5 6 ...

6259 Commits