llvm-project

Commit Graph

Author	SHA1	Message	Date
Duncan P. N. Exon Smith	b84840c04e	Utils: Thread distinct-ness through the cloneMD*() functions, NFC The new logic isn't actually reachable yet, so no functionality change. llvm-svn: 225918	2015-01-14 01:24:38 +00:00
Duncan P. N. Exon Smith	7c69c1ebda	Utils: Extract cloneMDNode(), NFC llvm-svn: 225917	2015-01-14 01:22:47 +00:00
Duncan P. N. Exon Smith	b6515d6a71	Utils: Move cloneMD*() up, NFC llvm-svn: 225915	2015-01-14 01:21:24 +00:00
Duncan P. N. Exon Smith	47d82981d6	Utils: Add mapping for uniqued MDLocations Still doesn't handle distinct ones. Part of PR21433. llvm-svn: 225914	2015-01-14 01:20:27 +00:00
Duncan P. N. Exon Smith	4766e01250	Utils: Extract cloneMDTuple(), NFC llvm-svn: 225912	2015-01-14 01:12:14 +00:00
Duncan P. N. Exon Smith	fb9d128ab1	Utils: Extract shouldRemapUniquedNode(), NFC llvm-svn: 225911	2015-01-14 01:08:47 +00:00
Duncan P. N. Exon Smith	637e765907	Utils: Simplify code, NFC llvm-svn: 225906	2015-01-14 01:07:03 +00:00
Duncan P. N. Exon Smith	b557989a40	Utils: Extract mapUniquedNode(), NFC llvm-svn: 225905	2015-01-14 01:06:21 +00:00
Duncan P. N. Exon Smith	8725ca8c60	Utils: MDNode => UniquableMDNode, NFC Although this makes the `cast<>` assert more often, the `assert(Node->isResolved())` on the following line would assert in all those cases. So, no functionality change here. llvm-svn: 225903	2015-01-14 01:05:17 +00:00
Duncan P. N. Exon Smith	14cc94c1c6	Utils: Separate out mapDistinctNode(), NFC llvm-svn: 225902	2015-01-14 01:03:05 +00:00
Duncan P. N. Exon Smith	3956a85e6e	Utils: Use helper function directly, NFC llvm-svn: 225901	2015-01-14 01:02:17 +00:00
Duncan P. N. Exon Smith	077affdbb9	Utils: Extract helper function, NFC llvm-svn: 225897	2015-01-14 01:01:19 +00:00
Duncan P. N. Exon Smith	34651ee2f6	Utils: Use MDTuple::get() directly, NFC Working towards supporting `MDLocation` in `MapMetadata()`. llvm-svn: 225896	2015-01-14 00:59:57 +00:00
Ahmed Bougacha	71d7b18e3d	[SimplifyLibCalls] Don't try to simplify indirect calls. It turns out, all callsites of the simplifier are guarded by a check for CallInst::getCalledFunction (i.e., to make sure the callee is direct). This check wasn't done when trying to further optimize a simplified fortified libcall, introduced by a refactoring in r225640. Fix that, add a testcase, and document the requirement. llvm-svn: 225895	2015-01-14 00:55:05 +00:00
Julien Lerouge	0473cb5ab7	Fix non-determinism issue in SLP The issue was introduced in r214638: + for (auto &BSIter : BlocksSchedules) { + scheduleBlock(BSIter.second.get()); + } Because BlocksSchedules is a DenseMap with BasicBlock* keys, blocks are scheduled in non-deterministic order, resulting in unpredictable IR. Patch by Daniel Reynaud! llvm-svn: 225821	2015-01-13 19:45:52 +00:00
Erik Eckstein	a168ef753f	Revert "SLPVectorizer: Cache results from memory alias checking." The alias cache has a problem of incorrect collisions in case a new instruction is allocated at the same address as a previously deleted instruction. llvm-svn: 225790	2015-01-13 14:36:46 +00:00
Erik Eckstein	4a445c047f	SLPVectorizer: Cache results from memory alias checking. This speeds up the dependency calculations for blocks with many load/store/call instructions. Beside the improved runtime, there is no functional change. llvm-svn: 225786	2015-01-13 11:37:51 +00:00
Ramkumar Ramachandra	181233b2b7	fix {typo, build failure} in r225760 llvm-svn: 225762	2015-01-13 04:17:47 +00:00
Ramkumar Ramachandra	40c3e03e27	Standardize {pred,succ,use,user}_empty() The functions {pred,succ,use,user}_{begin,end} exist, but many users have to check _begin() with _end() by hand to determine if the BasicBlock or User is empty. Fix this with a standard *_empty(), demonstrating a few usecases. llvm-svn: 225760	2015-01-13 03:46:47 +00:00
Sanjay Patel	db8e6f472e	fix typo; NFC llvm-svn: 225753	2015-01-13 01:51:52 +00:00
Sanjay Patel	06d5589a84	80-cols; NFC llvm-svn: 225700	2015-01-12 21:21:28 +00:00
Duncan P. N. Exon Smith	118632dbf6	IR: Split GenericMDNode into MDTuple and UniquableMDNode Split `GenericMDNode` into two classes (with more descriptive names). - `UniquableMDNode` will be a common subclass for `MDNode`s that are sometimes uniqued like constants, and sometimes 'distinct'. This class gets the (short-lived) RAUW support and related API. - `MDTuple` is the basic tuple that has always been returned by `MDNode::get()`. This is as opposed to more specific nodes to be added soon, which have additional fields, custom assembly syntax, and extra semantics. This class gets the hash-related logic, since other sublcasses of `UniquableMDNode` may need to hash based on other fields. To keep this diff from getting too big, I've added casts to `MDTuple` that won't really scale as new subclasses of `UniquableMDNode` are added, but I'll clean those up incrementally. (No functionality change intended.) llvm-svn: 225682	2015-01-12 20:09:34 +00:00
Sanjay Patel	5f1d9eaad3	GVN: propagate equalities for floating point compares Allow optimizations based on FP comparison values in the same way as integers. This resolves PR17713: http://llvm.org/bugs/show_bug.cgi?id=17713 Differential Revision: http://reviews.llvm.org/D6911 llvm-svn: 225660	2015-01-12 19:29:48 +00:00
Timur Iskhodzhanov	00ede84084	[ASan] Move the shadow on Windows 32-bit from 0x20000000 to 0x40000000 llvm-svn: 225641	2015-01-12 17:38:58 +00:00
Ahmed Bougacha	e03bef7543	[SimplifyLibCalls] Factor out fortified libcall handling. This lets us remove CGP duplicate. Differential Revision: http://reviews.llvm.org/D6541 llvm-svn: 225640	2015-01-12 17:22:43 +00:00
Ahmed Bougacha	6722f5e5b3	[SimplifyLibCalls] Factor out str/mem libcall optimizations. Put them in a separate function, so we can reuse them to further simplify fortified libcalls as well. Differential Revision: http://reviews.llvm.org/D6540 llvm-svn: 225639	2015-01-12 17:20:06 +00:00
Ahmed Bougacha	b7d8afb6c5	[SimplifyLibCalls] Factor out signature checks for fortifiable libcalls. The checks are the same for fortified counterparts to the libcalls, so we might as well do them in a single place. Differential Revision: http://reviews.llvm.org/D6539 llvm-svn: 225638	2015-01-12 17:18:19 +00:00
Hal Finkel	38dd590861	[LoopUnroll] Fix the partial unrolling threshold for small loop sizes When we compute the size of a loop, we include the branch on the backedge and the comparison feeding the conditional branch. Under normal circumstances, these don't get replicated with the rest of the loop body when we unroll. This led to the somewhat surprising behavior that really small loops would not get unrolled enough -- they could be unrolled more and the resulting loop would be below the threshold, because we were assuming they'd take (LoopSize * UnrollingFactor) instructions after unrolling, instead of (((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation. llvm-svn: 225565	2015-01-10 00:30:55 +00:00
Michael Zolotukhin	d9ade185b9	Update comment. llvm-svn: 225553	2015-01-09 22:15:06 +00:00
Hans Wennborg	dcc6e5bc03	SimplifyCFG: check uses of constant-foldable instrs in switch destinations (PR20210) The previous code assumed that such instructions could not have any uses outside CaseDest, with the motivation that the instruction could not dominate CommonDest because CommonDest has phi nodes in it. That simply isn't true; e.g., CommonDest could have an edge back to itself. llvm-svn: 225552	2015-01-09 22:13:31 +00:00
Michael Zolotukhin	1c38bc12de	Remove duplicating code. NFC. The removed condition is checked in the previous loop. llvm-svn: 225542	2015-01-09 20:36:19 +00:00
Tim Northover	eb16112e97	Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" It's not really expected to stick around, last time it provoked a weird LTO build failure that I can't reproduce now, and the bot logs are long gone. I'll re-revert it if the failures recur. Original description: Perform Scalar PRE on gep indices that feed loads before doing Load PRE. llvm-svn: 225536	2015-01-09 19:19:56 +00:00
Suyog Sarda	85d0473650	Assumption that "VectorizedValue" will always be an Instruction is not correct. It can be a constant or a vector argument. ex : define i32 @hadd(<4 x i32> %a) #0 { entry: %vecext = extractelement <4 x i32> %a, i32 0 %vecext1 = extractelement <4 x i32> %a, i32 1 %add = add i32 %vecext, %vecext1 %vecext2 = extractelement <4 x i32> %a, i32 2 %add3 = add i32 %add, %vecext2 %vecext4 = extractelement <4 x i32> %a, i32 3 %add5 = add i32 %add3, %vecext4 ret i32 %add5 } llvm-svn: 225517	2015-01-09 10:23:48 +00:00
Philip Reames	567feb98f0	[Refactor] Have getNonLocalPointerDependency take the query instruction Previously, MemoryDependenceAnalysis::getNonLocalPointerDependency was taking a list of properties about the instruction being queried. Since I'm about to need one more property to be passed down through the infrastructure - I need to know a query instruction is non-volatile in an inner helper - fix the interface once and for all. I also added some assertions and behaviour clarifications around volatile and ordered field accesses. At the moment, this is mostly to document expected behaviour. The only non-standard instructions which can currently reach this are atomic, but unordered, loads and stores. Neither ordered or volatile accesses can reach here. The call in GVN is protected by an isSimple check when it first considers the load. The calls in MemDepPrinter are protected by isUnordered checks. Both utilities also check isVolatile for loads and stores. llvm-svn: 225481	2015-01-09 00:04:22 +00:00
Duncan P. N. Exon Smith	953e1a48f0	Utils: Keep distinct MDNodes distinct in MapMetadata() Create new copies of distinct `MDNode`s instead of following the uniquing `MDNode` logic. Just like self-references (or other cycles), `MapMetadata()` creates a new node. In practice most calls use `RF_NoModuleLevelChanges`, in which case nothing is duplicated anyway. Part of PR22111. llvm-svn: 225476	2015-01-08 22:42:30 +00:00
Matt Arsenault	b935d9df4c	Fix fcmp + fabs instcombines when using the intrinsic This was only handling the libcall. This is another example of why only the intrinsic should ever be used when it exists. llvm-svn: 225465	2015-01-08 20:09:34 +00:00
Adrian Prantl	2561bb8831	Revert "Reapply: Teach SROA how to update debug info for fragmented variables." This reverts commit r225379 while investigating an assertion failure reported by Alexey. llvm-svn: 225424	2015-01-08 02:02:00 +00:00
Adrian Prantl	72b8ee708f	Reapply: Teach SROA how to update debug info for fragmented variables. The two buildbot failures were addressed in LLVM r225378 and CFE r225359. This rapplies commit 225272 without modifications. llvm-svn: 225379	2015-01-07 20:52:22 +00:00
David Majnemer	5310c1e954	Analysis: Reformulate WillNotOverflowUnsignedAdd for reusability WillNotOverflowUnsignedAdd's smarts will live in ValueTracking as computeOverflowForUnsignedAdd. It now returns a tri-state result: never overflows, always overflows and sometimes overflows. llvm-svn: 225329	2015-01-07 00:39:50 +00:00
David Majnemer	3b83b3fa0b	InstCombine: Just a small tidy-up llvm-svn: 225328	2015-01-07 00:39:42 +00:00
Adrian Prantl	52f943b536	Revert "Reapply: Teach SROA how to update debug info for fragmented variables." because of a tsan buildbot failure. This reverts commit 225272. Fix should be coming soon. llvm-svn: 225288	2015-01-06 19:47:27 +00:00
Sanjoy Das	7c0ce26614	This patch teaches IndVarSimplify to add nuw and nsw to certain kinds of operations that provably don't overflow. For example, we can prove %civ.inc below does not sign-overflow. With this change, IndVarSimplify changes %civ.inc to an add nsw. define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) { entry: %length = load i32* %length_ptr, !range !0 %len.sub.1 = sub i32 %length, 1 %upper = icmp slt i32 %init, %len.sub.1 br i1 %upper, label %loop, label %exit loop: %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ] %civ.inc = add i32 %civ, 1 %cmp = icmp slt i32 %civ.inc, %length br i1 %cmp, label %latch, label %break latch: store i32 0, i32* %array %check = icmp slt i32 %civ.inc, %len.sub.1 br i1 %check, label %loop, label %break break: ret i32 %civ.inc exit: ret i32 42 } Differential Revision: http://reviews.llvm.org/D6748 llvm-svn: 225282	2015-01-06 19:02:56 +00:00
Adrian Prantl	8335a5724a	Reapply: Teach SROA how to update debug info for fragmented variables. This also rolls in the changes discussed in http://reviews.llvm.org/D6766. Defers migrating the debug info for new allocas until after all partitions are created. Thanks to Chandler for reviewing! llvm-svn: 225272	2015-01-06 17:14:10 +00:00
Matt Arsenault	55e7312cd8	Convert fcmp with 0.0 from casted integers to icmp This is already handled in general when it is known the conversion can't lose bits with smaller integer types casted into wider floating point types. This pattern happens somewhat often in GPU programs that cast workitem intrinsics to float, which are often compared with 0. Specifically handle the special case of compares with zero which should also be known to not lose information. I had a more general version of this which allows equality compares if the casted float is exactly representable in the integer, but I'm not 100% confident that is always correct. Also fold cases that aren't integers to true / false. llvm-svn: 225265	2015-01-06 15:50:59 +00:00
David Majnemer	9b6b822814	InstCombine: Bitcast call arguments from/to pointer/integer type Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing arguments and return values when DataLayout says it is safe to do so. llvm-svn: 225254	2015-01-06 08:41:31 +00:00
Saleem Abdulrasool	150a1dc5c2	SymbolRewriter: use iplist::splice The swap implementation for iplist is currently unsupported. Simply splice the old list into place, which achieves the same purpose. This is needed in order to thread the -frewrite-map-file frontend option correctly. NFC. llvm-svn: 225186	2015-01-05 17:56:32 +00:00
Saleem Abdulrasool	d37ce30888	SymbolRewriter: 80-column Wrap a couple of lines. NFC. llvm-svn: 225185	2015-01-05 17:56:29 +00:00
Craig Topper	d3c02f177a	Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert. llvm-svn: 225160	2015-01-05 10:15:49 +00:00
Jiangning Liu	40c1b35292	Fixed a bug in memory dependence checking module of loop vectorization. The following loop should not be vectorized with current algorithm. {code} // loop body ... = a[i] (1) ... = a[i+1] (2) ....... a[i+1] = .... (3) a[i] = ... (4) {code} The algorithm tries to collect memory access candidates from AliasSetTracker, and then check memory dependences one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists. This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed. For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized. The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile-time visibly. llvm-svn: 225159	2015-01-05 10:08:58 +00:00
Chandler Carruth	73b0164fe5	[SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an assert out of the new pre-splitting in SROA. This fix makes the code do what was originally intended -- when we have a store of a load both dealing in the same alloca, we force them to both be pre-split with identical offsets. This is really quite hard to do because we can keep discovering problems as we go along. We have to track every load over the current alloca which for any resaon becomes invalid for pre-splitting, and go back to remove all stores of those loads. I've included a couple of test cases derived from PR22093 that cover the different ways this can happen. While that PR only really triggered the first of these two, its the same fundamental issue. The other challenge here is documented in a FIXME now. We end up being quite a bit more aggressive for pre-splitting when loads and stores don't refer to the same alloca. This aggressiveness comes at the cost of introducing potentially redundant loads. It isn't clear that this is the right balance. It might be considerably better to require that we only do pre-splitting when we can presplit every load and store involved in the entire operation. That would give more consistent if conservative results. Unfortunately, it requires a non-trivial change to the actual pre-splitting operation in order to correctly handle cases where we end up pre-splitting stores out-of-order. And it isn't 100% clear that this is the right direction, although I'm starting to suspect that it is. llvm-svn: 225149	2015-01-05 04:17:53 +00:00
Chandler Carruth	66b3130cda	[PM] Split the AssumptionTracker immutable pass into two separate APIs: a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is two fold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the many utility functions that deal in the assumptions to live in both pass manager worlds by creating an separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boiler plate to the various passes, but this kind of boiler plate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131	2015-01-04 12:03:27 +00:00
David Majnemer	087dc8b831	InstCombine: match can find ConstantExprs, don't assume we have a Value We assumed the output of a match was a Value, this would cause us to assert because we would fail a cast<>. Instead, use a helper in the Operator family to hide the distinction between Value and Constant. This fixes PR22087. llvm-svn: 225127	2015-01-04 07:36:02 +00:00
Kostya Serebryany	d421db05bb	[asan] simplify the tracing code, make it use the same guard variables as coverage llvm-svn: 225103	2015-01-03 00:54:43 +00:00
David Majnemer	c8a576b5c0	InstCombine: Detect when llvm.umul.with.overflow always overflows We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero and LHSKnownOne * RHSKnownOne overflow. llvm-svn: 225077	2015-01-02 07:29:47 +00:00
David Majnemer	491331aca8	Analysis: Reformulate WillNotOverflowUnsignedMul for reusability WillNotOverflowUnsignedMul's smarts will live in ValueTracking as computeOverflowForUnsignedMul. It now returns a tri-state result: never overflows, always overflows and sometimes overflows. llvm-svn: 225076	2015-01-02 07:29:43 +00:00
Chandler Carruth	24ac830d7c	[SROA] Teach SROA to be more aggressive in splitting now that we have a pre-splitting pass over loads and stores. Historically, splitting could cause enough problems that I hamstrung the entire process with a requirement that splittable integer loads and stores must cover the entire alloca. All smaller loads and stores were unsplittable to prevent chaos from ensuing. With the new pre-splitting logic that does load/store pair splitting I introduced in r225061, we can now very nicely handle arbitrarily splittable loads and stores. In order to fully benefit from these smarts, we need to mark all of the integer loads and stores as splittable. However, we don't actually want to rewrite partitions with all integer loads and stores marked as splittable. This will fail to extract scalar integers from aggregates, which is kind of the point of SROA. =] In order to resolve this, what we really want to do is only do pre-splitting on the alloca slices with integer loads and stores fully splittable. This allows us to uncover all non-integer uses of the alloca that would benefit from a split in an integer load or store (and where introducing the split is safe because it is just memory transfer from a load to a store). Once done, we make all the non-whole-alloca integer loads and stores unsplittable just as they have historically been, repartition and rewrite. The result is that when there are integer loads and stores anywhere within an alloca (such as from a memcpy of a sub-object of a larger object), we can split them up if there are non-integer components to the aggregate hiding beneath. I've added the challenging test cases to demonstrate how this is able to promote to scalars even a case where we have even partially overlapping loads and stores. This restores the single-store behavior for small arrays of i8s which is really nice. I've restored both the little endian testing and big endian testing for these exactly as they were prior to r225061. It also forced me to be more aggressive in an alignment test to actually defeat SROA. =] Without the added volatiles there, we actually split up the weird i16 loads and produce nice double allocas with better alignment. This also uncovered a number of bugs where we failed to handle splittable load and store slices which didn't have a begininng offset of zero. Those fixes are included, and without them the existing test cases explode in glorious fireworks. =] I've kept support for leaving whole-alloca integer loads and stores as splittable even for the purpose of rewriting, but I think that's likely no longer needed. With the new pre-splitting, we might be able to remove all the splitting support for loads and stores from the rewriter. Not doing that in this patch to try to isolate any performance regressions that causes in an easy to find and revert chunk. llvm-svn: 225074	2015-01-02 03:55:54 +00:00
Chandler Carruth	5986b541d4	[SROA] Make the computation of adjusted pointers not leak GEP instructions. I noticed this when working on dialing up how aggressively we can pre-split loads and stores. My test case wasn't passing because dead GEPs into the allocas persisted when they were built by this routine. This isn't terribly harmful, we still rewrote and promoted the alloca and I can't conceive of how to cause this to happen in a case where we will keep the exact same alloca but rewrite and promote the uses of it. If that ever happened, we'd get an assert out of mem2reg. So I don't have a direct test case yet, but the subsequent commit's test case wouldn't pass without this. There are other problems fixed by this patch that I spotted purely by inspection such as the fact that getAdjustedPtr could have actually deleted dead base pointers. I don't know how to get a base pointer to go into getAdjustedPtr today, so I think this bug could never have manifested (and I certainly can't write a test case for it) but, it wasn't the intent of the code. The code really just wanted to GC the new instructions built. That can be done more directly by comparing with the base pointer which is the only non-new instruction that this code can return. llvm-svn: 225073	2015-01-02 02:47:38 +00:00
Chandler Carruth	29c22fae46	[SROA] Fix the loop exit placement to be prior to indexing the splits array. This prevents it from walking out of bounds on the splits array. Bug found with the existing tests by ASan and by the MSVC debug build. llvm-svn: 225069	2015-01-02 00:10:22 +00:00
Chandler Carruth	c39eaa5041	[SROA] Fix two total think-os in r225061 that should have been caught on a +asserts bootstrap, but my bootstrap had asserts off. Oops. Anyways, in some places it is reasonable to cast (as a sanity check) the pointer operand to a load or store to an instruction within SROA -- namely when the pointer operand is expected to be derived from an alloca, and thus always an instruction. However, the pre-splitting code also deals with loads and stores to non-alloca pointers and there we need to just use the Value*. Nothing about the code relied on the instruction cast, it was only there essentially as an invariant assertion. Remove the two that don't actually hold. This should fix the proximate issue in PR22080, but I'm also doing an asserts bootstrap myself to see if there are other issues lurking. I'll craft a reduced test case in a moment, but I wanted to get the tree healthy as quickly as possible. llvm-svn: 225068	2015-01-01 23:26:16 +00:00
Chandler Carruth	6044c0bc78	[SROA] Switch to using a more direct debug logging technique in one part of my new load and store splitting, and fix a bug where it logged a totally irrelevant slice rather than the actual slice in question. The logging here previously worked because we used to place new slices onto the back of the core sequence, but that caused other problems. I updated the actual code to store new slices in their own vector but didn't update the logging. There isn't a good way to reuse the logging any more, and frankly it wasn't needed. We can directly log this bit more easily. llvm-svn: 225063	2015-01-01 12:56:47 +00:00
Chandler Carruth	994cde8869	[SROA] Fix formatting with clang-format which I managed to fail to do prior to committing r225061. Sorry for that. llvm-svn: 225062	2015-01-01 12:01:03 +00:00
Chandler Carruth	0715cba02d	[SROA] Teach SROA how to much more intelligently handle split loads and stores. When there are accesses to an entire alloca with an integer load or store as well as accesses to small pieces of the alloca, SROA splits up the large integer accesses. In order to do that, it uses bit math to merge the small accesses into large integers. While this is effective, it produces insane IR that can cause significant problems in the rest of the optimizer: - It can cause load and store mismatches with GVN on the non-alloca side where we end up loading an i64 (or some such) rather than loading specific elements that are stored. - We can't always get rid of the integer bit math, which is why we can't always fix the loads and stores to work well with GVN. - This is especially bad when we have operations that mix poorly with integer bit math such as floating point operations. - It will block things like the vectorizer which might be able to handle the scalar stores that underly the aggregate. At the same time, we can't just directly split up these loads and stores in all cases. If there is actual integer arithmetic involved on the values, then using integer bit math is actually the perfect lowering because we can often combine it heavily with the surrounding math. The solution this patch provides is to find places where SROA is partitioning aggregates into small elements, and look for splittable loads and stores that it can split all the way to some other adjacent load and store. These are uniformly the cases where failing to split the loads and stores hurts the optimizer that I have seen, and I've looked extensively at the code produced both from more and less aggressive approaches to this problem. However, it is quite tricky to actually do this in SROA. We may have loads and stores to the same alloca, or other complex patterns that are hard to handle. This complexity leads to the somewhat subtle algorithm implemented here. We have to do this entire process as a separate pass over the partitioning of the alloca, and split up all of the loads prior to splitting the stores so that we can handle safely the cases of overlapping, including partially overlapping, loads and stores to the same alloca. We also have to reconstitute the post-split slice configuration so we can avoid iterating again over all the alloca uses (the slow part of SROA). But we also have to ensure that when we split up loads and stores to other allocas, we do re-iterate over them in SROA to adapt to the more refined partitioning now required. With this, I actually think we can fix a long-standing TODO in SROA where I avoided splitting as many loads and stores as probably should be splittable. This limitation historically mitigated the fallout of all the bad things mentioned above. Now that we have more intelligent handling, I plan to remove the FIXME and more aggressively mark integer loads and stores as splittable. I'll do that in a follow-up patch to help with bisecting any fallout. The net result of this change should be more fine-grained and accurate scalars being formed out of aggregates. At the very least, Clang now generates perfect code for this high-level test case using std::complex<float>: #include <complex> void g1(std::complex<float> &x, float a, float b) { x += std::complex<float>(a, b); } void g2(std::complex<float> &x, float a, float b) { x -= std::complex<float>(a, b); } void foo(const std::complex<float> &x, float a, float b, std::complex<float> &x1, std::complex<float> &x2) { std::complex<float> l1 = x; g1(l1, a, b); std::complex<float> l2 = x; g2(l2, a, b); x1 = l1; x2 = l2; } This code isn't just hypothetical either. It was reduced out of the hot inner loops of essentially every part of the Eigen math library when using std::complex<float>. Those loops would consistently and pervasively hop between the floating point unit and the integer unit due to bit math extraction and insertion of floating point values that were "stored" in a 64-bit integer register around the loop backedge. So far, this change has passed a bootstrap and I have done some other testing and so far, no issues. That doesn't mean there won't be though, so I'll be prepared to help with any fallout. If you performance swings in particular, please let me know. I'm very curious what all the impact of this change will be. Stay tuned for the follow-up to also split more integer loads and stores. llvm-svn: 225061	2015-01-01 11:54:38 +00:00
Sanjay Patel	e68f71574f	InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X Some day the backend may handle instruction-level fast math flags and make this transform unnecessary, but it's still better practice to use the canonical representation of fneg when possible (use a -0.0). This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ). See also http://reviews.llvm.org/D6723. Differential Revision: http://reviews.llvm.org/D6731 llvm-svn: 225050	2014-12-31 22:14:05 +00:00
David Majnemer	f89dc3edc9	InstCombine: try to transform A-B < 0 into A < B We are allowed to move the 'B' to the right hand side if we an prove there is no signed overflow and if the comparison itself is signed. llvm-svn: 225034	2014-12-31 04:21:41 +00:00
Kostya Serebryany	aa185bfc4b	[asan] change _sanitizer_cov_module_init to accept int* instead of int** llvm-svn: 224999	2014-12-30 19:29:28 +00:00
Elena Demikhovsky	84d1997b95	Some code improvements in Masked Load/Store. No functional changes. llvm-svn: 224986	2014-12-30 14:28:14 +00:00
Philip Reames	9db26ffc9a	Carry facts about nullness and undef across GC relocation This change implements four basic optimizations: If a relocated value isn't used, it doesn't need to be relocated. If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.) If the value being relocated is undef, the relocation is meaningless. If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.) I outlined other planned work in comments. Differential Revision: http://reviews.llvm.org/D6600 llvm-svn: 224968	2014-12-29 23:27:30 +00:00
Philip Reames	b35f46ce06	Refine the notion of MayThrow in LICM to include a header specific version In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up. This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loops header block - will not be lifted by subsequent LICM runs. define void @nothrow_header(i64 %x, i64 %y, i1 %cond) { ; CHECK-LABEL: nothrow_header ; CHECK-LABEL: entry ; CHECK: %div = udiv i64 %x, %y ; CHECK-LABEL: loop ; CHECK: call void @use(i64 %div) entry: br label %loop loop: ; preds = %entry, %for.inc %div = udiv i64 %x, %y br i1 %cond, label %loop-if, label %exit loop-if: call void @use(i64 %div) br label %loop exit: ret void } The current patch really only helps with non-memory instructions (i.e. divs, etc..) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load. The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invarant.load metadata of tbaa 'is constant memory' metadata. Differential Revision: http://reviews.llvm.org/D6725 llvm-svn: 224965	2014-12-29 23:00:57 +00:00
Philip Reames	5ad26c353c	Loading from null is valid outside of addrspace 0 This patches fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen. This transform is perfectly legal in address space 0, but is not neccessarily legal in other address spaces. We really should introduce a hook to control this property on a per target per address space basis. We may be loosing valuable optimizations in some address spaces by being too conservative. Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me. llvm-svn: 224961	2014-12-29 22:46:21 +00:00
David Majnemer	b1296ec0fd	InstCombine: Infer nuw for multiplies A multiply cannot unsigned wrap if there are bitwidth, or more, leading zero bits between the two operands. llvm-svn: 224849	2014-12-26 09:50:35 +00:00
David Majnemer	54c2ca2539	InstCombe: Infer nsw for multiplies We already utilize this logic for reducing overflow intrinsics, it makes sense to reuse it for normal multiplies as well. llvm-svn: 224847	2014-12-26 09:10:14 +00:00
Elena Demikhovsky	fb81b93e17	Masked Load/Store - Changed the order of parameters in intrinsics. No functional changes. The documentation is coming. llvm-svn: 224829	2014-12-25 07:49:20 +00:00
Chandler Carruth	ffb7ce56a6	[SROA] Update the documentation and names for accessing the slices within a partition of an alloca in SROA. This reflects the fact that the organization of the slices isn't really ideal for analysis, but is the naive way in which the slices are available while we're processing them in the core partitioning algorithm. It is possible we could improve matters, and I've left a FIXME with one of my ideas for how to do this, but it is a lot of work, the benefit is somewhat minor, and it isn't clear that it would be strictly better. =/ Not really satisfying, but I'm out of really good ideas. This also improves one place where the debug logging failed to mark some split partitions. Now we log in one place, slightly later, and with accurate information about whether the slice is split by the partition being rewritten. llvm-svn: 224800	2014-12-24 01:48:09 +00:00
Chandler Carruth	5031bbe86a	[SROA] Refactor the integer and vector promotion testing logic to operate in terms of the new Partition class, and generally have a more clear set of arguments. No functionality changed. The most notable improvements here are consistently using the terminology of 'partition' for a collection of slices that will be rewritten together and 'slice' for a region of an alloca that is used by a particular instruction. This also makes it more clear that the split things are actually slices as well, just ones that will be split by the proposed partition. This doesn't yet address the confusing aspects of the partition's interface where slices that will be split by the partition and start prior to the partition are accesssed via Partition::splitSlices() while the core range of slices exposed by a Partition includes both unsplit slices and slices which will be split by the end, but started within the offset range of the partition. This is particularly hard to address because the algorithm which computes partitions quite literally doesn't know which slices these will end up being until too late. I'm looking at whether I can fix that or not, but I'm not optimistic. I'll update the comments and/or names to further explain this either way. I've also added one FIXME in this patch relating to this confusion so that I don't forget about it. llvm-svn: 224798	2014-12-24 01:05:14 +00:00
Kostya Serebryany	9fdeb37bd3	[asan] change the coverage collection scheme so that we can easily emit coverage for the entire process as a single bit set, and if coverage_bitset=1 actually emit that bitset llvm-svn: 224789	2014-12-23 22:32:17 +00:00
Michael Liao	5313da3263	[SimplifyCFG] Revise common code sinking - Fix the case where more than 1 common instructions derived from the same operand cannot be sunk. When a pair of value has more than 1 derived values in both branches, only 1 derived value could be sunk. - Replace BB1 -> (BB2, PN) map with joint value map, i.e. map of (BB1, BB2) -> PN, which is more accurate to track common ops. llvm-svn: 224757	2014-12-23 08:26:55 +00:00
Michael Kuperstein	0bf33ffde4	Remove a bad cast in CloneModule() A cast that was introduced in r209007 was accidentally left in after the changes made to GlobalAlias rules in r210062. This crashes if the aliasee is a now-leggal ConstantExpr. llvm-svn: 224756	2014-12-23 08:23:45 +00:00
Chandler Carruth	c7d1e24b34	Revert r224739: Debug info: Teach SROA how to update debug info for fragmented variables. This caused codegen to start crashing when we built somewhat large programs with debug info and optimizations. 'check-msan' hit in, and I suspect a bootstrap would as well. I mailed a test case to the review thread. llvm-svn: 224750	2014-12-23 02:58:14 +00:00
David Blaikie	ea37c1173e	Remove dynamic allocation/indirection from GCOVBlocks owned by GCOVFunction Since these are all created in the DenseMap before they are referenced, there's no problem with pointer validity by the time it's required. This removes another use of DeleteContainerSeconds/manual memory management which I'm cleaning up from time to time. llvm-svn: 224744	2014-12-22 23:12:42 +00:00
Chandler Carruth	e2f66ceed9	[SROA] Lift the logic for traversing the alloca slices one partition at a time into a partition iterator and a Partition class. There is a lot of knock-on simplification that this enables, largely stemming from having a Partition object to refer to in lots of helpers. I've only done a minimal amount of that because enoguh stuff is changing as-is in this commit. This shouldn't change any observable behavior. I've worked hard to preserve the exact traversal semantics which were originally present even though some of them make no sense. I'll be changing some of this in subsequent commits now that the logic is carefully factored into a reusable place. The primary motivation for this change is to break the rewriting into phases in order to support more intelligent rewriting. For example, I'm planning to change how split loads and stores are rewritten to remove the significant overuse of integer bit packing in the resulting code and allow more effective secondary splitting of aggregates. For any of this to work, they have to share the exact traversal logic. llvm-svn: 224742	2014-12-22 22:46:00 +00:00
Bruno Cardoso Lopes	bad65c3b70	[LCSSA] Handle PHI insertion in disjoint loops Take two disjoint Loops L1 and L2. LoopSimplify fails to simplify some loops (e.g. when indirect branches are involved). In such situations, it can happen that an exit for L1 is the header of L2. Thus, when we create PHIs in one of such exits we are also inserting PHIs in L2 header. This could break LCSSA form for L2 because these inserted PHIs can also have uses in L2 exits, which are never handled in the current implementation. Provide a fix for this corner case and test that we don't assert/crash on that. Differential Revision: http://reviews.llvm.org/D6624 rdar://problem/19166231 llvm-svn: 224740	2014-12-22 22:35:46 +00:00
Adrian Prantl	a47ace5901	Debug info: Teach SROA how to update debug info for fragmented variables. This allows us to generate debug info for extremely advanced code such as typedef struct { long int a; int b;} S; int foo(S s) { return s.b; } which at -O1 on x86_64 is codegen'd into define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 { ret i32 %s.coerce1, !dbg !24 } with this patch we emit the following debug info for this TAG_formal_parameter [3] AT_location( 0x00000000 0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004 0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 ) AT_name( "s" ) AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" ) Thanks to chandlerc, dblaikie, and echristo for their feedback on all previous iterations of this patch! llvm-svn: 224739	2014-12-22 22:26:00 +00:00
David Majnemer	b0362e4ee6	InstCombine: Squash an icmp+select into bitwise arithmetic (X & INT_MIN) == 0 ? X ^ INT_MIN : X into X \| INT_MIN (X & INT_MIN) != 0 ? X ^ INT_MIN : X into X & INT_MAX This fixes PR21993. llvm-svn: 224676	2014-12-20 04:45:35 +00:00
Chandler Carruth	113dc64c67	[SROA] Run clang-format over the entire SROA pass as I wrote it before much of the glory of clang-format, and now any time I touch it I risk introducing formatting changes as part of a functional commit. Also, clang-format is way better at formatting my code than I am. Most of this is a huge improvement although I reverted a couple of places where I hit a clang-format bug with lambdas that has been filed but not (fully) fixed. llvm-svn: 224666	2014-12-20 02:39:18 +00:00
Tilmann Scheller	be98f3c4ec	[BBVectorize] Remove two more redundant assignments. Found by the Clang static analyzer. llvm-svn: 224590	2014-12-19 17:21:38 +00:00
Tilmann Scheller	945ce0ae00	[BBVectorize] Remove redundant assignment. Found by the Clang static analyzer. llvm-svn: 224589	2014-12-19 17:13:12 +00:00
Bruno Cardoso Lopes	f6cf8ad4e5	Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. Also, fix code to also return the modified switch when only the truncation is performed. This fixes an assertion crash. Differential Revision: http://reviews.llvm.org/D6644 rdar://problem/19191835 llvm-svn: 224588	2014-12-19 17:12:35 +00:00
Tilmann Scheller	b811030b47	[LoopVectorize] Remove redundant assignment. Found by the Clang static analyzer. llvm-svn: 224587	2014-12-19 17:02:31 +00:00
Sanjay Patel	ea3c802887	use -0.0 when creating an fneg instruction Backends recognize (-0.0 - X) as the canonical form for fneg and produce better code. Eg, ppc64 with 0.0: lis r2, ha16(LCPI0_0) lfs f0, lo16(LCPI0_0)(r2) fsubs f1, f0, f1 blr vs. -0.0: fneg f1, f1 blr Differential Revision: http://reviews.llvm.org/D6723 llvm-svn: 224583	2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes	3be15b2fa6	Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr" Reverts commit r224574 to appease buildbots: The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. This fixes an assertion crash. llvm-svn: 224576	2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes	c9005f2f2b	[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. This fixes an assertion crash. Differential Revision: http://reviews.llvm.org/D6644 rdar://problem/19191835 llvm-svn: 224574	2014-12-19 14:23:15 +00:00
Duncan P. N. Exon Smith	46d7af5729	Rename MapValue(Metadata*) to MapMetadata() Instead of reusing the name `MapValue()` when mapping `Metadata`, use `MapMetadata()`. The old name doesn't make much sense after the `Metadata`/`Value` split. llvm-svn: 224566	2014-12-19 06:06:18 +00:00
Sanjay Patel	c242dbb3b6	fix formatting; NFC llvm-svn: 224542	2014-12-18 21:11:09 +00:00
Viktor Kutuzov	b4ffb5d5e9	[Msan] Generalize instrumentation code to support FreeBSD mapping Differential Revision: http://reviews.llvm.org/D6666 llvm-svn: 224514	2014-12-18 12:12:59 +00:00
Chandler Carruth	68ea415d04	[SROA] Cleanup - remove the use of std::mem_fun_ref nonsense and use a lambda now that we have them. llvm-svn: 224500	2014-12-18 05:19:47 +00:00
Kostya Serebryany	fea4fb404e	[sanitizer] allow -fsanitize-coverage=N w/ -fsanitize=leak, llvm part llvm-svn: 224463	2014-12-17 21:50:04 +00:00
Suyog Sarda	43fae93da8	Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads, and vectorizes it." This was re-ordering floating point data types resulting in mismatch in output. llvm-svn: 224424	2014-12-17 10:34:27 +00:00
Erik Eckstein	a451b9b0b5	Strength reduce intrinsics with overflow into regular arithmetic operations if possible. Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced. This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow. It completes the work on PR20194. llvm-svn: 224417	2014-12-17 07:29:19 +00:00
Kostya Serebryany	7376294086	[sanitizer] prevent function call merging for sanitizer-coverage callbacks llvm-svn: 224372	2014-12-16 21:24:15 +00:00
Elena Demikhovsky	f5b72afff4	Masked Load and Store Intrinsics in loop vectorizer. The loop vectorizer optimizes loops containing conditional memory accesses by generating masked load and store intrinsics. This decision is target dependent. http://reviews.llvm.org/D6527 llvm-svn: 224334	2014-12-16 11:50:42 +00:00
Elena Demikhovsky	a5599bfd72	Sink store based on alias analysis - by Ella Bolshinsky The alias analysis is used define whether the given instruction is a barrier for store sinking. For 2 identical stores, following instructions are checked in the both basic blocks, to determine whether they are sinking barriers. http://reviews.llvm.org/D6420 llvm-svn: 224247	2014-12-15 14:09:53 +00:00
Elena Demikhovsky	3fcafa2cdb	Loop Vectorizer minor changes in the code - some comments, function names, identation. Reviewed here: http://reviews.llvm.org/D6527 llvm-svn: 224218	2014-12-14 09:43:50 +00:00
Steven Wu	f179d12e50	More code format fix from r224133, NFC llvm-svn: 224140	2014-12-12 18:48:37 +00:00
Steven Wu	1f7402a14e	Restructure code from r224097. NFC llvm-svn: 224133	2014-12-12 17:21:54 +00:00
Chad Rosier	78943bcc18	[Reassociate] Use dbgs() instead of errs(). llvm-svn: 224125	2014-12-12 14:44:12 +00:00
Suyog Sarda	384095e65c	This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads, and vectorizes it. Test case : float hadd(float* a) { return (a[0] + a[1]) + (a[2] + a[3]); } AArch64 assembly before patch : ldp s0, s1, [x0] ldp s2, s3, [x0, #8] fadd s0, s0, s1 fadd s1, s2, s3 fadd s0, s0, s1 ret AArch64 assembly after patch : ldp d0, d1, [x0] fadd v0.2s, v0.2s, v1.2s faddp s0, v0.2s ret Reviewed Link : http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248531.html llvm-svn: 224119	2014-12-12 12:53:44 +00:00
Steven Wu	881916dea5	Fix another infinite loop in InstCombine Summary: InstCombine infinite-loops for the testcase added It is because InstCombine is generating instructions that can be optimized by itself. Fix by not optimizing frem if the optimized type is the same as original type. rdar://problem/19150820 Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6634 llvm-svn: 224097	2014-12-12 04:34:07 +00:00
Alexey Samsonov	4b7f413e3e	[ASan] Change fake stack and local variables handling. This commit changes the way we get fake stack from ASan runtime (to find use-after-return errors) and the way we represent local variables: - __asan_stack_malloc function now returns pointer to newly allocated fake stack frame, or NULL if frame cannot be allocated. It doesn't take pointer to real stack as an input argument, it is calculated inside the runtime. - __asan_stack_free function doesn't take pointer to real stack as an input argument. Now this function is never called if fake stack frame wasn't allocated. - __asan_init version is bumped to reflect changes in the ABI. - new flag "-asan-stack-dynamic-alloca" allows to store all the function local variables in a dynamic alloca, instead of the static one. It reduces the stack space usage in use-after-return mode (dynamic alloca will not be called if the local variables are stored in a fake stack), and improves the debug info quality for local variables (they will not be described relatively to %rbp/%rsp, which are assumed to be clobbered by function calls). This flag is turned off by default for now, but I plan to turn it on after more testing. llvm-svn: 224062	2014-12-11 21:53:03 +00:00
Andrea Di Biagio	72b05aa59c	[InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi. This patch teaches the instruction combiner how to fold a call to 'insertqi' if the 'length field' (3rd operand) is set to zero, and if the sum between field 'length' and 'bit index' (4th operand) is bigger than 64. From the AMD64 Architecture Programmer's Manual: 1. If the sum of the bit index + length field is greater than 64, then the results are undefined; 2. A value of zero in the field length is defined as a length of 64. This patch improves the existing combining logic for intrinsic 'insertqi' adding extra checks to address both point 1. and point 2. Differential Revision: http://reviews.llvm.org/D6583 llvm-svn: 224054	2014-12-11 20:44:59 +00:00
Michael Kuperstein	fffb6996c9	The inliner needs to fix up debug information for llvm.dbg.declare, not only for llvm.dbg.value. Patch by Amjad Aboud Differential Revision: http://reviews.llvm.org/D6525 llvm-svn: 224015	2014-12-11 12:41:10 +00:00
Erik Eckstein	096ff7dcd6	Refactor creation of overflow result tuples in InstCombineCalls. Extract the creation of overflow result tuples in a separate function. NFC. llvm-svn: 224006	2014-12-11 08:02:30 +00:00
Kaelyn Takata	22324f378a	Rename static functiom "map" to be more descriptive and to avoid potential confusion with the std::map type. llvm-svn: 223853	2014-12-09 23:32:46 +00:00
Michael Zolotukhin	4def395646	Remove redundant variable. Tested by adding assert(LoopVectorPreHeader == VecPreheader) on LLVM test suite and SPECs. llvm-svn: 223847	2014-12-09 22:45:07 +00:00
Chandler Carruth	a7f247ea56	Revert r223764 which taught instcombine about integer-based elment extraction patterns. This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely also on ARM and PPC. I really don't know how, but reverting now that I've confirmed this is actually the culprit. I have a reproduction as well and so should be able to restore this shortly. This reverts commit r223764. Original commit log follows: Teach instcombine to canonicalize "element extraction" from a load of an integer and "element insertion" into a store of an integer into actual element extraction, element insertion, and vector loads and stores. Previously various parts of LLVM (including instcombine itself) would introduce integer loads and stores into the code as a way of opaquely loading and storing "bits". In some cases (such as a memcpy of std::complex<float> object) we will eventually end up using those bits in non-integer types. In order for SROA to effectively promote the allocas involved, it splits these "store a bag of bits" integer loads and stores up into the constituent parts. However, for non-alloca loads and tsores which remain, it uses integer math to recombine the values into a large integer to load or store. All of this would be "fine", except that it forces LLVM to go through integer math to combine and split up values. While this makes perfect sense for integers (and in fact is critical for bitfields to end up lowering efficiently) it is terrible for non-integer types, especially floating point types. We have a much more canonical way of representing the act of concatenating the bits of two SSA values in LLVM: a vector and insertelement. This patch teaching InstCombine to use this representation. With this patch applied, LLVM will no longer introduce integer math into the critical path of every loop over std::complex<float> operations such as those that make up the hot path of ... oh, most HPC code, Eigen, and any other heavy linear algebra library. For the record, I looked extensively at fixing this in other parts of the compiler, but it just doesn't work: - We really do want to canonicalize memcpy and other bit-motion to integer loads and stores. SSA values are tremendously more powerful than "copy" intrinsics. Not doing this regresses massive amounts of LLVM's scalar optimizer. - We really do need to split up integer loads and stores of this form in SROA or every memcpy of a trivially copyable struct will prevent SSA formation of the members of that struct. It essentially turns off SROA. - The closest alternative is to actually split the loads and stores when partitioning with SROA, but this has all of the downsides historically discussed of splitting up loads and stores -- the wide-store information is fundamentally lost. We would also see performance regressions for bitfield-heavy code and other places where the integers aren't really intended to be split without seemingly arbitrary logic to treat integers totally differently. - We can effectively fix this in instcombine, so it isn't that hard of a choice to make IMO. llvm-svn: 223813	2014-12-09 19:21:16 +00:00
Frederic Riss	35f0a9aeba	Remove unneeded curly braces. llvm-svn: 223809	2014-12-09 18:57:39 +00:00
Frederic Riss	ff58fd207e	Reorder the code to avoid inserting at the beginning of a vector. As per dblaikie suggestion, thanks\! llvm-svn: 223808	2014-12-09 18:57:34 +00:00
Duncan P. N. Exon Smith	5bf8fef580	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata ` (instead of `Value `). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
Frederic Riss	7c78db5065	Correctly handle complex locations expressions in replaceDbgDeclareForAlloca() replaceDbgDeclareForAlloca() replaces an alloca by a value storing the address of what was the alloca. If there is a dbg.declare corresponding to that alloca, we need to lower it to a dbg.value describing the additional dereference operation to be performed to get to the underlying variable. This is done by adding a DW_OP_deref to the complex location part of the location description. This deref was added to the end of the operation list, which is wrong. The expression applies to what is described by the dbg.{declare,value}, and as we are changing this, we need to apply the DW_OP_deref as the first operation in the list. Part of the fix for rdar://19162268. llvm-svn: 223799	2014-12-09 17:55:48 +00:00
Juergen Ributzka	194350a936	Revert "Move function to obtain branch weights into the BranchInst class. NFC." This reverts commit r223784 and copies the 'ExtractBranchMetadata' to CodeGenPrepare. llvm-svn: 223795	2014-12-09 17:32:12 +00:00
Juergen Ributzka	e2aa3aa38a	Move function to obtain branch weights into the BranchInst class. NFC. Make this function available to other parts of LLVM. llvm-svn: 223784	2014-12-09 16:36:06 +00:00
Chandler Carruth	7415205113	Teach instcombine to canonicalize "element extraction" from a load of an integer and "element insertion" into a store of an integer into actual element extraction, element insertion, and vector loads and stores. Previously various parts of LLVM (including instcombine itself) would introduce integer loads and stores into the code as a way of opaquely loading and storing "bits". In some cases (such as a memcpy of std::complex<float> object) we will eventually end up using those bits in non-integer types. In order for SROA to effectively promote the allocas involved, it splits these "store a bag of bits" integer loads and stores up into the constituent parts. However, for non-alloca loads and tsores which remain, it uses integer math to recombine the values into a large integer to load or store. All of this would be "fine", except that it forces LLVM to go through integer math to combine and split up values. While this makes perfect sense for integers (and in fact is critical for bitfields to end up lowering efficiently) it is terrible for non-integer types, especially floating point types. We have a much more canonical way of representing the act of concatenating the bits of two SSA values in LLVM: a vector and insertelement. This patch teaching InstCombine to use this representation. With this patch applied, LLVM will no longer introduce integer math into the critical path of every loop over std::complex<float> operations such as those that make up the hot path of ... oh, most HPC code, Eigen, and any other heavy linear algebra library. For the record, I looked extensively at fixing this in other parts of the compiler, but it just doesn't work: - We really do want to canonicalize memcpy and other bit-motion to integer loads and stores. SSA values are tremendously more powerful than "copy" intrinsics. Not doing this regresses massive amounts of LLVM's scalar optimizer. - We really do need to split up integer loads and stores of this form in SROA or every memcpy of a trivially copyable struct will prevent SSA formation of the members of that struct. It essentially turns off SROA. - The closest alternative is to actually split the loads and stores when partitioning with SROA, but this has all of the downsides historically discussed of splitting up loads and stores -- the wide-store information is fundamentally lost. We would also see performance regressions for bitfield-heavy code and other places where the integers aren't really intended to be split without seemingly arbitrary logic to treat integers totally differently. - We can effectively fix this in instcombine, so it isn't that hard of a choice to make IMO. Differential Revision: http://reviews.llvm.org/D6548 llvm-svn: 223764	2014-12-09 08:55:32 +00:00
Justin Bogner	61ba2e3996	InstrProf: An intrinsic and lowering for instrumentation based profiling Introduce the ``llvm.instrprof_increment`` intrinsic and the ``-instrprof`` pass. These provide the infrastructure for writing counters for profiling, as in clang's ``-fprofile-instr-generate``. The implementation of the instrprof pass is ported directly out of the CodeGenPGO classes in clang, and with the followup in clang that rips that code out to use these new intrinsics this ends up being NFC. Doing the instrumentation this way opens some doors in terms of improving the counter performance. For example, this will make it simple to experiment with alternate lowering strategies, and allows us to try handling profiling specially in some optimizations if we want to. Finally, this drastically simplifies the frontend and puts all of the lowering logic in one place. llvm-svn: 223672	2014-12-08 18:02:35 +00:00
NAKAMURA Takumi	6980404cfe	LLVMInstrumentation requires MC since r223532. llvm-svn: 223573	2014-12-06 02:22:11 +00:00
Duncan P. N. Exon Smith	b236211c4c	Utils: Style cleanups, NFC llvm-svn: 223556	2014-12-06 00:48:17 +00:00
Duncan P. N. Exon Smith	b13f7d2e36	Utils: Avoid RAUW on metadata in CloneFunction() llvm-svn: 223555	2014-12-06 00:48:13 +00:00
Kuba Brecka	1001bb533b	Recommit of r223513 and r223514. Reviewed at http://reviews.llvm.org/D6488 llvm-svn: 223532	2014-12-05 22:19:18 +00:00
Kuba Brecka	086e34bef8	Reverting r223513 and r223514. llvm-svn: 223520	2014-12-05 21:32:46 +00:00
Peter Collingbourne	0826e60748	[DFSAN][MIPS][LLVM] Defining ShadowPtrMask variable for MIPS64 Patch by Kumar Sukhani! corresponding compiler-rt patch: http://reviews.llvm.org/D6437 clang patch: http://reviews.llvm.org/D6147 Differential Revision: http://reviews.llvm.org/D6459 llvm-svn: 223516	2014-12-05 21:22:32 +00:00
Kuba Brecka	1e21378a37	AddressSanitizer - Don't instrument globals from cstring_literals sections. (llvm part) Reviewed at http://reviews.llvm.org/D6488 llvm-svn: 223513	2014-12-05 21:04:43 +00:00
Evgeniy Stepanov	d85ddee01d	[msan] Avoid extra origin address realignment. Do not realign origin address if the corresponding application address is at least 4-byte-aligned. Saves 2.5% code size in track-origins mode. llvm-svn: 223464	2014-12-05 14:34:03 +00:00
Simon Pilgrim	be24ab367b	[InstCombine] Minor optimization for bswap with binary ops Added instcombine optimizations for BSWAP with AND/OR/XOR ops: OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) ) OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) ) Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well: fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y)) Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone. Differential Revision: http://reviews.llvm.org/D6407 llvm-svn: 223349	2014-12-04 09:44:01 +00:00
Kostya Serebryany	543f3db572	[msan] allow -fsanitize-coverage=N together with -fsanitize=memory, llvm part llvm-svn: 223312	2014-12-03 23:28:26 +00:00
Matthias Braun	395a82f6cc	correct spelling, NFC llvm-svn: 223274	2014-12-03 22:10:39 +00:00
Matthias Braun	d34e4d2354	[SimplifyLibCalls] Improve double->float shrinking to consider constants This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x); rdar://19049359 Differential Revision: http://reviews.llvm.org/D6496 llvm-svn: 223270	2014-12-03 21:46:33 +00:00
Matthias Braun	892c923c46	[SimplifyLibCalls] Enable double to float shrinking for copysign rdar://19049359 Differential Revision: http://reviews.llvm.org/D6495 llvm-svn: 223269	2014-12-03 21:46:29 +00:00
Evgeniy Stepanov	2e5a1f1c9c	msan] Add compile-time checks for missing origins. This change makes MemorySanitizer instrumentation a bit more strict about instructions that have no origin id assigned to them. This would have caught the bug that was fixed in r222918. This is re-commit of r222997, reverted in r223211, with 3 more missing origins added. llvm-svn: 223236	2014-12-03 14:15:53 +00:00
Erik Eckstein	d181752be0	InstCombine: simplify signed range checks Try to convert two compares of a signed range check into a single unsigned compare. Examples: (icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n (icmp slt x, 0) \| (icmp sgt x, n) --> icmp ugt x, n llvm-svn: 223224	2014-12-03 10:39:15 +00:00
Nick Lewycky	a4acb44995	Revert r222997. The newly added compile-time checks are finding missing origins, testcase is being reduced and a PR will be posted shortly. llvm-svn: 223211	2014-12-03 05:47:00 +00:00
Duncan P. N. Exon Smith	a48bd07e5e	LoopVectorize: Remove unnecessary RAUW Remove an unnecessary `MDNode::replaceAllUsesWith()`. In the preceding line, `TheLoop->setLoopID()` visits all backedges and sets the new loop ID. This sufficiently updates the loop metadata. Metadata RAUW is going away as part of PR21532. llvm-svn: 223210	2014-12-03 05:41:20 +00:00
Tom Stellard	1f0dded057	StructurizeCFG: Use LoopInfo analysis for better loop detection We were assuming that each back-edge in a region represented a unique loop, which is not always the case. We need to use LoopInfo to correctly determine which back-edges are loops. llvm-svn: 223199	2014-12-03 04:28:32 +00:00
Nick Lewycky	2e8a6219fc	Emit the entry block first and the exit block second, then all the blocks in between afterwards. This is what gcc always does, and some out of tree tools depend on that. llvm-svn: 223193	2014-12-03 02:45:01 +00:00
Peter Collingbourne	51d2de7b9e	Prologue support Patch by Ben Gamari! This redefines the `prefix` attribute introduced previously and introduces a `prologue` attribute. There are a two primary usecases that these attributes aim to serve, 1. Function prologue sigils 2. Function hot-patching: Enable the user to insert `nop` operations at the beginning of the function which can later be safely replaced with a call to some instrumentation facility 3. Runtime metadata: Allow a compiler to insert data for use by the runtime during execution. GHC is one example of a compiler that needs this functionality for its tables-next-to-code functionality. Previously `prefix` served cases (1) and (2) quite well by allowing the user to introduce arbitrary data at the entrypoint but before the function body. Case (3), however, was poorly handled by this approach as it required that prefix data was valid executable code. Here we redefine the notion of prefix data to instead be data which occurs immediately before the function entrypoint (i.e. the symbol address). Since prefix data now occurs before the function entrypoint, there is no need for the data to be valid code. The previous notion of prefix data now goes under the name "prologue data" to emphasize its duality with the function epilogue. The intention here is to handle cases (1) and (2) with prologue data and case (3) with prefix data. References ---------- This idea arose out of discussions[1] with Reid Kleckner in response to a proposal to introduce the notion of symbol offsets to enable handling of case (3). [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html Test Plan: testsuite Differential Revision: http://reviews.llvm.org/D6454 llvm-svn: 223189	2014-12-03 02:08:38 +00:00
Michael Zolotukhin	ea8327b80f	PR21302. Vectorize only bottom-tested loops. rdar://problem/18886083 llvm-svn: 223171	2014-12-02 22:59:06 +00:00
Philip Reames	1a1bdb22bf	[Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them. With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now. I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it. During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases. In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principal, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure. Reviewed by: atrick, ributzka llvm-svn: 223137	2014-12-02 18:50:36 +00:00
Bruno Cardoso Lopes	15520db9ad	[SwitchLowering] Handle destinations on multiple phi instructions Follow up from r222926. Also handle multiple destinations from merged cases on multiple and subsequent phi instructions. rdar://problem/19106978 llvm-svn: 223135	2014-12-02 18:31:53 +00:00
Bruno Cardoso Lopes	d035fbb96f	[LICM] Avoind store sinking if no preheader is available Load instructions are inserted into loop preheaders when sinking stores and later removed if not used by the SSA updater. Avoid sinking if the loop has no preheader and avoid crashes. This fixes one more side effect of not handling indirectbr instructions properly on LoopSimplify. llvm-svn: 223119	2014-12-02 14:22:34 +00:00
Hans Wennborg	5bef5b522b	Revert r223049, r223050 and r223051 while investigating test failures. I didn't foresee affecting the Clang test suite :/ llvm-svn: 223054	2014-12-01 17:36:43 +00:00
Hans Wennborg	269ebb612e	SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable They would get optimized away later, but we might as well not emit them. llvm-svn: 223051	2014-12-01 17:08:38 +00:00
Hans Wennborg	5a1e5c05d8	SimplifyCFG: don't remove unreachable default switch destinations An unreachable default destination can be exploited by other optimizations, and SDag lowering is now prepared to handle them efficiently. For example, branches to the unreachable destination will be optimized away, such as in the case of range checks for switch lookup tables. On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and Chromium by 30 kB). llvm-svn: 223050	2014-12-01 17:08:35 +00:00
Evgeniy Stepanov	a056ac8a98	[msan] Add compile-time checks for missing origins. This change makes MemorySanitizer instrumentation a bit more strict about instructions that have no origin id assigned to them. This would have caught the bug that was fixed in r222918. No functional change. llvm-svn: 222997	2014-12-01 09:53:51 +00:00

1 2 3 4 5 ...

12398 Commits