llvm-project

Commit Graph

Author	SHA1	Message	Date
Kevin Qin	a998735def	Run LICM pass after loop unrolling pass. Runtime unrollng will introduce a runtime check in loop prologue. If the unrolled loop is a inner loop, then the proglogue will be inside the outer loop. LICM pass can help to promote the runtime check out if the checked value is loop invariant. llvm-svn: 231630	2015-03-09 06:14:07 +00:00
Mehdi Amini	eb242a5041	InstCombine: fix fold "fcmp x, undef" to account for NaN Summary: See the two test cases. ; Can fold fcmp with undef on one side by choosing NaN for the undef ; Can fold fcmp with undef on both side ; fcmp u_pred undef, undef -> true ; fcmp o_pred undef, undef -> false ; because whatever you choose for the first undef ; you can choose NaN for the other undef Reviewers: hfinkel, chandlerc, majnemer Reviewed By: majnemer Subscribers: majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D7617 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231626	2015-03-09 03:20:25 +00:00
Owen Anderson	7e621e9d5e	Teach DataLayout to infer a plausible alignment for things even when nothing is specified by the user. llvm-svn: 231613	2015-03-08 21:53:59 +00:00
Olivier Sallenave	049d803ce0	Do not restrict interleaved unrolling to small loops, depending on the target. llvm-svn: 231528	2015-03-06 23:12:04 +00:00
Karthik Bhat	88db86dd29	Add a new pass "Loop Interchange" This pass interchanges loops to provide a more cache-friendly memory access. For e.g. given a loop like - for(int i=0;i<N;i++) for(int j=0;j<N;j++) A[j][i] = A[j][i]+B[j][i]; is interchanged to - for(int j=0;j<N;j++) for(int i=0;i<N;i++) A[j][i] = A[j][i]+B[j][i]; This pass is currently disabled by default. To give a brief introduction it consists of 3 stages- LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix. LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time. LoopInterchangeTransform : Which does the actual transform. LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks. TODO: 1) Add support for reductions and lcssa phi. 2) Improve profitability model. 3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange. 4) Improve compile time regression found in llvm lnt due to this pass. 5) Fix issues in Dependency Analysis module. A special thanks to Hal for reviewing this code. Review: http://reviews.llvm.org/D7499 llvm-svn: 231458	2015-03-06 10:11:25 +00:00
Michael Gottesman	f6bcb81000	[objc-arc] Remove annotations code. It will always be in the history if it is needed again. Now it is just dead code. llvm-svn: 231435	2015-03-06 00:34:29 +00:00
Nadav Rotem	c99a38796c	Teach ComputeNumSignBits about signed reminder. This optimization a continuation of r231140 that reasoned about signed div. llvm-svn: 231433	2015-03-06 00:23:58 +00:00
Philip Reames	e21ce4540c	[RewriteStatepointsForGC] Yet more test cases for relocation At this point, we should have decent coverage of the involved code. I've got a few more test cases to cleanup and submit, but what's here is already reasonable. I've got a collection of liveness tests which will be posted for review along with a decent liveness algorithm in the next few days. Once those are in, the code in this file should be well tested and I can start renaming things without risk of serious breakage. llvm-svn: 231414	2015-03-05 22:28:06 +00:00
Philip Reames	03ea8642b1	[RewriteStatepointsForGC] Add additional tests around relocation These are focused around the actual relocation rewriting itself, not the rest of the infrastructure. llvm-svn: 231399	2015-03-05 19:52:13 +00:00
Michael Kuperstein	bcb26d6880	[InstCombine] Fix an assertion when fmul has a ConstantExpr operand isNormalFp and isFiniteNonZeroFp should not assume vector operands can not be constant expressions. Patch by Pawel Jurek <pawel.jurek@intel.com> Differential Revision: http://reviews.llvm.org/D8053 llvm-svn: 231359	2015-03-05 08:38:57 +00:00
Mehdi Amini	46a43556db	Make DataLayout Non-Optional in the Module Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC, the string is no longer canonicalized, you can't rely on two "equals" DataLayout having the same string returned by getStringRepresentation(). Get rid of DataLayoutPass: the DataLayout is in the Module The DataLayout is "per-module", let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module. Make DataLayout Non-Optional in the Module Module->getDataLayout() will never returns nullptr anymore. Reviewers: echristo Subscribers: resistor, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D7992 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231270	2015-03-04 18:43:29 +00:00
Philip Reames	6da37857d1	[RewriteStatepointsForGC] Fix a relocation bug w.r.t values defined by invoke instructions RewriteStatepointsForGC pass emits an alloca for each GC pointer which will be relocated. It then inserts stores after def and all relocations, and inserts loads before each use as well. In the end, mem2reg is used to update IR with relocations in SSA form. However, there is a problem with inserting stores for values defined by invoke instructions. The code didn't expect a def was a terminator instruction, and inserting instructions after these terminators resulted in malformed IR. This patch fixes this problem by handling invoke instructions as a special case. If the def is an invoke instruction, the store will be inserted at the beginning of the normal destination block. Since return value from invoke instruction does not dominate the unwind destination block, no action is needed there. Patch by: Chen Li Differential Revision: http://reviews.llvm.org/D7923 llvm-svn: 231183	2015-03-04 00:13:52 +00:00
David Majnemer	1bacc0abc9	InstCombine: Ensure select condition types are identical before merging Selection conditions may be vectors or scalars. Make sure InstCombine doesn't indiscriminately assume that a select which is value dependent on another select have identical select condition types. This fixes PR22773. llvm-svn: 231156	2015-03-03 22:40:36 +00:00
Nadav Rotem	029c5c7fdb	Teach ComputeNumSignBits about signed divisions. http://reviews.llvm.org/D8028 rdar://20023136 llvm-svn: 231140	2015-03-03 21:39:02 +00:00
Duncan P. N. Exon Smith	e274180f0e	DebugInfo: Move new hierarchy into place Move the specialized metadata nodes for the new debug info hierarchy into place, finishing off PR22464. I've done bootstraps (and all that) and I'm confident this commit is NFC as far as DWARF output is concerned. Let me know if I'm wrong :). The code changes are fairly mechanical: - Bumped the "Debug Info Version". - `DIBuilder` now creates the appropriate subclass of `MDNode`. - Subclasses of DIDescriptor now expect to hold their "MD" counterparts (e.g., `DIBasicType` expects `MDBasicType`). - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp` for printing comments. - Big update to LangRef to describe the nodes in the new hierarchy. Feel free to make it better. Testcase changes are enormous. There's an accompanying clang commit on its way. If you have out-of-tree debug info testcases, I just broke your build. - `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to update all the IR testcases. - Unfortunately I failed to find way to script the updates to CHECK lines, so I updated all of these by hand. This was fairly painful, since the old CHECKs are difficult to reason about. That's one of the benefits of the new hierarchy. This work isn't quite finished, BTW. The `DIDescriptor` subclasses are almost empty wrappers, but not quite: they still have loose casting checks (see the `RETURN_FROM_RAW()` macro). Once they're completely gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I also expect to make a few schema changes now that it's easier to reason about everything. llvm-svn: 231082	2015-03-03 17:24:31 +00:00
Peter Collingbourne	da2dbf21a9	LowerBitSets: Use byte arrays instead of bit sets to represent in-memory bit sets. By loading from indexed offsets into a byte array and applying a mask, a program can test bits from the bit set with a relatively short instruction sequence. For example, suppose we have 15 bit sets to lay out: A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits), F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits), L (4 bits), M (3 bits), N (2 bits), O (1 bit) These bits can be laid out in a 16-byte array like this: Byte Offset 0123456789ABCDEF Bit 7 HHHHHHHHHIIIIIII 6 GGGGGGGGGGJJJJJJ 5 FFFFFFFFFFFKKKKK 4 EEEEEEEEEEEELLLL 3 DDDDDDDDDDDDDMMM 2 CCCCCCCCCCCCCCNN 1 BBBBBBBBBBBBBBBO 0 AAAAAAAAAAAAAAAA For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done in 1-2 machine instructions on x86, or 4-6 instructions on ARM. This uses the LPT multiprocessor scheduling algorithm to lay out the bits efficiently. Saves ~450KB of instructions in a recent build of Chromium. Differential Revision: http://reviews.llvm.org/D7954 llvm-svn: 231043	2015-03-03 00:49:28 +00:00
Benjamin Kramer	838752d3f6	LoopIdiom: Give globals for memset_pattern16 private linkage. There's really no reason to have them have entries in the symbol table anymore. Old versions of ld64 had some bugs in this area but those have been fixed long ago. llvm-svn: 231041	2015-03-03 00:17:09 +00:00
Sanjoy Das	2d38031271	Revert some changes that were made to fix PR20680. This re-lands change r230921. r230921 was reverted because it broke a clang test; a checkin fixing the clang test will be commited shortly. Summary: As far as I can tell, the real bug causing the issue was fixed in r230533. SCEVExpander should mark an increment operation as nuw or nsw only if it can prove that the operation does not overflow. There shouldn't be any situation where we have to do something different because of no-wrap flags generated by SCEVExpander. Revert "IndVarSimplify: Allow LFTR to fire more often" This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213). Revert "IndVarSimplify: Don't let LFTR compare against a poison value" This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102). Reviewers: majnemer, atrick, spatel Differential Revision: http://reviews.llvm.org/D7979 llvm-svn: 231018	2015-03-02 21:41:07 +00:00
NAKAMURA Takumi	0cd23c842e	Revert r230921, "Revert some changes that were made to fix PR20680.", for now. It caused a failure on clang/test/Misc/backend-optimization-failure.cpp . llvm-svn: 230929	2015-03-02 01:14:03 +00:00
Sanjoy Das	876bd51486	Revert some changes that were made to fix PR20680. Summary: As far as I can tell, the real bug causing the issue was fixed in r230533. SCEVExpander should mark an increment operation as nuw or nsw only if it can prove that the operation does not overflow. There shouldn't be any situation where we have to do something different because of no-wrap flags generated by SCEVExpander. Revert "IndVarSimplify: Allow LFTR to fire more often" This reverts commit 1ade0f0faa98877b688e0b9da58e876052c1e04e (SVN: 222213). Revert "IndVarSimplify: Don't let LFTR compare against a poison value" This reverts commit c0f2b8b528d8a37b0a1522aae90af649d6357eb5 (SVN: 217102). Reviewers: majnemer, atrick, spatel Differential Revision: http://reviews.llvm.org/D7979 llvm-svn: 230921	2015-03-01 23:36:26 +00:00
Duncan P. N. Exon Smith	d0c2a99f0e	DebugInfo: Convert DW_OP_piece => DW_OP_bit_piece r228631 stopped using `DW_OP_piece` inside `DIExpression`s in the IR, but it apparently missed updating these testcases. Caught by verifier checks for `MDExpression` while working on moving the new hierarchy into place. llvm-svn: 230882	2015-02-28 23:57:16 +00:00
Duncan P. N. Exon Smith	a951165e5a	Fix line endings on Transforms/Inline/inline_dbg_declare.ll llvm-svn: 230870	2015-02-28 21:38:32 +00:00
Benjamin Kramer	cb570f1bc9	TRE: Just erase dead BBs and tweak the iteration loop not to increment the deleted BB iterator. Leaving empty blocks around just opens up a can of bugs like PR22704. Deleting them early also slightly simplifies code. Thanks to Sanjay for the IR test case. llvm-svn: 230856	2015-02-28 16:47:27 +00:00
Philip Reames	2e5bcbe8d5	[RewriteStatepointsForGC] Fix another order of iteration bug It turns out the naming of inserted phis and selects is sensative to the order in which two sets are iterated. We need to nail this down to avoid non-deterministic output and possible test failures. The modified test is the one I first noticed something odd in. The change is making it more strict to report the error. With the test change, but without the code change, the test fails roughly 1 in 5. With the code change, I've run ~30 runs without error. Long term, the right fix here is to adjust the naming scheme. I'm checking in this hack to avoid any possible non-determinism in the tests over the weekend. HJust because I only noticed one case doesn't mean it's actually the only case. I hope to get to the right change Monday. std->llvm data structure changes bugfix change #3 llvm-svn: 230835	2015-02-28 01:52:09 +00:00
Philip Reames	a5aeaf4b4f	[RewriteStatepointsForGC] Add tests for the base pointer identification algorithm These tests cover the 'base object' identification and rewritting portion of RewriteStatepointsForGC. These aren't completely exhaustive, but they've proven to be reasonable effective over time at finding regressions. In the process of porting these tests over, I found my first "cleanup per llvm code style standards" bug. We were relying on the order of iteration when testing the base pointers found for a derived pointer. When we switched from std::set to DenseSet, this stopped being a safe assumption. I'm suspecting I'm going to find more of those. In particular, I'm now really wondering about the main iteration loop for this algorithm. I need to go take a closer look at the assumptions there. I'm not really happy with the fact these are testing what is essentially debug output (i.e. enabled via command line flags). Suggestions for how to structure this better are very welcome. llvm-svn: 230818	2015-02-28 00:20:48 +00:00
David Blaikie	a79ac14fa6	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
David Blaikie	79e6c74981	[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float> %x, ... ->getelementptr float, <4 x float> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.?[^%\w]getelementptr inbounds )(((?:<\d x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") normrep = re.compile( r"(^.?[^%\w]getelementptr )(((?:<\d* x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll \| xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786	2015-02-27 19:29:02 +00:00
Sanjoy Das	54ad996ca2	IRCE: add a test case for r230619. llvm-svn: 230680	2015-02-26 20:14:32 +00:00
Hal Finkel	221f467185	[InstCombine/PowerPC] Convert aligned QPX load/store intrinsics into loads/stores InstCombine has long had logic to convert aligned Altivec load/store intrinsics into regular loads and stores. This mirrors that functionality for QPX vector load/store intrinsics. llvm-svn: 230660	2015-02-26 18:56:03 +00:00
Hal Finkel	18ee7c14fd	[InstCombine] Add a test for altivec load/store intrinsic simplification InstCombine has logic to convert aligned Altivec load/store intrinsics into regular loads and stores. Unfortunately, there seems to be no regression test covering this behavior. Adding one... llvm-svn: 230632	2015-02-26 14:22:41 +00:00
Sanjoy Das	e75ed92630	IRCE: generalize to handle loops with decreasing induction variables. IRCE can now split the iteration space for loops like: for (i = n; i >= 0; i--) a[i + k] = 42; // bounds check on access llvm-svn: 230618	2015-02-26 08:19:31 +00:00
Ramkumar Ramachandra	3408f3e296	PlaceSafepoints: use IRBuilder helpers Use the IRBuilder helpers for gc.statepoint and gc.result, instead of coding the construction by hand. Note that the gc.statepoint IRBuilder handles only CallInst, not InvokeInst; retain that part of hand-coding. Differential Revision: http://reviews.llvm.org/D7518 llvm-svn: 230591	2015-02-26 00:35:56 +00:00
Sanjay Patel	cc29f4f2cb	only propagate equality comparisons of FP values that we are certain are non-zero This is a follow-on to r227491 which tightens the check for propagating FP values. If a non-constant value happens to be a zero, we would hit the same bug as before. Bug noted and patch suggested by Eli Friedman. llvm-svn: 230564	2015-02-25 22:46:08 +00:00
JF Bastien	d52c990a90	InstCombine: extract instead of shuffle when performing vector/array type punning Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is. Test Plan: make check Reviewers: jvoung, chandlerc Subscribers: llvm-commits llvm-svn: 230560	2015-02-25 22:30:51 +00:00
Peter Collingbourne	eba7f73ff9	LowerBitSets: Align referenced globals. This change aligns globals to the next highest power of 2 bytes, up to a maximum of 128. This makes it more likely that we will be able to compress bit sets with a greater alignment. In many more cases, we can now take advantage of a new optimization also introduced in this patch that removes bit set checks if the bit set is all ones. The 128 byte maximum was found to provide the best tradeoff between instruction overhead and data overhead in a recent build of Chromium. It allows us to remove ~2.4MB of instructions at the cost of ~250KB of data. Differential Revision: http://reviews.llvm.org/D7873 llvm-svn: 230540	2015-02-25 20:42:41 +00:00
Sanjoy Das	dcc84db264	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap (The change was landed in r230280 and caused the regression PR22674. This version contains a fix and a test-case for PR22674). When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. Differential Revision: http://reviews.llvm.org/D7778 llvm-svn: 230533	2015-02-25 20:02:59 +00:00
Sanjay Patel	40eaa8df99	Fix really obscure bug in CannotBeNegativeZero() (PR22688) With a diabolically crafted test case, we could recurse through this code and return true instead of false. The larger engineering crime is the use of magic numbers. Added FIXME comments for those. llvm-svn: 230515	2015-02-25 18:00:15 +00:00
Charles Davis	33d1dc0008	[IC] Turn non-null MD on pointer loads to range MD on integer loads. Summary: This change fixes the FIXME that you recently added when you committed (a modified version of) my patch. When `InstCombine` combines a load and store of an pointer to those of an equivalently-sized integer, it currently drops any `!nonnull` metadata that might be present. This change replaces `!nonnull` metadata with `!range !{ 1, -1 }` metadata instead. Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7621 llvm-svn: 230462	2015-02-25 05:10:25 +00:00
Peter Collingbourne	1baeaa395a	LowerBitSets: Introduce global layout builder. The builder is based on a layout algorithm that tries to keep members of small bit sets together. The new layout compresses Chromium's bit sets to around 15% of their original size. Differential Revision: http://reviews.llvm.org/D7796 llvm-svn: 230394	2015-02-24 23:17:02 +00:00
Hans Wennborg	953d6fb84e	Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap" This caused PR22674, failing this assert: Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed. llvm-svn: 230341	2015-02-24 16:19:29 +00:00
Sanjoy Das	82ea3d45b5	New instcombine rule: max(~a,~b) -> ~min(a, b) This case is interesting because ScalarEvolutionExpander lowers min(a, b) as ~max(~a,~b). I think the profitability heuristics can be made more clever/aggressive, but this is a start. Differential Revision: http://reviews.llvm.org/D7821 llvm-svn: 230285	2015-02-24 00:08:41 +00:00
Sanjoy Das	18c243b933	Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap When emitting the increment operation, SCEVExpander marks the operation as nuw or nsw based on the flags on the preincrement SCEV. This is incorrect because, for instance, it is possible that {-6,+,1} is <nuw> while {-6,+,1}+1 = {-5,+,1} is not. This change teaches SCEV to mark the increment as nuw/nsw only if it can explicitly prove that the increment operation won't overflow. Apart from the attached test case, another (more realistic) manifestation of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll. NOTE: this change was landed with an incorrect commit message in rL230275 and was reverted for that reason in rL230279. This commit message is the correct one. Differential Revision: http://reviews.llvm.org/D7778 llvm-svn: 230280	2015-02-23 23:22:58 +00:00
Sanjoy Das	c9cf0151cf	Revert 230275. 230275 got committed with an incorrect commit message due to a mixup on my side. Will re-land in a few moments with the correct commit message. llvm-svn: 230279	2015-02-23 23:13:22 +00:00
Sanjoy Das	913dfd8f7f	Fix bug 22641 The bug was a result of getPreStartForExtend interpreting nsw/nuw flags on an add recurrence more strongly than is legal. {S,+,X}<nsw> implies S+X is nsw only if the backedge of the loop is taken at least once. Differential Revision: http://reviews.llvm.org/D7808 llvm-svn: 230275	2015-02-23 22:55:13 +00:00
Chad Rosier	543900539f	Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity. This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a fmul from being hoisted in cases where it is more profitable to form a fmsub/fmadd. Phabricator Review: http://reviews.llvm.org/D7299 Patch by Lawrence Hu <lawrence@codeaurora.org> llvm-svn: 230241	2015-02-23 19:15:16 +00:00
Mehdi Amini	cd3ca6f7dd	InstSimplify: simplify 0 / X if nnan and nsz From: Fiona Glaser <fglaser@apple.com> llvm-svn: 230238	2015-02-23 18:30:25 +00:00
Sanjoy Das	7fc60da2f5	IRCE: use SCEVs instead of llvm::Value's for intermediate calculations. Semantically non-functional change. This gets rid of some of the SCEV -> Value -> SCEV round tripping and the Construct(SMin\|SMax)Of and MaybeSimplify helper routines. llvm-svn: 230150	2015-02-21 22:07:32 +00:00
Philip Reames	0b1b387441	[PlaceSafepoints] Adjust enablement logic to default to off and be GC configurable per GC Previously, this pass ran over every function in the Module if added to the pass order. With this change, it runs only over those with a GC attribute where the GC explicitly opts in. A GC can also choose which of entry safepoint polls, backedge safepoint polls, and call safepoints it wants. I hope to get these exposed as checks on the GCStrategy at some point, but for now, the checks are manual string comparisons. llvm-svn: 230097	2015-02-21 00:09:09 +00:00
Benjamin Kramer	911d5b3ace	LoopRotate: When reconstructing loop simplify form don't split edges from indirectbrs. Yet another chapter in the endless story. While this looks like we leave the loop in a non-canonical state this replicates the logic in LoopSimplify so it doesn't diverge from the canonical form in any way. PR21968 llvm-svn: 230058	2015-02-20 20:49:25 +00:00
Peter Collingbourne	e6909c8e8b	Introduce bitset metadata format and bitset lowering pass. This patch introduces a new mechanism that allows IR modules to co-operatively build pointer sets corresponding to addresses within a given set of globals. One particular use case for this is to allow a C++ program to efficiently verify (at each call site) that a vtable pointer is in the set of valid vtable pointers for the class or its derived classes. One way of doing this is for a toolchain component to build, for each class, a bit set that maps to the memory region allocated for the vtables, such that each 1 bit in the bit set maps to a valid vtable for that class, and lay out the vtables next to each other, to minimize the total size of the bit sets. The patch introduces a metadata format for representing pointer sets, an '@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals and builds the bitsets, and documents the new feature. Differential Revision: http://reviews.llvm.org/D7288 llvm-svn: 230054	2015-02-20 20:30:47 +00:00
Philip Reames	2ef029c7ae	Bugfix for 229954 Before calling Function::getGC to test for enablement, we need to make sure there's actually a GC at all via Function::hasGC. Otherwise, we'd crash on functions without a GC. Thankfully, this only mattered if you manually scheduled the pass, but still, oops. :( llvm-svn: 230040	2015-02-20 18:56:14 +00:00
Hal Finkel	847e05f569	[InstCombine] Remove unnecessary variable indexing into single-element arrays This change addresses a deficiency pointed out in PR22629. To copy from the bug report: [from the bug report] Consider this code: int f(int x) { int a[] = {12}; return a[x]; } GCC knows to optimize this to movl $12, %eax ret The code generated by recent Clang at -O3 is: movslq %edi, %rax movl .L_ZZ1fiE1a(,%rax,4), %eax retq .L_ZZ1fiE1a: .long 12 # 0xc [end from the bug report] This definitely seems worth fixing. I've also seen this kind of code before (as the base case of generic vector wrapper templates with one element). The general idea is to look at the GEP feeding a load or a store, which has some variable as its first non-zero index, and determine if that index must be zero (or else an out-of-bounds access would occur). We can do this for allocas and globals with constant initializers where we know the maximum size of the underlying object. When we find such a GEP, we create a new one for the memory access with that first variable index replaced with a constant zero. Even if we can't eliminate the memory access (and sometimes we can't), it is still useful because it removes unnecessary indexing calculations. llvm-svn: 229959	2015-02-20 03:05:53 +00:00
Philip Reames	6faacf4772	Adjust enablement of RewriteStatepointsForGC When back merging the changes in 229945 I noticed that I forgot to mark the test cases with the appropriate GC. We want the rewriting to be off by default (even when manually added to the pass order), not on-by default. To keep the current test working, mark them as using the statepoint-example GC and whitelist that GC. Longer term, we need a better selection mechanism here for both actual usage and testing. As I migrate more tests to the in tree version of this pass, I will probably need to update the enable/disable logic as well. llvm-svn: 229954	2015-02-20 02:34:49 +00:00
Philip Reames	d16a9b1fdc	Add a pass for constructing gc.statepoint sequences w/explicit relocations This patch consists of a single pass whose only purpose is to visit previous inserted gc.statepoints which do not have gc.relocates inserted yet, and insert them. This can be used either immediately after IR generation to perform 'early safepoint insertion' or late in the pass order to perform 'late insertion'. This patch is setting the stage for work to continue in tree. In particular, there are known naming and style violations in the current patch. I'll try to get those resolved over the next week or so. As I touch each area to make style changes, I need to make sure we have adequate testing in place. As part of the cleanup, I will be cleaning up a collection of test cases we have out of tree and submitting them upstream. The tests included in this change are very basic and mostly to provide examples of usage. The pass has several main subproblems it needs to address: - First, it has identify any live pointers. In the current code, the use of address spaces to distinguish pointers to GC managed objects is hard coded, but this will become parametrizable in the near future. Note that the current change doesn't actually contain a useful liveness analysis. It was seperated into a followup change as the code wasn't ready to be shared. Instead, the current implementation just considers any dominating def of appropriate pointer type to be live. - Second, it has to identify base pointers for each live pointer. This is a fairly straight forward data flow algorithm. - Third, the information in the previous steps is used to actually introduce rewrites. Rather than trying to do this by hand, we simply re-purpose the code behind Mem2Reg to do this for us. llvm-svn: 229945	2015-02-20 01:06:44 +00:00
Michael Gottesman	0fc2accb58	[objc-arc-contract] We can not move retains over instructions which can not conservatively be proven to not decrement the retain's RCIdentity. I also cleaned up the code to make it more understandable for mere mortals. <rdar://problem/19853758> llvm-svn: 229937	2015-02-20 00:02:49 +00:00
Ahmed Bougacha	db141ac37d	[ARM] Re-re-apply VLD1/VST1 base-update combine. This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting. The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTRL.A==1?). To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). rdar://19717869, rdar://14062261. llvm-svn: 229932	2015-02-19 23:52:41 +00:00
Rafael Espindola	8c97e19124	Avoid conversion to float when creating ConstantDataArray/ConstantDataVector. Patch by Raoux, Thomas F! llvm-svn: 229864	2015-02-19 16:08:20 +00:00
Igor Laevsky	55d60a4a2f	Add few simple tests to check statepoint placement for invoke instructions. Differential Revision: http://reviews.llvm.org/D7535 llvm-svn: 229842	2015-02-19 11:39:04 +00:00
Chandler Carruth	b89464a9b6	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. llvm-svn: 229835	2015-02-19 10:36:19 +00:00
Sanjoy Das	11b279a832	Partial fix for bug 22589 Don't spend the entire iteration space in the scalar loop prologue if computing the trip count overflows. This change also gets rid of the backedge check in the prologue loop and the extra check for overflowing trip-count. Differential Revision: http://reviews.llvm.org/D7715 llvm-svn: 229731	2015-02-18 19:32:25 +00:00
Elena Demikhovsky	6ddd998589	Minor fix after 229495. Removed metadata and function attributes from the test. llvm-svn: 229647	2015-02-18 08:09:28 +00:00
Adam Nemet	acd22e1677	[LoopAccesses] Modify test to also check symbolic strides with memchecks See the comment in the code. This is part of the patchset that converts LoopAccessAnalysis into an actual analysis pass. llvm-svn: 229627	2015-02-18 03:43:32 +00:00
Akira Hatanaka	1defd5afbd	[InstCombine] Do not insert a GEP instruction before a landingpad instruction. InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the insertion point, which caused the verifier to complain when a GEP was inserted before a landingpad instruction. This commit fixes it to use getFirstInsertionPt instead. rdar://problem/19394964 llvm-svn: 229619	2015-02-18 03:30:11 +00:00
Hal Finkel	4393559621	[BDCE] Don't forget uses of root instructions seen before the instruction itself When visiting the initial list of "root" instructions (those which must always be alive), for those that are integer-valued (such as invokes returning an integer), we mark their bits as (initially) all dead (we might, obviously, find uses of those bits later, but all bits are assumed dead until proven otherwise). Don't do so, however, if we're already seen a use of those bits by another root instruction (such as a store). Fixes a miscompile of the sanitizer unit tests on x86_64. Also, add a debug line for visiting the root instructions, and remove a debug line which tried to print instructions being removed (printing dead instructions is dangerous, and can sometimes crash). llvm-svn: 229618	2015-02-18 03:12:28 +00:00
Elena Demikhovsky	ef035bb974	Fixed a bug in store sinking. The problem was in store-sink barrier check. Store sink barrier should be checked for ModRef (read-write) mode. http://llvm.org/bugs/show_bug.cgi?id=22613 llvm-svn: 229495	2015-02-17 13:10:05 +00:00
Hal Finkel	2bb61ba2fe	[BDCE] Add a bit-tracking DCE pass BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the "aggressive DCE" pass), with the added capability to track dead bits of integer valued instructions and remove those instructions when all of the bits are dead. Currently, it does not actually do this all-bits-dead removal, but rather replaces the instruction's uses with a constant zero, and lets instcombine (and the later run of ADCE) do the rest. Because we essentially get a run of ADCE "for free" while tracking the dead bits, we also do what ADCE does and removes actually-dead instructions as well (this includes instructions newly trivially dead because all bits were dead, but not all such instructions can be removed). The motivation for this is a case like: int __attribute__((const)) foo(int i); int bar(int x) { x \|= (4 & foo(5)); x \|= (8 & foo(3)); x \|= (16 & foo(2)); x \|= (32 & foo(1)); x \|= (64 & foo(0)); x \|= (128& foo(4)); return x >> 4; } As it turns out, if you order the bit-field insertions so that all of the dead ones come last, then instcombine will remove them. However, if you pick some other order (such as the one above), the fact that some of the calls to foo() are useless is not locally obvious, and we don't remove them (without this pass). I did a quick compile-time overhead check using sqlite from the test suite (Release+Asserts). BDCE took ~0.4% of the compilation time (making it about twice as expensive as ADCE). I've not looked at why yet, but we eliminate instructions due to having all-dead bits in: External/SPEC/CFP2006/447.dealII/447.dealII External/SPEC/CINT2006/400.perlbench/400.perlbench External/SPEC/CINT2006/403.gcc/403.gcc MultiSource/Applications/ClamAV/clamscan MultiSource/Benchmarks/7zip/7zip-benchmark llvm-svn: 229462	2015-02-17 01:36:59 +00:00
Mehdi Amini	b9a0fa4822	InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val)) Fixes radar 15486701. From: Fiona Glaser <fglaser@apple.com> llvm-svn: 229437	2015-02-16 21:47:54 +00:00
Mehdi Amini	7aab8752ba	Tests: reformat sitofp.ll and use FileCheck From: Fiona Glaser <fglaser@apple.com> llvm-svn: 229436	2015-02-16 21:47:50 +00:00
James Molloy	e32d806b5f	[LoopReroll] Relax some assumptions a little. We won't find a root with index zero in any loop that we are able to reroll. However, we may find one in a non-rerollable loop, so bail gracefully instead of failing hard. llvm-svn: 229406	2015-02-16 17:02:00 +00:00
James Molloy	4c7deb2259	[LoopReroll] Don't crash on dead code If a PHI has no users, don't crash; bail gracefully. This shouldn't happen often, but we can make no guarantees that previous passes didn't leave dead code around. llvm-svn: 229405	2015-02-16 17:01:52 +00:00
David Majnemer	9b529a76e9	IR: Properly return nullptr when getAggregateElement is out-of-bounds We didn't properly handle the out-of-bounds case for ConstantAggregateZero and UndefValue. This would manifest as a crash when the constant folder was asked to fold a load of a constant global whose struct type has no operands. This fixes PR22595. llvm-svn: 229352	2015-02-16 04:02:09 +00:00
David Blaikie	0c1a24026e	FileCheck-ize a test to make it easier to migrate to typeless pointers llvm-svn: 229278	2015-02-15 04:14:00 +00:00
David Blaikie	15a8444d16	Update a test to make it easier to migrate to untyped pointers llvm-svn: 229277	2015-02-15 04:13:58 +00:00
David Blaikie	21caa5c85d	Update a test to use FileCheck so it's easier to migrate to future typeless pointer changes llvm-svn: 229276	2015-02-15 04:13:57 +00:00
David Blaikie	eba8c88a90	Reformat test case to be easier to migrate to typeless pointers. llvm-svn: 229275	2015-02-15 04:13:53 +00:00
Ramkumar Ramachandra	8fcb498a9a	InstCombine: propagate deref via new addDereferenceableAttr The "dereferenceable" attribute cannot be added via .addAttribute(), since it also expects a size in bytes. AttrBuilder#addAttribute or AttributeSet#addAttribute is wrapped by classes Function, InvokeInst, and CallInst. Add corresponding wrappers to AttrBuilder#addDereferenceableAttr. Having done this, propagate the dereferenceable attribute via gc.relocate, adding a test to exercise it. Note that -datalayout is required during execution over and above -instcombine, because InstCombine only optionally requires DataLayoutPass. Differential Revision: http://reviews.llvm.org/D7510 llvm-svn: 229265	2015-02-14 19:37:54 +00:00
Philip Reames	9ae15209ad	[InstCombine] When canonicalizing gep indices, prefer zext when possible If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead. This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?). We already apply a similar transform in DAGCombine, this just extends that to the IR level. This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types. Differential Revision: http://reviews.llvm.org/D7255 llvm-svn: 229189	2015-02-14 00:05:36 +00:00
Andrea Di Biagio	30d471f6aa	[InstCombine] Fix regression introduced at r227197. This patch fixes a problem I accidentally introduced in an instruction combine on select instructions added at r227197. That revision taught the instruction combiner how to fold a cttz/ctlz followed by a icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared. However, the new rule added at r227197 would have produced wrong results in the case where a cttz/ctlz with flag 'is_zero_undef' cleared was follwed by a zero-extend or truncate. In that case, the folded instruction would have been inserted in a wrong location thus leaving the CFG in an inconsistent state. This patch fixes the problem and add two reproducible test cases to existing test 'InstCombine/select-cmp-cttz-ctlz.ll'. llvm-svn: 229124	2015-02-13 16:33:34 +00:00
Andrea Di Biagio	b14ae8692d	[CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to speculate calls to cttz/ctlz. SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are 'cheap' for the target. Therefore, some of the logic in CodeGenPrepare that was originally added at revision 224899 can now be removed. This patch is basically a no functional change. It removes the duplicated logic in CodeGenPrepare and converts all the existing target specific tests for cttz/ctlz into SimplifyCFG tests. Differential Revision: http://reviews.llvm.org/D7608 llvm-svn: 229105	2015-02-13 14:15:48 +00:00
James Molloy	5eb75aced4	[SimplifyCFG] Add test for r229099 Add extra test that was accidentally not staged. llvm-svn: 229101	2015-02-13 11:08:40 +00:00
Chandler Carruth	1fbc316534	[unroll] Concede defeat and disable the unroll analyzer for now. The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an email to the original commit thread with details and a proposal for how to redesign things. I'm disabling this for now so that we don't spend time debugging issues with it in its current state. llvm-svn: 229064	2015-02-13 05:31:46 +00:00
Michael Liao	d266b928ae	[InstCombine] Fix a bug when combining `icmp` from `ptrtoint` - First, there's a crash when we try to combine that pointers into `icmp` directly by creating a `bitcast`, which is invalid if that two pointers are from different address spaces. - It's not always appropriate to cast one pointer to another if they are from different address spaces as that is not no-op cast. Instead, we only combine `icmp` from `ptrtoint` if that two pointers are of the same address space. llvm-svn: 229063	2015-02-13 04:51:26 +00:00
Chandler Carruth	87fdafc7b2	[IC] Fix a bug with the instcombine canonicalizing of loads and propagating of metadata. We were propagating !nonnull metadata even when the newly formed load is no longer of a pointer type. This is clearly broken and results in LLVM failing the verifier and aborting. This patch just restricts the propagation of !nonnull metadata to when we actually have a pointer type. This bug report and the initial version of this patch was provided by Charles Davis! Many thanks for finding this! We still need to add logic to round-trip the metadata correctly if we combine from pointer types to integer types and then back by using range metadata for the integer type loads. But this is the minimal and safe version of the patch, which is important so we can backport it into 3.6. llvm-svn: 229029	2015-02-13 02:30:01 +00:00
Olivier Sallenave	83aec218e7	Check interleaving without relying on debug output. llvm-svn: 229027	2015-02-13 02:13:57 +00:00
Michael Zolotukhin	8ec536e3dd	Testcase for r228988. llvm-svn: 228995	2015-02-13 00:35:45 +00:00
NAKAMURA Takumi	34d46fa297	llvm/test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll REQUIRES +Asserts due to -debug. llvm-svn: 228989	2015-02-13 00:21:34 +00:00
Olivier Sallenave	05e69157b6	Change max interleave factor to 12 for POWER7 and POWER8. llvm-svn: 228973	2015-02-12 22:57:58 +00:00
Bjorn Steinbrink	6f972a13f6	Fix a crash in the assumption cache when inlining indirect function calls Summary: Instances of the AssumptionCache are per function, so we can't re-use the same AssumptionCache instance when recursing in the CallAnalyzer to analyze a different function. Instead we have to pass the AssumptionCacheTracker to the CallAnalyzer so it can get the right AssumptionCache on demand. Reviewers: hfinkel Subscribers: llvm-commits, hans Differential Revision: http://reviews.llvm.org/D7533 llvm-svn: 228957	2015-02-12 21:04:22 +00:00
Benjamin Kramer	e8cb17f282	Update test case. llvm-svn: 228956	2015-02-12 20:40:19 +00:00
Benjamin Kramer	443c7967ea	InstCombine: Allow folding of xor into icmp by changing the predicate for vectors The loop vectorizer can create this pattern. llvm-svn: 228954	2015-02-12 20:26:46 +00:00
Michael Zolotukhin	56c918bcce	Add a testcase for r228432. llvm-svn: 228951	2015-02-12 19:57:24 +00:00
James Molloy	e805ad95dc	[LoopRerolling] Be more forgiving with instruction order. We can't solve the full subgraph isomorphism problem. But we can allow obvious cases, where for example two instructions of different types are out of order. Due to them having different types/opcodes, there is no ambiguity. llvm-svn: 228931	2015-02-12 15:54:14 +00:00
Andrea Di Biagio	b08862c4f0	[TTI] Teach the cost heuristic how to query TLI to check if a zext/trunc is 'free' for the target. Now that SimplifyCFG uses TTI for the cost heuristic, we can teach BasicTTIImpl how to query TLI in order to get a more accurate cost for truncates and zero-extends. Before this patch, the basic cost heuristic in TargetTransformInfoImplCRTPBase would have conservatively returned a 'default' TCC_Basic for all zero-extends, and TCC_Free for truncates on native types. This patch improves the heuristic so that we query TLI (if available) to get more accurate answers. If TLI is available, then methods 'isZExtFree' and 'isTruncateFree' can be used to check if a zext/trunc is free for the target. Added more test cases to SimplifyCFG/X86/speculate-cttz-ctlz.ll. With this change, SimplifyCFG is now able to speculate a 'cheap' cttz/ctlz immediately followed by a free zext/trunc. Differential Revision: http://reviews.llvm.org/D7585 llvm-svn: 228923	2015-02-12 14:17:24 +00:00
Chandler Carruth	63aaa98d94	[slp] Fix a nasty bug in the SLP vectorizer that Joerg pointed out. Apparently some code finally started to tickle this after my canonicalization changes to instcombine. The bug stems from trying to form a vector type out of scalars that aren't compatible at all. In this example, from x86_mmx values. The code in the vectorizer that checks for reasonable types whas checking for aggregates or vectors, but there are lots of other types that should just never reach the vectorizer. Debugging this was made more confusing by the lie in an assert in VectorType::get() -- it isn't that the types are primitive. The types must be integer, pointer, or floating point types. No other types are allowed. I've improved the assert and added a helper to the vectorizer to handle the element type validity checks. It now re-uses the VectorType static function and then further excludes weird target-specific types that we probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of these are really reachable anyways (neither 80-bit nor 128-bit things will get vectorized) but it seems better to just eagerly exclude such nonesense. I've added a test case, but while it definitely covers two of the paths through this code there may be more paths that would benefit from test coverage. I'm not familiar enough with the SLP vectorizer to synthesize test cases for all of these, but was able to update the code itself by inspection. llvm-svn: 228899	2015-02-12 02:30:56 +00:00
Tim Northover	02438033e8	DeadArgElim: aggregate Return assessment properly. I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It actually depends on the index too, which means we need to be careful about how the results are combined before return. In particular if a single Use returns Live, that counts for the entire object, at the granularity we're considering. llvm-svn: 228885	2015-02-11 23:13:11 +00:00
Mehdi Amini	9730116bd6	Reassociate: cannot negate a INT_MIN value Summary: When trying to canonicalize negative constants out of multiplication expressions, we need to check that the constant is not INT_MIN which cannot be negated. Reviewers: mcrosier Reviewed By: mcrosier Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7286 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 228872	2015-02-11 19:54:44 +00:00
Andrea Di Biagio	2a0e435db1	[TTI] Improved cost heuristic for cttz/ctlz calls. This patch is a follow-up of r228826 (see code-review: D7506). Now that SimplifyCFG uses TargetTransformInfo for cost analysis, we have to fix the cost heuristic for intrinsic calls to cttz/ctlz. This patch defines method 'getIntrinsicCost' in BasicTTIImpl: now, BasicTTIImpl queries TLI to check if a call to cttz/ctlz is cheap for the target. Added test cases in Transforms/SimplifyCFG/X86 to verify that on x86, SimplifyCFG only speculates a call to cttz/ctlz if it is cheap. Differential Revision: http://reviews.llvm.org/D7554 llvm-svn: 228829	2015-02-11 14:22:18 +00:00
James Molloy	7c336576a5	[SimplifyCFG] Swap to using TargetTransformInfo for cost analysis. We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness" heuristic and use TTI directly. Generally NFC intended, but we're using a slightly different heuristic now so there is a slight test churn. Test changes: * combine-comparisons-by-cse.ll: Removed unneeded branch check. * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq. * coalesce-subregs.ll: Superfluous block check. * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv. * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present. * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI. llvm-svn: 228826	2015-02-11 12:15:41 +00:00
James Molloy	f147359376	[LoopReroll] Introduce the concept of DAGRootSets. A DAGRootSet models an induction variable being used in a rerollable loop. For example: x[i3+0] = y1 x[i3+1] = y2 x[i3+2] = y3 Base instruction -> i3 +---+----+ / \| \ ST[y1] +1 +2 <-- Roots \| \| ST[y2] ST[y3] There may be multiple DAGRootSets, for example: x[i2+0] = ... (1) x[i2+1] = ... (1) x[i2+4] = ... (2) x[i2+5] = ... (2) x[(i+1234)2+5678] = ... (3) x[(i+1234)2+5679] = ... (3) This concept is similar to the "Scale" member used previously, but allows multiple independent sets of roots based off the same induction variable. llvm-svn: 228821	2015-02-11 09:19:47 +00:00
Reid Kleckner	b3775df32e	Fix invalid LLVM IR in PruneEH tests llvm-svn: 228786	2015-02-11 02:06:47 +00:00
Reid Kleckner	96d011315a	Don't promote asynch EH invokes of nounwind functions to calls If the landingpad of the invoke is using a personality function that catches asynch exceptions, then it can catch a trap. Also add some landingpads to invalid LLVM IR test cases that lack them. Over-the-shoulder reviewed by David Majnemer. llvm-svn: 228782	2015-02-11 01:23:16 +00:00
David Majnemer	44a5f22fbc	EarlyCSE: Add check lines for test added in r228760 llvm-svn: 228761	2015-02-10 23:11:02 +00:00
David Majnemer	7679300d93	EarlyCSE: It isn't safe to CSE across synchronization boundaries This fixes PR22514. llvm-svn: 228760	2015-02-10 23:09:43 +00:00
Tim Northover	43c0d2db50	DeadArgElim: arguments affect all returned sub-values by default. Unless we meet an insertvalue on a path from some value to a return, that value will be live if any of the return's components are live, so all of those components must be added to the MaybeLiveUses. Previously we were deleting arguments if sub-value 0 turned out to be dead. llvm-svn: 228731	2015-02-10 19:49:18 +00:00
Michael Zolotukhin	03e3518c91	Add a test case for new unrolling heuristics. THe heuristics were added in r228265 and r228434. llvm-svn: 228713	2015-02-10 17:54:54 +00:00
Chandler Carruth	2496910325	Revert r228556: InstCombine: propagate nonNull through assume This commit isn't using the correct context, and is transfoming calls that are operands to loads rather than calls that are operands to an icmp feeding into an assume. I've replied on the original review thread with a very reduced test case and some thoughts on how to rework this. llvm-svn: 228677	2015-02-10 08:07:32 +00:00
Ramkumar Ramachandra	2e4b9e0a37	PlaceSafepoints: modernize gc.result.* -> gc.result Differential Revision: http://reviews.llvm.org/D7516 llvm-svn: 228625	2015-02-09 23:00:40 +00:00
Philip Reames	0edbf2e407	Introduce more tests for PlaceSafepoints These tests the two optimizations for backedge insertion currently implemented and the split backedge flag which is currently off by default. llvm-svn: 228617	2015-02-09 22:10:15 +00:00
Philip Reames	5fc82fd6b5	Minor test cleanup a) add gc attribute b) remove unused param llvm-svn: 228612	2015-02-09 21:50:31 +00:00
Philip Reames	b1ed02f728	Add basic tests for PlaceSafepoints This is just adding really simple tests which should have been part of the original submission. When doing so, I discovered that I'd mistakenly removed required pieces when preparing the patch for upstream submission. I fixed two such bugs in this submission. llvm-svn: 228610	2015-02-09 21:48:05 +00:00
Tim Northover	705d2af9e1	DeadArgElim: fix mismatch in accounting of array return types. Some parts of DeadArgElim were only considering the individual fields of StructTypes separately, but others (where insertvalue & extractvalue instructions occur) also looked into ArrayTypes. This one is an actual bug; the mismatch can lead to an argument being considered used by a return sub-value that isn't being tracked (and hence is dead by default). It then gets incorrectly eliminated. llvm-svn: 228559	2015-02-09 01:21:00 +00:00
Tim Northover	854c927de5	DeadArgElim: assess uses of entire return value aggregate. Previously, a non-extractvalue use of an aggregate return value meant the entire return was considered live (the algorithm gave up entirely). This was correct, but conservative. It's better to actually look at that Use, making the analysis results apply to all sub-values under consideration. E.g. %val = call { i32, i32 } @whatever() [...] ret { i32, i32 } %val The return is using the entire aggregate (sub-values 0 and 1). We can still simplify @whatever if we can prove that this return is itself unused. Also unifies the logic slightly between aggregate and non-aggregate cases.. llvm-svn: 228558	2015-02-09 01:20:53 +00:00
Ramkumar Ramachandra	a021ee62ca	InstCombine: propagate nonNull through assume Make assume (load (call\|invoke) != null) set nonNull return attribute for the call and invoke. Also include tests. Differential Revision: http://reviews.llvm.org/D7107 llvm-svn: 228556	2015-02-09 01:13:13 +00:00
Bjorn Steinbrink	5ec7522771	Correctly combine alias.scope metadata by a union instead of intersecting Summary: The alias.scope metadata represents sets of things an instruction might alias with. When generically combining the metadata from two instructions the result must be the union of the original sets, because the new instruction might alias with anything any of the original instructions aliased with. Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7490 llvm-svn: 228525	2015-02-08 17:07:14 +00:00
Benjamin Kramer	17d9015d27	ValueTracking: Make isBytewiseValue simpler and more powerful at the same time. Turns out there is a simpler way of checking that all bytes in a word are equal than binary decomposition. llvm-svn: 228503	2015-02-07 19:29:02 +00:00
Bjorn Steinbrink	71bf3b800a	Properly update AA metadata when performing call slot optimization Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7482 llvm-svn: 228500	2015-02-07 17:54:36 +00:00
Matthias Braun	2e404597f4	InstCombine: Combine select sequences into a single select Normalize select(C0, select(C1, a, b), b) -> select((C0 & C1), a, b) select(C0, a, select(C1, a, b)) -> select((C0 \| C1), a, b) This normal form may enable further combines on the And/Or and shortens paths for the values. Many targets prefer the other but can go back easily in CodeGen. Differential Revision: http://reviews.llvm.org/D7399 llvm-svn: 228409	2015-02-06 17:49:36 +00:00
Michael Kuperstein	d2b6fdbc31	Teach isDereferenceablePointer() to look through bitcast constant expressions. This fixes a LICM regression due to the new load+store pair canonicalization. Differential Revision: http://reviews.llvm.org/D7411 llvm-svn: 228284	2015-02-05 09:15:37 +00:00
Cameron Esfahani	17177d1e84	Value soft float calls as more expensive in the inliner. Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation. This keeps the default behavior the same as before. The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point. Reviewers: chandlerc, echristo Reviewed By: echristo Subscribers: t.p.northover, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D6936 llvm-svn: 228263	2015-02-05 02:09:33 +00:00
Tom Stellard	071ec90b68	StructurizeCFG: Use a reverse post-order traversal We were previously doing a post-order traversal and operating on the list in reverse, however this would occasionaly cause backedges for loops to be visited before some of the other blocks in the loop. We know use a reverse post-order traversal, which avoids this issue. The reverse post-order traversal is not completely ideal, so we need to manually fixup the list to ensure that inner loop backedges are visited before outer loop backedges. llvm-svn: 228186	2015-02-04 20:49:44 +00:00
Renato Golin	2a5c0a51ce	Reverting VLD1/VST1 base-updating/post-incrementing combining This reverts patches 223862, 224198, 224203, and 224754, which were all related to the vector load/store combining and were reverted/reaplied a few times due to the same alignment problems we're seeing now. Further tests, mainly self-hosting Clang, will be needed to reapply this patch in the future. llvm-svn: 228129	2015-02-04 10:11:59 +00:00
Daniel Berlin	487aed0d77	Allow PRE to insert no-cost phi nodes llvm-svn: 228024	2015-02-03 20:37:08 +00:00
Jingyue Wu	d7966ff3b9	Add straight-line strength reduction to LLVM Summary: Straight-line strength reduction (SLSR) is implemented in GCC but not yet in LLVM. It has proven to effectively simplify statements derived from an unrolled loop, and can potentially benefit many other cases too. For example, LLVM unrolls #pragma unroll foo (int i = 0; i < 3; ++i) { sum += foo((b + i) * s); } into sum += foo(b * s); sum += foo((b + 1) * s); sum += foo((b + 2) * s); However, no optimizations yet reduce the internal redundancy of the three expressions: b * s (b + 1) * s (b + 2) * s With SLSR, LLVM can optimize these three expressions into: t1 = b * s t2 = t1 + s t3 = t2 + s This commit is only an initial step towards implementing a series of such optimizations. I will implement more (see TODO in the file commentary) in the near future. This optimization is enabled for the NVPTX backend for now. However, I am more than happy to push it to the standard optimization pipeline after more thorough performance tests. Test Plan: test/StraightLineStrengthReduce/slsr.ll Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick Reviewed By: jholewinski, atrick Subscribers: karthikthecool, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7310 llvm-svn: 228016	2015-02-03 19:37:06 +00:00
Erik Eckstein	7330e358f6	Fix: SLPVectorizer crashes with assertion when vectorizing a cmp instruction. The commit r225977 uncovered this bug. The problem was that the vectorizer tried to read the second operand of an already deleted instruction. The bug didn't show up before r225977 because the freed memory still contained a non-null pointer. With r225977 deletion of instructions is delayed and the read operand pointer is always null. llvm-svn: 227800	2015-02-02 12:45:34 +00:00
Chandler Carruth	fdffd87d68	[PM] Port SimplifyCFG to the new pass manager. This should be sufficient to replace the initial (minor) function pass pipeline in Clang with the new pass manager. I'll probably add an (off by default) flag to do that just to ensure we can get extra testing. llvm-svn: 227726	2015-02-01 11:34:21 +00:00
Chandler Carruth	e8c686aa86	[PM] Port EarlyCSE to the new pass manager. I've added RUN lines both to the basic test for EarlyCSE and the target-specific test, as this serves as a nice test that the TTI layer in the new pass manager is in fact working well. llvm-svn: 227725	2015-02-01 10:51:23 +00:00
Adrian Prantl	3e2659eb92	Inliner: Use replaceDbgDeclareForAlloca() instead of splicing the instruction and generalize it to optionally dereference the variable. Follow-up to r227544. llvm-svn: 227604	2015-01-30 19:37:48 +00:00
Hao Liu	6bd67c08fa	Move the target specific test case arbitrary-induction-step.ll to test/Transforms/LoopVectorize/AArch64 folder. llvm-svn: 227561	2015-01-30 07:33:31 +00:00
Hao Liu	8de4f8b1b5	[LoopVectorize] Induction variables: support arbitrary constant step. Previously, only -1 and +1 step values are supported for induction variables. This patch extends LV to support arbitrary constant steps. Initial patch by Alexey Volkov. Some bug fixes are added in the following version. Differential Revision: http://reviews.llvm.org/D6051 and http://reviews.llvm.org/D7193 llvm-svn: 227557	2015-01-30 05:02:21 +00:00
Adrian Prantl	4d365250ec	Fix PR22386. The inliner moves static allocas to the entry basic block so we need to move the dbg.declare intrinsics that describe them, too. llvm-svn: 227544	2015-01-30 01:55:25 +00:00
Sanjay Patel	4f07a56958	[GVN] don't propagate equality comparisons of FP zero (PR22376) In http://reviews.llvm.org/D6911, we allowed GVN to propagate FP equalities to allow some simple value range optimizations. But that introduced a bug when comparing to -0.0 or 0.0: these compare equal even though they are not bitwise identical. This patch disallows propagating zero constants in equality comparisons. Fixes: http://llvm.org/bugs/show_bug.cgi?id=22376 Differential Revision: http://reviews.llvm.org/D7257 llvm-svn: 227491	2015-01-29 20:51:49 +00:00
Philip Reames	9198b33b48	Teach SplitBlockPredecessors how to handle landingpad blocks. Patch by: Igor Laevsky <igor@azulsystems.com> "Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors." Differential Revision: http://reviews.llvm.org/D7157 llvm-svn: 227390	2015-01-28 23:06:47 +00:00
Michael Kuperstein	951995821a	[X86] Reduce some 32-bit imuls into lea + shl Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well. Differential Revision: http://reviews.llvm.org/D7196 llvm-svn: 227308	2015-01-28 14:08:22 +00:00
Elena Demikhovsky	45f0448081	Fold fcmp in cases where value is provably non-negative. By Arch Robison. This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where: - the predicate is olt or uge - the first operand is provably not less than zero - the second operand is zero The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(xx+yy).. http://reviews.llvm.org/D6972 llvm-svn: 227298	2015-01-28 08:03:58 +00:00
Reid Kleckner	4af6415237	Move EH personality type classification to Analysis/LibCallSemantics.h Summary: Also add enum types for __C_specific_handler and _CxxFrameHandler3 for which we know a few things. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7214 llvm-svn: 227284	2015-01-28 01:17:38 +00:00
Ahmed Bougacha	1ac9356524	[SimplifyLibCalls] Don't confuse strcpy_chk for stpcpy_chk. This was introduced in a faulty refactoring (r225640, mea culpa): the tests weren't testing the return values, so, for both __strcpy_chk and __stpcpy_chk, we would return the end of the buffer (matching stpcpy) instead of the beginning (for strcpy). The root cause was the prefix "__" being ignored when comparing, which made us always pick LibFunc::stpcpy_chk. Pass the LibFunc::Func directly to avoid this kind of error. Also, make the testcases as explicit as possible to prevent this. The now-useful testcases expose another, entangled, stpcpy problem, with the further simplification. This was introduced in a refactoring (r225640) to match the original behavior. However, this leads to problems when successive simplifications generate several similar instructions, none of which are removed by the custom replaceAllUsesWith. For instance, InstCombine (the main user) doesn't erase the instruction in its custom RAUW. When trying to simplify say __stpcpy_chk: - first, an stpcpy is created (fortified simplifier), - second, a memcpy is created (normal simplifier), but the stpcpy call isn't removed. - third, InstCombine later revisits the instructions, and simplifies the first stpcpy to a memcpy. We now have two memcpys. llvm-svn: 227250	2015-01-27 21:52:16 +00:00
Sanjoy Das	dcf2651043	Teach IRCE to look at branch weights when recognizing range checks Splitting a loop to make range checks redundant is profitable only if the range check "never" fails. Make this fact a part of recognizing a range check -- a branch is a range check only if it is expected to pass (via branch_weights metadata). Differential Revision: http://reviews.llvm.org/D7192 llvm-svn: 227249	2015-01-27 21:38:12 +00:00
Andrea Di Biagio	086cbc37ad	[InstCombine] Teach how to fold a select into a cttz/ctlz with the 'is_zero_undef' flag. This patch teaches the Instruction Combiner how to fold a cttz/ctlz followed by a icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared. Added test InstCombine/select-cmp-cttz-ctlz.ll. llvm-svn: 227197	2015-01-27 15:58:14 +00:00
David Majnemer	4c82daea60	LoopRotate: Don't walk the uses of a Constant LoopRotate wanted to avoid live range interference by looking at the uses of a Value in the loop latch and seeing if any lied outside of the loop. We would wrongly perform this operation on Constants. This fixes PR22337. llvm-svn: 227171	2015-01-27 06:21:43 +00:00
Chad Rosier	f9327d6fe9	Commoning of target specific load/store intrinsics in Early CSE. Phabricator revision: http://reviews.llvm.org/D7121 Patch by Sanjin Sijaric <ssijaric@codeaurora.org>! llvm-svn: 227149	2015-01-26 22:51:15 +00:00
Philip Reames	219b15e85f	Add test cases for PRE w/volatile loads These tests check that the combination of 227110 (cross block query inst) and 227112 (volatile load semantics) work together properly to allow PRE in cases where a loop contains a volatile access. llvm-svn: 227146	2015-01-26 22:40:44 +00:00
Hans Wennborg	b64cb271dc	SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable The range check would get optimized away later, but we might as well not emit them in the first place. http://reviews.llvm.org/D6471 llvm-svn: 227126	2015-01-26 19:52:34 +00:00
Hans Wennborg	6800008f04	SimplifyCFG: don't remove unreachable default switch destinations An unreachable default destination can be exploited by other optimizations and allows for more efficient lowering. Both the SDag switch lowering and LowerSwitch can exploit unreachable defaults. Also make TurnSwitchRangeICmp handle switches with unreachable default. This is kind of separate change, but it cannot be tested without the change above, and I don't want to land the change above without this since that would regress other tests. Differential Revision: http://reviews.llvm.org/D6471 llvm-svn: 227125	2015-01-26 19:52:32 +00:00
Philip Reames	a7ad6a589c	Refine memory dependence's notion of volatile semantics According to my reading of the LangRef, volatiles are only ordered with respect to other volatiles. It is entirely legal and profitable to forward unrelated loads over the volatile load. This patch implements this for GVN by refining the transition rules MemoryDependenceAnalysis uses when encountering a volatile. The added test cases show where the extra flexibility is profitable for local dependence optimizations. I have a related change (227110) which will extend this to non-local dependence (i.e. PRE), but that's essentially orthogonal to the semantic change in this patch. I have tested the two together and can confirm that PRE works over a volatile load with both changes. I will be submitting a PRE w/volatiles test case seperately in the near future. Differential Revision: http://reviews.llvm.org/D6901 llvm-svn: 227112	2015-01-26 18:54:27 +00:00
Philip Reames	32351455f6	Pass QueryInst down through non-local dependency calculation This change is mostly motivated by exposing information about the original query instruction to the actual scanning work in getPointerDependencyFrom when used by GVN PRE. In a follow up change, I will use this to be more precise with regards to the semantics of volatile instructions encountered in the scan of a basic block. Worth noting, is that this change (despite appearing quite simple) is not semantically preserving. By providing more information to the helper routine, we allow some optimizations to kick in that weren't previously able to (when called from this code path.) In particular, we see that treatment of !invariant.load becomes more precise. In theory, we might see a difference with an ordered/atomic instruction as well, but I'm having a hard time actually finding a test case which shows that. Test wise, I've included new tests for !invariant.load which illustrate this difference. I've also included some updated TBAA tests which highlight that this change isn't needed for that optimization to kick in - it's handled inside alias analysis itself. Eventually, it would be nice to factor the !invariant.load handling inside alias analysis as well. Differential Revision: http://reviews.llvm.org/D6895 llvm-svn: 227110	2015-01-26 18:39:52 +00:00
Erik Eckstein	98df6da740	SLPVectorizer: fix wrong scheduling of atomic load/stores. This fixes PR22306. llvm-svn: 227077	2015-01-26 09:07:04 +00:00
Chandler Carruth	43e590e51f	[PM] Port LowerExpectIntrinsic to the new pass manager. This just lifts the logic into a static helper function, sinks the legacy pass to be a trivial wrapper of that helper fuction, and adds a trivial wrapper for the new PM as well. Not much to see here. I switched a test case to run in both modes, but we have to strip the dead prototypes separately as that pass isn't in the new pass manager (yet). llvm-svn: 226999	2015-01-24 11:13:02 +00:00
Chandler Carruth	83ba269e4b	[PM] Port instcombine to the new pass manager! This is exciting as this is a much more involved port. This is a complex, existing transformation pass. All of the core logic is shared between both old and new pass managers. Only the access to the analyses is separate because the actual techniques are separate. This also uses a bunch of different and interesting analyses and is the first time where we need to use an analysis across an IR layer. This also paves the way to expose instcombine utility functions. I've got a static function that implements the core pass logic over a function which might be mildly interesting, but more interesting is likely exposing a routine which just uses instructions already in the worklist and combines until empty. I've switched one of my favorite instcombine tests to run with both as well to make sure this keeps working. llvm-svn: 226987	2015-01-24 04:19:17 +00:00
Hans Wennborg	ae9c971a2f	LowerSwitch: replace unreachable default with popular case destination SimplifyCFG currently does this transformation, but I'm planning to remove that to allow other passes, such as this one, to exploit the unreachable default. This patch takes care to keep track of what case values are unreachable even after the transformation, allowing for more efficient lowering. Differential Revision: http://reviews.llvm.org/D6697 llvm-svn: 226934	2015-01-23 20:43:51 +00:00
Reid Kleckner	f12b33454f	Revert "Don't remove a landing pad if the invoke requires a table entry." This reverts commit r176827. Björn Steinbrink pointed out that this didn't actually fix the bug (PR15555) it was attempting to fix. With this reverted, we can now remove landingpad cleanups that immediately resume unwinding, converting the invoke to a call. llvm-svn: 226850	2015-01-22 19:29:46 +00:00
Sanjoy Das	d1fb13ce4c	Fix crashes in IRCE caused by mismatched types There are places where the inductive range check elimination pass depends on two llvm::Values or llvm::SCEVs to be of the same llvm::Type when they do not need to be. This patch relaxes those restrictions (by bailing out of the optimization if the types mismatch), and adds test cases to trigger those paths. These issues were found by bootstrapping clang with IRCE running in the -O3 pass ordering. Differential Revision: http://reviews.llvm.org/D7082 llvm-svn: 226793	2015-01-22 08:29:18 +00:00
Elena Demikhovsky	079b2d8c0c	Fixed a bug in masked load/store in reversed loop. Added a test. The bug was submitted to bugzilla: http://llvm.org/bugs/show_bug.cgi?id=22225 llvm-svn: 226791	2015-01-22 08:20:06 +00:00
Chandler Carruth	cd8522ef44	[canonicalize] Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available. Regardless of whether this particular type is good or bad, it ensures we don't get weird differences in generated code (and resulting performance) from "equivalent" patterns that happen to end up using a slightly different type. After some discussion on llvmdev it seems everyone generally likes this canonicalization. However, there may be some parts of LLVM that handle it poorly and need to be fixed. I have at least verified that this doesn't impede GVN and instcombine's store-to-load forwarding powers in any obvious cases. Subtle cases are exactly what we need te flush out if they remain. Also note that this IR pattern should already be hitting LLVM from Clang at least because it is exactly the IR which would be produced if you used memcpy to copy a pointer or floating point between memory instead of a variable. llvm-svn: 226781	2015-01-22 05:08:12 +00:00
David Blaikie	df706288fb	DebugInfo: Use distinct inlinedAt MDLocations to avoid separate inlined calls being coalesced When two calls from the same MDLocation are inlined they currently get treated as one inlined function call (creating difficulty debugging, duplicate variables, etc). Clang worked around this by including column information on inline calls which doesn't address LTO inlining or calls to the same function from the same line and column (such as through a macro). It also didn't address ctor and member function calls. By making the inlinedAt locations distinct, every call site has an explicitly distinct location that cannot be coalesced with any other call. This can produce linearly (2x in the worst case where every call is inlined and the call instruction has a non-call instruction at the same location) more debug locations. Any increase beyond that are in cases where the Clang workaround was insufficient and the new scheme is creating necessary distinct nodes that were being erroneously coalesced previously. After this change to LLVM the incomplete workarounds in Clang. That should reduce the number of debug locations (in a build without column info, the default on Darwin, not the default on Linux) by not creating pseudo-distinct locations for every call to an inline function. (oh, and I made the inlined-at chain rebuilding iterative instead of recursive because I was having trouble wrapping my head around it the way it was - open to discussion on the right design for that function (including going back to a recursive solution)) llvm-svn: 226736	2015-01-21 22:57:29 +00:00
David Majnemer	4c0a6e918a	InstCombine: Don't strip bitcasts off of callsites marked 'thunk' The return type of a thunk is meaningless, we just want the arguments and return value to be forwarded. llvm-svn: 226708	2015-01-21 22:32:04 +00:00
Alexander Potapenko	4ac461c35f	Use a smaller pragma unroll threshold to reduce test execution time. When opt is compiled with AddressSanitizer it takes more than 30 seconds to unroll the loop in unroll_1M(). llvm-svn: 226660	2015-01-21 13:52:02 +00:00
Karthik Bhat	0b0f4660fa	Fix Operandreorder logic in SLPVectorizer to generate longer vectorizable chain. This patch fixes 2 issues in reorderInputsAccordingToOpcode 1) AllSameOpcodeLeft and AllSameOpcodeRight was being calculated incorrectly resulting in code not being vectorized in few cases. 2) Adds logic to reorder operands if we get longer chain of consecutive loads enabling vectorization. Handled the same for cases were we have AltOpcode. Thanks Michael for inputs and review. Review: http://reviews.llvm.org/D6677 llvm-svn: 226547	2015-01-20 06:11:00 +00:00
Mehdi Amini	590a2700fc	Fix Reassociate handling of constant in presence of undef float http://reviews.llvm.org/D6993 llvm-svn: 226245	2015-01-16 03:00:58 +00:00
Sanjoy Das	a1837a342d	Add a new pass "inductive range check elimination" IRCE eliminates range checks of the form 0 <= A * I + B < Length by splitting a loop's iteration space into three segments in a way that the check is completely redundant in the middle segment. As an example, IRCE will convert len = < known positive > for (i = 0; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } to len = < known positive > limit = smin(n, len) // no first segment for (i = 0; i < limit; i++) { if (0 <= i && i < len) { // this check is fully redundant do_something(); } else { throw_out_of_bounds(); } } for (i = limit; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } IRCE can deal with multiple range checks in the same loop (it takes the intersection of the ranges that will make each of them redundant individually). Currently IRCE does not do any profitability analysis. That is a TODO. Please note that the status of this pass is experimental, and it is not part of any default pass pipeline. Having said that, I will love to get feedback and general input from people interested in trying this out. This pass was originally r226201. It was reverted because it used C++ features not supported by MSVC 2012. Differential Revision: http://reviews.llvm.org/D6693 llvm-svn: 226238	2015-01-16 01:03:22 +00:00
Sanjoy Das	7f62ac8e4d	Revert r226201 (Add a new pass "inductive range check elimination") The change used C++11 features not supported by MSVC 2012. I will fix the change to use things supported MSVC 2012 and recommit shortly. llvm-svn: 226216	2015-01-15 22:18:10 +00:00
Sanjoy Das	7059e2959d	Add a new pass "inductive range check elimination" IRCE eliminates range checks of the form 0 <= A * I + B < Length by splitting a loop's iteration space into three segments in a way that the check is completely redundant in the middle segment. As an example, IRCE will convert len = < known positive > for (i = 0; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } to len = < known positive > limit = smin(n, len) // no first segment for (i = 0; i < limit; i++) { if (0 <= i && i < len) { // this check is fully redundant do_something(); } else { throw_out_of_bounds(); } } for (i = limit; i < n; i++) { if (0 <= i && i < len) { do_something(); } else { throw_out_of_bounds(); } } IRCE can deal with multiple range checks in the same loop (it takes the intersection of the ranges that will make each of them redundant individually). Currently IRCE does not do any profitability analysis. That is a TODO. Please note that the status of this pass is experimental, and it is not part of any default pass pipeline. Having said that, I will love to get feedback and general input from people interested in trying this out. Differential Revision: http://reviews.llvm.org/D6693 llvm-svn: 226201	2015-01-15 20:45:46 +00:00
Sanjoy Das	8c252bde36	Fix PR22222 The bug was introduced in r225282. r225282 assumed that sub X, Y is the same as add X, -Y. This is not correct if we are going to upgrade the sub to sub nuw. This change fixes the issue by making the optimization ignore sub instructions. Differential Revision: http://reviews.llvm.org/D6979 llvm-svn: 226075	2015-01-15 01:46:09 +00:00
Richard Smith	e78bb1249e	For PR21145: recognise a builtin call to a known deallocation function even if it's defined in the current module. Clang generates this situation for the C++14 sized deallocation functions, because it generates a weak definition in case one isn't provided by the C++ runtime library. llvm-svn: 226069	2015-01-15 01:00:33 +00:00
Ramkumar Ramachandra	dba7329ebb	[GC] CodeGenPrep transform: simplify offsetable relocate The transform is somewhat involved, but the basic idea is simple: find derived pointers that have been offset from the base pointer using gep and replace the relocate of the derived pointer with a gep to the relocated base pointer (with the same offset). llvm-svn: 226060	2015-01-14 23:27:07 +00:00
Duncan P. N. Exon Smith	9885469922	IR: Move MDLocation into place This commit moves `MDLocation`, finishing off PR21433. There's an accompanying clang commit for frontend testcases. I'll attach the testcase upgrade script I used to PR21433 to help out-of-tree frontends/backends. This changes the schema for `DebugLoc` and `DILocation` from: !{i32 3, i32 7, !7, !8} to: !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8) Note that empty fields (line/column: 0 and inlinedAt: null) don't get printed by the assembly writer. llvm-svn: 226048	2015-01-14 22:27:36 +00:00
David Majnemer	a0afb55ff9	InstCombine: Don't take A-B<0 into A<B if A-B has other uses This fixes PR22226. llvm-svn: 226023	2015-01-14 19:26:56 +00:00
Ahmed Bougacha	71d7b18e3d	[SimplifyLibCalls] Don't try to simplify indirect calls. It turns out, all callsites of the simplifier are guarded by a check for CallInst::getCalledFunction (i.e., to make sure the callee is direct). This check wasn't done when trying to further optimize a simplified fortified libcall, introduced by a refactoring in r225640. Fix that, add a testcase, and document the requirement. llvm-svn: 225895	2015-01-14 00:55:05 +00:00
Sanjay Patel	5f1d9eaad3	GVN: propagate equalities for floating point compares Allow optimizations based on FP comparison values in the same way as integers. This resolves PR17713: http://llvm.org/bugs/show_bug.cgi?id=17713 Differential Revision: http://reviews.llvm.org/D6911 llvm-svn: 225660	2015-01-12 19:29:48 +00:00
Hal Finkel	611b127ad8	[PowerPC] Readjust the loop unrolling threshold Now that the way that the partial unrolling threshold for small loops is used to compute the unrolling factor as been corrected, a slightly smaller threshold is preferable. This is expected; other targets may need to re-tune as well. llvm-svn: 225566	2015-01-10 00:31:10 +00:00
Hal Finkel	38dd590861	[LoopUnroll] Fix the partial unrolling threshold for small loop sizes When we compute the size of a loop, we include the branch on the backedge and the comparison feeding the conditional branch. Under normal circumstances, these don't get replicated with the rest of the loop body when we unroll. This led to the somewhat surprising behavior that really small loops would not get unrolled enough -- they could be unrolled more and the resulting loop would be below the threshold, because we were assuming they'd take (LoopSize * UnrollingFactor) instructions after unrolling, instead of (((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation. llvm-svn: 225565	2015-01-10 00:30:55 +00:00
Hans Wennborg	dcc6e5bc03	SimplifyCFG: check uses of constant-foldable instrs in switch destinations (PR20210) The previous code assumed that such instructions could not have any uses outside CaseDest, with the motivation that the instruction could not dominate CommonDest because CommonDest has phi nodes in it. That simply isn't true; e.g., CommonDest could have an edge back to itself. llvm-svn: 225552	2015-01-09 22:13:31 +00:00
Tim Northover	eb16112e97	Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE" It's not really expected to stick around, last time it provoked a weird LTO build failure that I can't reproduce now, and the bot logs are long gone. I'll re-revert it if the failures recur. Original description: Perform Scalar PRE on gep indices that feed loads before doing Load PRE. llvm-svn: 225536	2015-01-09 19:19:56 +00:00
Hal Finkel	b359b735d6	[PowerPC] Enable late partial unrolling on the POWER7 The P7 benefits from not have really-small loops so that we either have multiple dispatch groups in the loop and/or the ability to form more-full dispatch groups during scheduling. Setting the partial unrolling threshold to 44 seems good, empirically, for the P7. Compared to using no late partial unrolling, this yields the following test-suite speedups: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -66.3253% +/- 24.1975% SingleSource/Benchmarks/Misc-C++/oopack_v1p8 -44.0169% +/- 29.4881% SingleSource/Benchmarks/Misc/pi -27.8351% +/- 12.2712% SingleSource/Benchmarks/Stanford/Bubblesort -30.9898% +/- 22.4647% I've speculatively added a similar setting for the P8. Also, I've noticed that the unroller does not quite calculate the unrolling factor correctly for really tiny loops because it neglects to account for the fact that not every loop body replicant contains an ending branch and counter increment. I'll fix that later. llvm-svn: 225522	2015-01-09 15:51:16 +00:00
Duncan P. N. Exon Smith	090a19bd3c	IR: Add 'distinct' MDNodes to bitcode and assembly Propagate whether `MDNode`s are 'distinct' through the other types of IR (assembly and bitcode). This adds the `distinct` keyword to assembly. Currently, no one actually calls `MDNode::getDistinct()`, so these nodes only get created for: - self-references, which are never uniqued, and - nodes whose operands are replaced that hit a uniquing collision. The concept of distinct nodes is still not quite first-class, since distinct-ness doesn't yet survive across `MapMetadata()`. Part of PR22111. llvm-svn: 225474	2015-01-08 22:38:29 +00:00
Matt Arsenault	b935d9df4c	Fix fcmp + fabs instcombines when using the intrinsic This was only handling the libcall. This is another example of why only the intrinsic should ever be used when it exists. llvm-svn: 225465	2015-01-08 20:09:34 +00:00
Matt Arsenault	2458393104	Fix using wrong intrinsic in test This is a leftover from renaming the intrinsic. It's surprising the unknown llvm. intrinsic wasn't rejected. llvm-svn: 225304	2015-01-06 23:00:33 +00:00
Rafael Espindola	83a362cde8	Change the .ll syntax for comdats and add a syntactic sugar. In order to make comdats always explicit in the IR, we decided to make the syntax a bit more compact for the case of a GlobalObject in a comdat with the same name. Just dropping the $name causes problems for @foo = globabl i32 0, comdat $bar = comdat ... and declare void @foo() comdat $bar = comdat ... So the syntax is changed to @g1 = globabl i32 0, comdat($c1) @g2 = globabl i32 0, comdat and declare void @foo() comdat($c1) declare void @foo() comdat llvm-svn: 225302	2015-01-06 22:55:16 +00:00
Sanjoy Das	7c0ce26614	This patch teaches IndVarSimplify to add nuw and nsw to certain kinds of operations that provably don't overflow. For example, we can prove %civ.inc below does not sign-overflow. With this change, IndVarSimplify changes %civ.inc to an add nsw. define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) { entry: %length = load i32* %length_ptr, !range !0 %len.sub.1 = sub i32 %length, 1 %upper = icmp slt i32 %init, %len.sub.1 br i1 %upper, label %loop, label %exit loop: %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ] %civ.inc = add i32 %civ, 1 %cmp = icmp slt i32 %civ.inc, %length br i1 %cmp, label %latch, label %break latch: store i32 0, i32* %array %check = icmp slt i32 %civ.inc, %len.sub.1 br i1 %check, label %loop, label %break break: ret i32 %civ.inc exit: ret i32 42 } Differential Revision: http://reviews.llvm.org/D6748 llvm-svn: 225282	2015-01-06 19:02:56 +00:00
Matt Arsenault	55e7312cd8	Convert fcmp with 0.0 from casted integers to icmp This is already handled in general when it is known the conversion can't lose bits with smaller integer types casted into wider floating point types. This pattern happens somewhat often in GPU programs that cast workitem intrinsics to float, which are often compared with 0. Specifically handle the special case of compares with zero which should also be known to not lose information. I had a more general version of this which allows equality compares if the casted float is exactly representable in the integer, but I'm not 100% confident that is always correct. Also fold cases that aren't integers to true / false. llvm-svn: 225265	2015-01-06 15:50:59 +00:00
David Majnemer	9b6b822814	InstCombine: Bitcast call arguments from/to pointer/integer type Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing arguments and return values when DataLayout says it is safe to do so. llvm-svn: 225254	2015-01-06 08:41:31 +00:00
Michael Kuperstein	6ae456b0d7	Fix broken test from r225159. llvm-svn: 225164	2015-01-05 12:34:01 +00:00
Jiangning Liu	40c1b35292	Fixed a bug in memory dependence checking module of loop vectorization. The following loop should not be vectorized with current algorithm. {code} // loop body ... = a[i] (1) ... = a[i+1] (2) ....... a[i+1] = .... (3) a[i] = ... (4) {code} The algorithm tries to collect memory access candidates from AliasSetTracker, and then check memory dependences one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists. This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed. For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized. The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile-time visibly. llvm-svn: 225159	2015-01-05 10:08:58 +00:00
Chandler Carruth	73b0164fe5	[SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an assert out of the new pre-splitting in SROA. This fix makes the code do what was originally intended -- when we have a store of a load both dealing in the same alloca, we force them to both be pre-split with identical offsets. This is really quite hard to do because we can keep discovering problems as we go along. We have to track every load over the current alloca which for any resaon becomes invalid for pre-splitting, and go back to remove all stores of those loads. I've included a couple of test cases derived from PR22093 that cover the different ways this can happen. While that PR only really triggered the first of these two, its the same fundamental issue. The other challenge here is documented in a FIXME now. We end up being quite a bit more aggressive for pre-splitting when loads and stores don't refer to the same alloca. This aggressiveness comes at the cost of introducing potentially redundant loads. It isn't clear that this is the right balance. It might be considerably better to require that we only do pre-splitting when we can presplit every load and store involved in the entire operation. That would give more consistent if conservative results. Unfortunately, it requires a non-trivial change to the actual pre-splitting operation in order to correctly handle cases where we end up pre-splitting stores out-of-order. And it isn't 100% clear that this is the right direction, although I'm starting to suspect that it is. llvm-svn: 225149	2015-01-05 04:17:53 +00:00
David Majnemer	087dc8b831	InstCombine: match can find ConstantExprs, don't assume we have a Value We assumed the output of a match was a Value, this would cause us to assert because we would fail a cast<>. Instead, use a helper in the Operator family to hide the distinction between Value and Constant. This fixes PR22087. llvm-svn: 225127	2015-01-04 07:36:02 +00:00
David Majnemer	6ee8d17bc6	ValueTracking: ComputeNumSignBits should tolerate misshapen phi nodes PHI nodes can have zero operands in the middle of a transform. It is expected that utilities in Analysis don't freak out when this happens. Note that it is considered invalid to allow these misshapen phi nodes to make it to another pass. This fixes PR22086. llvm-svn: 225126	2015-01-04 07:06:53 +00:00
David Majnemer	c8a576b5c0	InstCombine: Detect when llvm.umul.with.overflow always overflows We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero and LHSKnownOne * RHSKnownOne overflow. llvm-svn: 225077	2015-01-02 07:29:47 +00:00
Chandler Carruth	24ac830d7c	[SROA] Teach SROA to be more aggressive in splitting now that we have a pre-splitting pass over loads and stores. Historically, splitting could cause enough problems that I hamstrung the entire process with a requirement that splittable integer loads and stores must cover the entire alloca. All smaller loads and stores were unsplittable to prevent chaos from ensuing. With the new pre-splitting logic that does load/store pair splitting I introduced in r225061, we can now very nicely handle arbitrarily splittable loads and stores. In order to fully benefit from these smarts, we need to mark all of the integer loads and stores as splittable. However, we don't actually want to rewrite partitions with all integer loads and stores marked as splittable. This will fail to extract scalar integers from aggregates, which is kind of the point of SROA. =] In order to resolve this, what we really want to do is only do pre-splitting on the alloca slices with integer loads and stores fully splittable. This allows us to uncover all non-integer uses of the alloca that would benefit from a split in an integer load or store (and where introducing the split is safe because it is just memory transfer from a load to a store). Once done, we make all the non-whole-alloca integer loads and stores unsplittable just as they have historically been, repartition and rewrite. The result is that when there are integer loads and stores anywhere within an alloca (such as from a memcpy of a sub-object of a larger object), we can split them up if there are non-integer components to the aggregate hiding beneath. I've added the challenging test cases to demonstrate how this is able to promote to scalars even a case where we have even partially overlapping loads and stores. This restores the single-store behavior for small arrays of i8s which is really nice. I've restored both the little endian testing and big endian testing for these exactly as they were prior to r225061. It also forced me to be more aggressive in an alignment test to actually defeat SROA. =] Without the added volatiles there, we actually split up the weird i16 loads and produce nice double allocas with better alignment. This also uncovered a number of bugs where we failed to handle splittable load and store slices which didn't have a begininng offset of zero. Those fixes are included, and without them the existing test cases explode in glorious fireworks. =] I've kept support for leaving whole-alloca integer loads and stores as splittable even for the purpose of rewriting, but I think that's likely no longer needed. With the new pre-splitting, we might be able to remove all the splitting support for loads and stores from the rewriter. Not doing that in this patch to try to isolate any performance regressions that causes in an easy to find and revert chunk. llvm-svn: 225074	2015-01-02 03:55:54 +00:00
Chandler Carruth	e65ae89327	[SROA] Add a test case for r225068 / PR22080. llvm-svn: 225070	2015-01-02 00:34:29 +00:00
Chandler Carruth	0715cba02d	[SROA] Teach SROA how to much more intelligently handle split loads and stores. When there are accesses to an entire alloca with an integer load or store as well as accesses to small pieces of the alloca, SROA splits up the large integer accesses. In order to do that, it uses bit math to merge the small accesses into large integers. While this is effective, it produces insane IR that can cause significant problems in the rest of the optimizer: - It can cause load and store mismatches with GVN on the non-alloca side where we end up loading an i64 (or some such) rather than loading specific elements that are stored. - We can't always get rid of the integer bit math, which is why we can't always fix the loads and stores to work well with GVN. - This is especially bad when we have operations that mix poorly with integer bit math such as floating point operations. - It will block things like the vectorizer which might be able to handle the scalar stores that underly the aggregate. At the same time, we can't just directly split up these loads and stores in all cases. If there is actual integer arithmetic involved on the values, then using integer bit math is actually the perfect lowering because we can often combine it heavily with the surrounding math. The solution this patch provides is to find places where SROA is partitioning aggregates into small elements, and look for splittable loads and stores that it can split all the way to some other adjacent load and store. These are uniformly the cases where failing to split the loads and stores hurts the optimizer that I have seen, and I've looked extensively at the code produced both from more and less aggressive approaches to this problem. However, it is quite tricky to actually do this in SROA. We may have loads and stores to the same alloca, or other complex patterns that are hard to handle. This complexity leads to the somewhat subtle algorithm implemented here. We have to do this entire process as a separate pass over the partitioning of the alloca, and split up all of the loads prior to splitting the stores so that we can handle safely the cases of overlapping, including partially overlapping, loads and stores to the same alloca. We also have to reconstitute the post-split slice configuration so we can avoid iterating again over all the alloca uses (the slow part of SROA). But we also have to ensure that when we split up loads and stores to other allocas, we do re-iterate over them in SROA to adapt to the more refined partitioning now required. With this, I actually think we can fix a long-standing TODO in SROA where I avoided splitting as many loads and stores as probably should be splittable. This limitation historically mitigated the fallout of all the bad things mentioned above. Now that we have more intelligent handling, I plan to remove the FIXME and more aggressively mark integer loads and stores as splittable. I'll do that in a follow-up patch to help with bisecting any fallout. The net result of this change should be more fine-grained and accurate scalars being formed out of aggregates. At the very least, Clang now generates perfect code for this high-level test case using std::complex<float>: #include <complex> void g1(std::complex<float> &x, float a, float b) { x += std::complex<float>(a, b); } void g2(std::complex<float> &x, float a, float b) { x -= std::complex<float>(a, b); } void foo(const std::complex<float> &x, float a, float b, std::complex<float> &x1, std::complex<float> &x2) { std::complex<float> l1 = x; g1(l1, a, b); std::complex<float> l2 = x; g2(l2, a, b); x1 = l1; x2 = l2; } This code isn't just hypothetical either. It was reduced out of the hot inner loops of essentially every part of the Eigen math library when using std::complex<float>. Those loops would consistently and pervasively hop between the floating point unit and the integer unit due to bit math extraction and insertion of floating point values that were "stored" in a 64-bit integer register around the loop backedge. So far, this change has passed a bootstrap and I have done some other testing and so far, no issues. That doesn't mean there won't be though, so I'll be prepared to help with any fallout. If you performance swings in particular, please let me know. I'm very curious what all the impact of this change will be. Stay tuned for the follow-up to also split more integer loads and stores. llvm-svn: 225061	2015-01-01 11:54:38 +00:00
Sanjay Patel	e68f71574f	InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X Some day the backend may handle instruction-level fast math flags and make this transform unnecessary, but it's still better practice to use the canonical representation of fneg when possible (use a -0.0). This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ). See also http://reviews.llvm.org/D6723. Differential Revision: http://reviews.llvm.org/D6731 llvm-svn: 225050	2014-12-31 22:14:05 +00:00
David Majnemer	f89dc3edc9	InstCombine: try to transform A-B < 0 into A < B We are allowed to move the 'B' to the right hand side if we an prove there is no signed overflow and if the comparison itself is signed. llvm-svn: 225034	2014-12-31 04:21:41 +00:00
Philip Reames	9db26ffc9a	Carry facts about nullness and undef across GC relocation This change implements four basic optimizations: If a relocated value isn't used, it doesn't need to be relocated. If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.) If the value being relocated is undef, the relocation is meaningless. If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.) I outlined other planned work in comments. Differential Revision: http://reviews.llvm.org/D6600 llvm-svn: 224968	2014-12-29 23:27:30 +00:00
Philip Reames	b35f46ce06	Refine the notion of MayThrow in LICM to include a header specific version In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up. This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loops header block - will not be lifted by subsequent LICM runs. define void @nothrow_header(i64 %x, i64 %y, i1 %cond) { ; CHECK-LABEL: nothrow_header ; CHECK-LABEL: entry ; CHECK: %div = udiv i64 %x, %y ; CHECK-LABEL: loop ; CHECK: call void @use(i64 %div) entry: br label %loop loop: ; preds = %entry, %for.inc %div = udiv i64 %x, %y br i1 %cond, label %loop-if, label %exit loop-if: call void @use(i64 %div) br label %loop exit: ret void } The current patch really only helps with non-memory instructions (i.e. divs, etc..) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load. The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invarant.load metadata of tbaa 'is constant memory' metadata. Differential Revision: http://reviews.llvm.org/D6725 llvm-svn: 224965	2014-12-29 23:00:57 +00:00
Philip Reames	5ad26c353c	Loading from null is valid outside of addrspace 0 This patches fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen. This transform is perfectly legal in address space 0, but is not neccessarily legal in other address spaces. We really should introduce a hook to control this property on a per target per address space basis. We may be loosing valuable optimizations in some address spaces by being too conservative. Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me. llvm-svn: 224961	2014-12-29 22:46:21 +00:00
David Majnemer	b1296ec0fd	InstCombine: Infer nuw for multiplies A multiply cannot unsigned wrap if there are bitwidth, or more, leading zero bits between the two operands. llvm-svn: 224849	2014-12-26 09:50:35 +00:00
David Majnemer	54c2ca2539	InstCombe: Infer nsw for multiplies We already utilize this logic for reducing overflow intrinsics, it makes sense to reuse it for normal multiplies as well. llvm-svn: 224847	2014-12-26 09:10:14 +00:00
Michael Kuperstein	be8032c875	[ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits() GlobalAlias handling used to be after GlobalValue handling, which meant it was, in practice, dead code. r220165 moved GlobalAlias handling to be before GlobalValue handling, but also moved it to be before the max depth check, causing an assert due to a recursion depth limit violation. This moves GlobalAlias handling forward to where it's safe, and changes the GlobalValue handling to only look at GlobalObjects. Differential Revision: http://reviews.llvm.org/D6758 llvm-svn: 224765	2014-12-23 11:33:41 +00:00
Michael Liao	5313da3263	[SimplifyCFG] Revise common code sinking - Fix the case where more than 1 common instructions derived from the same operand cannot be sunk. When a pair of value has more than 1 derived values in both branches, only 1 derived value could be sunk. - Replace BB1 -> (BB2, PN) map with joint value map, i.e. map of (BB1, BB2) -> PN, which is more accurate to track common ops. llvm-svn: 224757	2014-12-23 08:26:55 +00:00
Bruno Cardoso Lopes	bad65c3b70	[LCSSA] Handle PHI insertion in disjoint loops Take two disjoint Loops L1 and L2. LoopSimplify fails to simplify some loops (e.g. when indirect branches are involved). In such situations, it can happen that an exit for L1 is the header of L2. Thus, when we create PHIs in one of such exits we are also inserting PHIs in L2 header. This could break LCSSA form for L2 because these inserted PHIs can also have uses in L2 exits, which are never handled in the current implementation. Provide a fix for this corner case and test that we don't assert/crash on that. Differential Revision: http://reviews.llvm.org/D6624 rdar://problem/19166231 llvm-svn: 224740	2014-12-22 22:35:46 +00:00
David Majnemer	6eed0e0d20	This should have been part of r224676. llvm-svn: 224677	2014-12-20 04:48:34 +00:00
David Majnemer	b0362e4ee6	InstCombine: Squash an icmp+select into bitwise arithmetic (X & INT_MIN) == 0 ? X ^ INT_MIN : X into X \| INT_MIN (X & INT_MIN) != 0 ? X ^ INT_MIN : X into X & INT_MAX This fixes PR21993. llvm-svn: 224676	2014-12-20 04:45:35 +00:00
David Majnemer	0b6a0b0257	InstSimplify: Optimize away pointless comparisons (X & INT_MIN) ? X & INT_MAX : X into X & INT_MAX (X & INT_MIN) ? X : X & INT_MAX into X (X & INT_MIN) ? X \| INT_MIN : X into X (X & INT_MIN) ? X : X \| INT_MIN into X \| INT_MIN llvm-svn: 224669	2014-12-20 03:04:38 +00:00
Bruno Cardoso Lopes	f6cf8ad4e5	Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. Also, fix code to also return the modified switch when only the truncation is performed. This fixes an assertion crash. Differential Revision: http://reviews.llvm.org/D6644 rdar://problem/19191835 llvm-svn: 224588	2014-12-19 17:12:35 +00:00
Sanjay Patel	ea3c802887	use -0.0 when creating an fneg instruction Backends recognize (-0.0 - X) as the canonical form for fneg and produce better code. Eg, ppc64 with 0.0: lis r2, ha16(LCPI0_0) lfs f0, lo16(LCPI0_0)(r2) fsubs f1, f0, f1 blr vs. -0.0: fneg f1, f1 blr Differential Revision: http://reviews.llvm.org/D6723 llvm-svn: 224583	2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes	3be15b2fa6	Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr" Reverts commit r224574 to appease buildbots: The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. This fixes an assertion crash. llvm-svn: 224576	2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes	c9005f2f2b	[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr The visitSwitchInst generates SUB constant expressions to recompute the switch condition. When truncating the condition to a smaller type, SUB expressions should use the previous type (before trunc) for both operands. This fixes an assertion crash. Differential Revision: http://reviews.llvm.org/D6644 rdar://problem/19191835 llvm-svn: 224574	2014-12-19 14:23:15 +00:00
David Majnemer	824e011ad7	ConstantFold: Shifting undef by zero results in undef llvm-svn: 224553	2014-12-18 23:54:43 +00:00
Suyog Sarda	43fae93da8	Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads, and vectorizes it." This was re-ordering floating point data types resulting in mismatch in output. llvm-svn: 224424	2014-12-17 10:34:27 +00:00
Elena Demikhovsky	028e966a54	Added 5 more tests related to sink store revision 224247 - by Ella Bolshinsky http://reviews.llvm.org/D6420 llvm-svn: 224418	2014-12-17 08:12:59 +00:00
Erik Eckstein	a451b9b0b5	Strength reduce intrinsics with overflow into regular arithmetic operations if possible. Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced. This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow. It completes the work on PR20194. llvm-svn: 224417	2014-12-17 07:29:19 +00:00
David Majnemer	65c52ae8ca	InstSimplify: shl nsw/nuw undef, %V -> undef We can always choose an value for undef which might cause %V to shift out an important bit except for one case, when %V is zero. However, shl behaves like an identity function when the right hand side is zero. llvm-svn: 224405	2014-12-17 01:54:33 +00:00
Elena Demikhovsky	f5b72afff4	Masked Load and Store Intrinsics in loop vectorizer. The loop vectorizer optimizes loops containing conditional memory accesses by generating masked load and store intrinsics. This decision is target dependent. http://reviews.llvm.org/D6527 llvm-svn: 224334	2014-12-16 11:50:42 +00:00
Sanjoy Das	4555b6d870	Teach ScalarEvolution to exploit min and max expressions when proving isKnownPredicate. The motivation for this change is to optimize away checks in loops like this: limit = min(t, len) for (i = 0 to limit) if (i >= len \|\| i < 0) throw_array_of_of_bounds(); a[i] = ... Differential Revision: http://reviews.llvm.org/D6635 llvm-svn: 224285	2014-12-15 22:50:15 +00:00
Duncan P. N. Exon Smith	be7ea19b58	IR: Make metadata typeless in assembly Now that `Metadata` is typeless, reflect that in the assembly. These are the matching assembly changes for the metadata/value split in r223802. - Only use the `metadata` type when referencing metadata from a call intrinsic -- i.e., only when it's used as a `Value`. - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when referencing it from call intrinsics. So, assembly like this: define @foo(i32 %v) { call void @llvm.foo(metadata !{i32 %v}, metadata !0) call void @llvm.foo(metadata !{i32 7}, metadata !0) call void @llvm.foo(metadata !1, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{metadata !3}, metadata !0) ret void, !bar !2 } !0 = metadata !{metadata !2} !1 = metadata !{i32* @global} !2 = metadata !{metadata !3} !3 = metadata !{} turns into this: define @foo(i32 %v) { call void @llvm.foo(metadata i32 %v, metadata !0) call void @llvm.foo(metadata i32 7, metadata !0) call void @llvm.foo(metadata i32* @global, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{!3}, metadata !0) ret void, !bar !2 } !0 = !{!2} !1 = !{i32* @global} !2 = !{!3} !3 = !{} I wrote an upgrade script that handled almost all of the tests in llvm and many of the tests in cfe (even handling many `CHECK` lines). I've attached it (or will attach it in a moment if you're speedy) to PR21532 to help everyone update their out-of-tree testcases. This is part of PR21532. llvm-svn: 224257	2014-12-15 19:07:53 +00:00
Elena Demikhovsky	bf74736290	Added a test related to 224247 revision llvm-svn: 224248	2014-12-15 14:14:10 +00:00
Suyog Sarda	2b27fc78a2	Typo Correction in Test Case. NFC. llvm-svn: 224244	2014-12-15 12:19:46 +00:00
Ahmed Bougacha	0cb861634b	Reapply "[ARM] Combine base-updating/post-incrementing vector load/stores." r223862 tried to also combine base-updating load/stores. r224198 reverted it, as "it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown." Reapply, with a fix to ignore non-normal load/stores. Truncstores are handled elsewhere (you can actually write a pattern for those, whereas for postinc loads you can't, since they return two values), but it should be possible to also combine extloads base updates, by checking that the memory (rather than result) type is of the same size as the addend. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). Differential Revision: http://reviews.llvm.org/D6585 llvm-svn: 224203	2014-12-13 23:22:12 +00:00
Renato Golin	df8f9b6dc9	Revert "[ARM] Combine base-updating/post-incrementing vector load/stores." This reverts commit r223862, as it created a regression on the test-suite on test MultiSource/Benchmarks/Ptrdist/anagram by scrambling the order in which the words are shown. We'll investigate the issue and re-apply when safe. llvm-svn: 224198	2014-12-13 20:23:18 +00:00
David Majnemer	9b6097586c	ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume Respect the MaxDepth recursion limit, doing otherwise will trigger an assert in computeKnownBits. This fixes PR21891. llvm-svn: 224168	2014-12-12 23:59:29 +00:00
Suyog Sarda	384095e65c	This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads, and vectorizes it. Test case : float hadd(float* a) { return (a[0] + a[1]) + (a[2] + a[3]); } AArch64 assembly before patch : ldp s0, s1, [x0] ldp s2, s3, [x0, #8] fadd s0, s0, s1 fadd s1, s2, s3 fadd s0, s0, s1 ret AArch64 assembly after patch : ldp d0, d1, [x0] fadd v0.2s, v0.2s, v1.2s faddp s0, v0.2s ret Reviewed Link : http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248531.html llvm-svn: 224119	2014-12-12 12:53:44 +00:00
Steven Wu	881916dea5	Fix another infinite loop in InstCombine Summary: InstCombine infinite-loops for the testcase added It is because InstCombine is generating instructions that can be optimized by itself. Fix by not optimizing frem if the optimized type is the same as original type. rdar://problem/19150820 Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D6634 llvm-svn: 224097	2014-12-12 04:34:07 +00:00
Andrea Di Biagio	72b05aa59c	[InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi. This patch teaches the instruction combiner how to fold a call to 'insertqi' if the 'length field' (3rd operand) is set to zero, and if the sum between field 'length' and 'bit index' (4th operand) is bigger than 64. From the AMD64 Architecture Programmer's Manual: 1. If the sum of the bit index + length field is greater than 64, then the results are undefined; 2. A value of zero in the field length is defined as a length of 64. This patch improves the existing combining logic for intrinsic 'insertqi' adding extra checks to address both point 1. and point 2. Differential Revision: http://reviews.llvm.org/D6583 llvm-svn: 224054	2014-12-11 20:44:59 +00:00
David Majnemer	f532fcb889	InstSimplify: Remove usesless %a parameter from tests No functional change intended. llvm-svn: 224016	2014-12-11 12:56:17 +00:00
Michael Kuperstein	fffb6996c9	The inliner needs to fix up debug information for llvm.dbg.declare, not only for llvm.dbg.value. Patch by Amjad Aboud Differential Revision: http://reviews.llvm.org/D6525 llvm-svn: 224015	2014-12-11 12:41:10 +00:00
David Majnemer	5a7717e498	ConstantFold, InstSimplify: undef >>a x can be either -1 or 0, choose 0 Zero is usually a nicer constant to have than -1. llvm-svn: 223969	2014-12-10 21:58:15 +00:00
David Majnemer	89cf6d79eb	ConstantFold: an undef shift amount results in undef X shifted by undef results in undef because the undef value can represent values greater than the width of the operands. llvm-svn: 223968	2014-12-10 21:38:05 +00:00
David Majnemer	7b86b77248	ConstantFold: div undef, 0 should fold to undef, not zero Dividing by zero yields an undefined value. llvm-svn: 223924	2014-12-10 09:14:55 +00:00
David Majnemer	ae707582c0	InstSimplify: [al]shr exact undef, %X -> undef Exact shifts always keep the non-zero bits of their input. This means it keeps it's undef bits. llvm-svn: 223923	2014-12-10 09:14:52 +00:00
David Majnemer	71dc8fb867	InstSimplify: div %X, 0 -> undef We already optimized rem %X, 0 to undef, we should do the same for div. llvm-svn: 223919	2014-12-10 07:52:18 +00:00
Ahmed Bougacha	7efbac74ec	[ARM] Combine base-updating/post-incrementing vector load/stores. We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). Differential Revision: http://reviews.llvm.org/D6585 llvm-svn: 223862	2014-12-10 00:07:37 +00:00
Chandler Carruth	a7f247ea56	Revert r223764 which taught instcombine about integer-based elment extraction patterns. This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely also on ARM and PPC. I really don't know how, but reverting now that I've confirmed this is actually the culprit. I have a reproduction as well and so should be able to restore this shortly. This reverts commit r223764. Original commit log follows: Teach instcombine to canonicalize "element extraction" from a load of an integer and "element insertion" into a store of an integer into actual element extraction, element insertion, and vector loads and stores. Previously various parts of LLVM (including instcombine itself) would introduce integer loads and stores into the code as a way of opaquely loading and storing "bits". In some cases (such as a memcpy of std::complex<float> object) we will eventually end up using those bits in non-integer types. In order for SROA to effectively promote the allocas involved, it splits these "store a bag of bits" integer loads and stores up into the constituent parts. However, for non-alloca loads and tsores which remain, it uses integer math to recombine the values into a large integer to load or store. All of this would be "fine", except that it forces LLVM to go through integer math to combine and split up values. While this makes perfect sense for integers (and in fact is critical for bitfields to end up lowering efficiently) it is terrible for non-integer types, especially floating point types. We have a much more canonical way of representing the act of concatenating the bits of two SSA values in LLVM: a vector and insertelement. This patch teaching InstCombine to use this representation. With this patch applied, LLVM will no longer introduce integer math into the critical path of every loop over std::complex<float> operations such as those that make up the hot path of ... oh, most HPC code, Eigen, and any other heavy linear algebra library. For the record, I looked extensively at fixing this in other parts of the compiler, but it just doesn't work: - We really do want to canonicalize memcpy and other bit-motion to integer loads and stores. SSA values are tremendously more powerful than "copy" intrinsics. Not doing this regresses massive amounts of LLVM's scalar optimizer. - We really do need to split up integer loads and stores of this form in SROA or every memcpy of a trivially copyable struct will prevent SSA formation of the members of that struct. It essentially turns off SROA. - The closest alternative is to actually split the loads and stores when partitioning with SROA, but this has all of the downsides historically discussed of splitting up loads and stores -- the wide-store information is fundamentally lost. We would also see performance regressions for bitfield-heavy code and other places where the integers aren't really intended to be split without seemingly arbitrary logic to treat integers totally differently. - We can effectively fix this in instcombine, so it isn't that hard of a choice to make IMO. llvm-svn: 223813	2014-12-09 19:21:16 +00:00
Duncan P. N. Exon Smith	5bf8fef580	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata ` (instead of `Value `). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
Sonam Kumari	72ccc3c428	Removal Of Duplicate Test Cases and Addition Of Missing Check Statements llvm-svn: 223768	2014-12-09 10:46:38 +00:00
Ankur Garg	51eeba70da	[test/Transforms/InstCombine/shift.ll] Removed duplicate test cases. NFC. Removed some duplicate test cases from the file /test/Transforms/InstCombine/shift.ll. test54 and test57 were duplicates of each other. test55 and test58 were duplicates of each other. (Removed test57 and test58) llvm-svn: 223767	2014-12-09 10:35:19 +00:00
Chandler Carruth	7415205113	Teach instcombine to canonicalize "element extraction" from a load of an integer and "element insertion" into a store of an integer into actual element extraction, element insertion, and vector loads and stores. Previously various parts of LLVM (including instcombine itself) would introduce integer loads and stores into the code as a way of opaquely loading and storing "bits". In some cases (such as a memcpy of std::complex<float> object) we will eventually end up using those bits in non-integer types. In order for SROA to effectively promote the allocas involved, it splits these "store a bag of bits" integer loads and stores up into the constituent parts. However, for non-alloca loads and tsores which remain, it uses integer math to recombine the values into a large integer to load or store. All of this would be "fine", except that it forces LLVM to go through integer math to combine and split up values. While this makes perfect sense for integers (and in fact is critical for bitfields to end up lowering efficiently) it is terrible for non-integer types, especially floating point types. We have a much more canonical way of representing the act of concatenating the bits of two SSA values in LLVM: a vector and insertelement. This patch teaching InstCombine to use this representation. With this patch applied, LLVM will no longer introduce integer math into the critical path of every loop over std::complex<float> operations such as those that make up the hot path of ... oh, most HPC code, Eigen, and any other heavy linear algebra library. For the record, I looked extensively at fixing this in other parts of the compiler, but it just doesn't work: - We really do want to canonicalize memcpy and other bit-motion to integer loads and stores. SSA values are tremendously more powerful than "copy" intrinsics. Not doing this regresses massive amounts of LLVM's scalar optimizer. - We really do need to split up integer loads and stores of this form in SROA or every memcpy of a trivially copyable struct will prevent SSA formation of the members of that struct. It essentially turns off SROA. - The closest alternative is to actually split the loads and stores when partitioning with SROA, but this has all of the downsides historically discussed of splitting up loads and stores -- the wide-store information is fundamentally lost. We would also see performance regressions for bitfield-heavy code and other places where the integers aren't really intended to be split without seemingly arbitrary logic to treat integers totally differently. - We can effectively fix this in instcombine, so it isn't that hard of a choice to make IMO. Differential Revision: http://reviews.llvm.org/D6548 llvm-svn: 223764	2014-12-09 08:55:32 +00:00
David Majnemer	d5b3aa49ac	InstSimplify: Try to bring back the rest of r223583 This reverts r223624 with a small tweak, hopefully this will make stage3 equivalent. llvm-svn: 223679	2014-12-08 18:30:43 +00:00
Sonam Kumari	90d266c0a9	Removal Of Duplicate Test Case from shift.ll file llvm-svn: 223648	2014-12-08 09:40:43 +00:00
NAKAMURA Takumi	2b6e662672	Revert a part of r223583, for now. It seems causing different emission between stage2(gcc-clang) and stage3 clang. Investigating. llvm-svn: 223624	2014-12-08 02:07:22 +00:00
David Majnemer	1af36e5baf	InstSimplify: Optimize away useless unsigned comparisons Code like X < Y && Y == 0 should always be folded away to false. llvm-svn: 223583	2014-12-06 10:51:40 +00:00
Duncan P. N. Exon Smith	da41af9e94	IR: Disallow complicated function-local metadata Disallow complex types of function-local metadata. The only valid function-local metadata is an `MDNode` whose sole argument is a non-metadata function-local value. Part of PR21532. llvm-svn: 223564	2014-12-06 01:26:49 +00:00
Hans Wennborg	67ead59108	Add some tests for SimplifyCFG's TurnSwitchRangeIntoICmp(). NFC. llvm-svn: 223396	2014-12-04 22:19:28 +00:00
Hans Wennborg	1096539335	Add some tests for SimplifyCFG's ConstantFoldTerminator(). NFC. llvm-svn: 223395	2014-12-04 22:19:25 +00:00
Philip Reames	5b3ce71b62	Add a test case for argument type coercion in an invoke of a vararg function This would have caught the bug I fixed in 223370. llvm-svn: 223378	2014-12-04 19:13:45 +00:00
Hal Finkel	aa19bafc9c	Revert "r223364 - Revert r223347 which has caused crashes on bootstrap bots." Reapply r223347, with a fix to not crash on uninserted instructions (or more precisely, instructions in uninserted blocks). bugpoint was able to reduce the test case somewhat, but it is still somewhat large (and relies on setting things up to be simplified during inlining), so I've not included it here. Nevertheless, it is clear what is going on and why. Original commit message: Restrict somewhat the memory-allocation pointer cmp opt from r223093 Based on review comments from Richard Smith, restrict this optimization from applying to globals that might resolve lazily to other dynamically-loaded modules, and also from dynamic allocas (which might be transformed into malloc calls). In short, take extra care that the compared-to pointer is really simultaneously live with the memory allocation. llvm-svn: 223371	2014-12-04 17:45:19 +00:00
Alexander Potapenko	76770e4930	Revert r223347 which has caused crashes on bootstrap bots. llvm-svn: 223364	2014-12-04 14:22:27 +00:00
Simon Pilgrim	be24ab367b	[InstCombine] Minor optimization for bswap with binary ops Added instcombine optimizations for BSWAP with AND/OR/XOR ops: OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) ) OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) ) Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well: fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y)) Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone. Differential Revision: http://reviews.llvm.org/D6407 llvm-svn: 223349	2014-12-04 09:44:01 +00:00
Hal Finkel	8b24b32c44	Restrict somewhat the memory-allocation pointer cmp opt from r223093 Based on review comments from Richard Smith, restrict this optimization from applying to globals that might resolve lazily to other dynamically-loaded modules, and also from dynamic allocas (which might be transformed into malloc calls). In short, take extra care that the compared-to pointer is really simultaneously live with the memory allocation. llvm-svn: 223347	2014-12-04 09:22:28 +00:00
Matthias Braun	d34e4d2354	[SimplifyLibCalls] Improve double->float shrinking to consider constants This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x); rdar://19049359 Differential Revision: http://reviews.llvm.org/D6496 llvm-svn: 223270	2014-12-03 21:46:33 +00:00
Matthias Braun	892c923c46	[SimplifyLibCalls] Enable double to float shrinking for copysign rdar://19049359 Differential Revision: http://reviews.llvm.org/D6495 llvm-svn: 223269	2014-12-03 21:46:29 +00:00
Nick Lewycky	8ede1ef09b	Fix test to use the right metadata node (reapply r223239 plus a fix) and also to use the correct path to the GCNO file. llvm-svn: 223244	2014-12-03 17:32:44 +00:00
Alexander Potapenko	ff5326f0d1	Revert r223239, which broke some bots. llvm-svn: 223240	2014-12-03 16:03:08 +00:00
Alexander Potapenko	1b42ad9b43	Fix the metadata number used by llvm.gcov to match the number of the inserted metadata node. llvm-svn: 223239	2014-12-03 15:15:58 +00:00
Erik Eckstein	d181752be0	InstCombine: simplify signed range checks Try to convert two compares of a signed range check into a single unsigned compare. Examples: (icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n (icmp slt x, 0) \| (icmp sgt x, n) --> icmp ugt x, n llvm-svn: 223224	2014-12-03 10:39:15 +00:00
Tom Stellard	1f0dded057	StructurizeCFG: Use LoopInfo analysis for better loop detection We were assuming that each back-edge in a region represented a unique loop, which is not always the case. We need to use LoopInfo to correctly determine which back-edges are loops. llvm-svn: 223199	2014-12-03 04:28:32 +00:00
Nick Lewycky	2e8a6219fc	Emit the entry block first and the exit block second, then all the blocks in between afterwards. This is what gcc always does, and some out of tree tools depend on that. llvm-svn: 223193	2014-12-03 02:45:01 +00:00
Michael Zolotukhin	ea8327b80f	PR21302. Vectorize only bottom-tested loops. rdar://problem/18886083 llvm-svn: 223171	2014-12-02 22:59:06 +00:00
Michael Zolotukhin	540580ca06	Apply loop-rotate to several vectorizer tests. Such loops shouldn't be vectorized due to the loops form. After applying loop-rotate (+simplifycfg) the tests again start to check what they are intended to check. llvm-svn: 223170	2014-12-02 22:59:02 +00:00
Bruno Cardoso Lopes	15520db9ad	[SwitchLowering] Handle destinations on multiple phi instructions Follow up from r222926. Also handle multiple destinations from merged cases on multiple and subsequent phi instructions. rdar://problem/19106978 llvm-svn: 223135	2014-12-02 18:31:53 +00:00
Bruno Cardoso Lopes	d035fbb96f	[LICM] Avoind store sinking if no preheader is available Load instructions are inserted into loop preheaders when sinking stores and later removed if not used by the SSA updater. Avoid sinking if the loop has no preheader and avoid crashes. This fixes one more side effect of not handling indirectbr instructions properly on LoopSimplify. llvm-svn: 223119	2014-12-02 14:22:34 +00:00
Sonam Kumari	f2eacabd66	[signext.ll] Removal Of Duplicate Test Cases Removed the duplicate test case existing in signext.ll file. llvm-svn: 223109	2014-12-02 05:29:47 +00:00
Hal Finkel	afcd8dbbcf	Simplify pointer comparisons involving memory allocation functions System memory allocation functions, which are identified at the IR level by the noalias attribute on the return value, must return a pointer into a memory region disjoint from any other memory accessible to the caller. We can use this property to simplify pointer comparisons between allocated memory and local stack addresses and the addresses of global variables. Neither the stack nor global variables can overlap with the region used by the memory allocator. Fixes PR21556. llvm-svn: 223093	2014-12-01 23:38:06 +00:00
Reid Kleckner	35fc363ce8	Parse 'ghccc' in .ll files as the GHC convention (cc 10) Previously we just used "cc 10" in the .ll files, but that isn't very human readable. llvm-svn: 223076	2014-12-01 21:04:44 +00:00
Hans Wennborg	5bef5b522b	Revert r223049, r223050 and r223051 while investigating test failures. I didn't foresee affecting the Clang test suite :/ llvm-svn: 223054	2014-12-01 17:36:43 +00:00
Hans Wennborg	269ebb612e	SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable They would get optimized away later, but we might as well not emit them. llvm-svn: 223051	2014-12-01 17:08:38 +00:00
Hans Wennborg	5a1e5c05d8	SimplifyCFG: don't remove unreachable default switch destinations An unreachable default destination can be exploited by other optimizations, and SDag lowering is now prepared to handle them efficiently. For example, branches to the unreachable destination will be optimized away, such as in the case of range checks for switch lookup tables. On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and Chromium by 30 kB). llvm-svn: 223050	2014-12-01 17:08:35 +00:00
Sonam Kumari	237cfa9916	Removed extra whitespace. (Testing commit access). NFC. llvm-svn: 222994	2014-12-01 09:27:46 +00:00
Rafael Espindola	a4b2ee4548	Relax an assert a bit to avoid a crash on unreachable code. Patch by Duncan Exon Smith with a small tweak by me. llvm-svn: 222984	2014-12-01 02:55:24 +00:00
Duncan P. N. Exon Smith	910f05d181	DebugIR: Delete -debug-ir llvm-svn: 222945	2014-11-29 03:15:47 +00:00
Duncan P. N. Exon Smith	9bc81fbe92	Revert "Masked Vector Load and Store Intrinsics." This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936	2014-11-28 21:29:14 +00:00
David Majnemer	3d6f80b619	InstCombine: FoldOrOfICmps harder We may be in a situation where the icmps might not be near each other in a tree of or instructions. Try to dig out related compare instructions and see if they combine. N.B. This won't fire on deep trees of compares because rewritting the tree might end up creating a net increase of IR. We may have to resort to something more sophisticated if this is a real problem. llvm-svn: 222928	2014-11-28 19:58:29 +00:00
Bruno Cardoso Lopes	46d5bf2982	[LICM] Store sink and indirectbr instructions Loop simplify skips exit-block insertion when exits contain indirectbr instructions. This leads to an assertion in LICM when trying to sink stores out of non-dedicated loop exits containing indirectbr instructions. This patch fix this issue by re-checking for dedicated exits in LICM prior to store sink attempts. Differential Revision: http://reviews.llvm.org/D6414 rdar://problem/18943047 llvm-svn: 222927	2014-11-28 19:47:46 +00:00
Bruno Cardoso Lopes	bc7ba2c766	[SwitchLowering] Handle multiple destinations on condensed case stmts Switch cases statements with sequential values that branch to the same destination BB may often be handled together in a single new source BB. In this scenario we need to remove remaining incoming values from PHI instructions in the destination BB, as to match the number of source branches. Differential Revision: http://reviews.llvm.org/D6415 rdar://problem/19040894 llvm-svn: 222926	2014-11-28 19:47:33 +00:00
Erik Eckstein	0d86c7623f	reinstate r222872: Peephole optimization in switch table lookup: reuse the guarding table comparison if possible. Fixed missing dominance check. Original commit message: This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch. Example: if (idx < tablesize) r = table[idx]; // table does not contain default_value else r = default_value; if (r != default_value) ... Is optimized to: cond = idx < tablesize; if (cond) r = table[idx]; else r = default_value; if (cond) ... Jump threading will then eliminate the second if(cond). llvm-svn: 222891	2014-11-27 15:13:14 +00:00
Suyog Sarda	f8516e1662	Use FileCheck instead of grep. Change by Ankur Garg. Differential Revision: http://reviews.llvm.org/D6430 llvm-svn: 222879	2014-11-27 11:22:49 +00:00
Erik Eckstein	2190cd9ffa	Revert "Peephole optimization in switch table lookup: reuse the guarding table comparison if possible." It is breaking the clang bootstrag. llvm-svn: 222877	2014-11-27 10:59:08 +00:00
Suyog Sarda	c3024c75e0	Use FileCheck instead of grep. Change by Sonam. Differential Revision: http://reviews.llvm.org/D6432 llvm-svn: 222876	2014-11-27 10:57:24 +00:00
Erik Eckstein	e73e308ab9	Peephole optimization in switch table lookup: reuse the guarding table comparison if possible. This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch. Example: if (idx < tablesize) r = table[idx]; // table does not contain default_value else r = default_value; if (r != default_value) ... Is optimized to: cond = idx < tablesize; if (cond) r = table[idx]; else r = default_value; if (cond) ... \endcode Jump threading will then eliminate the second if(cond). llvm-svn: 222872	2014-11-27 08:33:51 +00:00
David Majnemer	40157d5c4d	InstCombine: Restore optimizations lost in r210006 This restores our ability to optimize: (X & C) == 0 ? X ^ C : X into X \| C (X & C) != 0 ? X ^ C : X into X & ~C llvm-svn: 222871	2014-11-27 07:25:21 +00:00
David Majnemer	c6a5e1dd4f	InstSimplify: Restore optimizations lost in r210006 This restores our ability to optimize: (X & C) ? X & ~C : X into X & ~C (X & C) ? X : X & ~C into X (X & C) ? X \| C : X into X (X & C) ? X : X \| C into X \| C llvm-svn: 222868	2014-11-27 06:32:46 +00:00
David Majnemer	5468e86469	Revert "Added inst combine transforms for single bit tests from Chris's note" This reverts commit r210006, it miscompiled libapr which is used in who knows how many projects. A test has been added to ensure that we don't regress again. I'll work on a rewrite of what the optimization was trying to do later. llvm-svn: 222856	2014-11-26 23:00:38 +00:00
Hans Wennborg	bda193edff	Remove useless rdar:// comment from switch_to_lookup_table.ll test. llvm-svn: 222772	2014-11-25 18:45:23 +00:00
Hans Wennborg	45172aceb3	LazyValueInfo: Actually re-visit partially solved block-values in solveBlockValue() If solveBlockValue() needs results from predecessors that are not already computed, it returns false with the intention of resuming when the dependencies have been resolved. However, the computation would never be resumed since an 'overdefined' result had been placed in the cache, preventing any further computation. The point of placing the 'overdefined' result in the cache seems to have been to break cycles, but we can check for that when inserting work items in the BlockValue stack instead. This makes the "stop and resume" mechanism of solveBlockValue() work as intended, unlocking more analysis. Using this patch shaves 120 KB off a 64-bit Chromium build on Linux. I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in compile time. Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me. Differential Revision: http://reviews.llvm.org/D6397 llvm-svn: 222768	2014-11-25 17:23:05 +00:00
Chandler Carruth	816d26fe5e	[InstCombine] Change LLVM To canonicalize toward the value type being stored rather than the pointer type. This change is analogous to r220138 which changed the canonicalization for loads. The rationale is the same: memory does not have a type, operations (and thus the values they produce) have a type. We should match that type as closely as possible rather than reading some form of semantics into the pointer type. With this change, loads and stores should no longer be made with nonsensical types for the values that tehy load and store. This is particularly important when trying to match specific loaded and stored types in the process of doing other instcombines, which is what led me down this twisty maze of miscanonicalization. I've put quite some effort into looking through IR to find places where LLVM's optimizer was being unreasonably conservative in the face of mismatched load and store types, however it is possible (let's say, likely!) I have missed some. If you see regressions here, or from r220138, the likely cause is some part of LLVM failing to cope with load and store types differing. Test cases appreciated, it is important that we root all of these out of LLVM. llvm-svn: 222748	2014-11-25 10:09:51 +00:00
Suyog Sarda	99c9c1f2b0	Change the test case file to use FileCheck instead of grep. NFC. Change by Ankur Garg. Differential Revision: http://reviews.llvm.org/D6382 llvm-svn: 222740	2014-11-25 08:44:56 +00:00
Chandler Carruth	1a3c2c414c	Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite clearly only exactly equal width ptrtoint and inttoptr casts are no-op casts, it says so right there in the langref. Make the code agree. Original log from r220277: Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available. Eventually, the datalayout can just be required but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 222739	2014-11-25 08:20:27 +00:00
David Majnemer	bd9ce4ea51	InstSimplify: Handle some simple tautological comparisons This handles cases where we are comparing a masked value against itself. The analysis could be further improved by making it recursive but such expense is not currently justified. llvm-svn: 222716	2014-11-25 02:55:48 +00:00
Matt Arsenault	238ff1ad1e	Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons llvm-svn: 222705	2014-11-24 23:15:18 +00:00
Matt Arsenault	ea515d33c9	Convert test to FileCheck and use CHECK-LABEL llvm-svn: 222704	2014-11-24 23:03:17 +00:00
David Majnemer	8e6f6a98b5	InstCombine: Don't create an unused instruction We would create an instruction but not inserting it. Not inserting the unused instruction would lead us to verification failure. This fixes PR21653. llvm-svn: 222659	2014-11-24 16:41:13 +00:00
David Majnemer	b2a6e7458d	InstCombine: Don't assume DataLayout is always available We tried to get the result of DataLayout::getLargestLegalIntTypeSize but we didn't have a DataLayout. This resulted in opt crashing. This fixes PR21651. llvm-svn: 222645	2014-11-24 07:26:20 +00:00
Elena Demikhovsky	9e5089a938	Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align /, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8 %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632	2014-11-23 08:07:43 +00:00
David Majnemer	fb3805576b	InstCombine: Propagate exact for (sdiv X, Pow2) -> (udiv X, Pow2) llvm-svn: 222625	2014-11-22 20:00:41 +00:00
David Majnemer	ec6e481bc5	InstCombine: Propagate exact for (sdiv X, Y) -> (udiv X, Y) llvm-svn: 222624	2014-11-22 20:00:38 +00:00
David Majnemer	fa4699e65f	InstCombine: Propagate exact for (sdiv -X, C) -> (sdiv X, -C) llvm-svn: 222623	2014-11-22 20:00:34 +00:00
David Majnemer	a3aeb15613	InstCombine: Propagate exact in (udiv (lshr X,C1),C2) -> (udiv x,C1<<C2) llvm-svn: 222620	2014-11-22 18:16:54 +00:00
David Majnemer	546f81064c	InstCombine: Propagate NSW/NUW for X*(1<<Y) -> X<<Y llvm-svn: 222613	2014-11-22 08:57:02 +00:00
David Majnemer	8279a7506d	InstCombine: Propagate NSW for -X * -Y -> X * Y llvm-svn: 222612	2014-11-22 07:25:19 +00:00
David Majnemer	4efa9ff8ca	InstSimplify: Simplify (sub 0, X) -> X if it's NUW This is a generalization of the X - (0 - Y) -> X transform. llvm-svn: 222611	2014-11-22 07:15:16 +00:00
David Majnemer	80c8f627db	InstCombine: Preserve nsw when folding X*(2^C) -> X << C llvm-svn: 222606	2014-11-22 04:52:55 +00:00
David Majnemer	fd4a6d2b7a	InstCombine: Preserve nsw/nuw for ((X << C2)C1) -> (X (C1 << C2)) llvm-svn: 222605	2014-11-22 04:52:52 +00:00

... 4 5 6 7 8 ...

5403 Commits