llvm-project

Commit Graph

Author	SHA1	Message	Date
George Rimar	eaf5172ca6	[ThinLTO] - Stop internalizing and drop non-prevailing symbols. The implementation marks non-prevailing symbols as not live in the summary. They are then dropped in the backends. Fixes https://bugs.llvm.org/show_bug.cgi?id=35938 Differential revision: https://reviews.llvm.org/D42107 llvm-svn: 323633	2018-01-29 08:03:30 +00:00
Davide Italiano	8b797a0fd2	[CVP] Don't Replace incoming values from unreachable blocks with undef. This pretty much reverts r322006, except that we keep the test, because we work around the issue exposed in a different way (a recursion limit in value tracking). There's still probably some sequence that exposes this problem, and the proper way to fix that for somebody who has time is outlined in the code review. llvm-svn: 323630	2018-01-29 05:59:55 +00:00
Hiroshi Inoue	c8e9245816	[NFC] fix trivial typos in comments and documents "to to" -> "to" llvm-svn: 323628	2018-01-29 05:17:03 +00:00
Alexey Bataev	f86be12182	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323530 to fix possible problems in users code. llvm-svn: 323581	2018-01-27 02:42:21 +00:00
Alexey Bataev	dce1614d75	Revert "[SLP] Removed the warning about unused variable, NFC." This reverts commit r323533 to fix possible problems in users code. llvm-svn: 323580	2018-01-27 02:42:17 +00:00
Vedant Kumar	cff94627cf	[InstrProfiling] Don't exit early when an unused intrinsic is found This fixes a think-o in r323574. llvm-svn: 323576	2018-01-27 00:01:04 +00:00
Vedant Kumar	1ee511c19c	[InstrProfiling] Improve compile time when there is no work When there are no uses of profiling intrinsics in a module, and there's no coverage data to lower, InstrProfiling has no work to do. llvm-svn: 323574	2018-01-26 23:54:24 +00:00
Vedant Kumar	e48597a50e	[InstCombine] Preserve debug values for eliminable casts A cast from A to B is eliminable if its result is casted to C, and if the pair of casts could just be expressed as a single cast. E.g here, %c1 is eliminable: %c1 = zext i16 %A to i32 %c2 = sext i32 %c1 to i64 InstCombine optimizes away eliminable casts. This patch teaches it to insert a dbg.value intrinsic pointing to the final result, so that local variables pointing to the eliminable result are preserved. Differential Revision: https://reviews.llvm.org/D42566 llvm-svn: 323570	2018-01-26 22:02:52 +00:00
Alexey Bataev	041ef2dd15	[SLP] Removed the warning about unused variable, NFC. llvm-svn: 323533	2018-01-26 15:34:44 +00:00
Alexey Bataev	167003df28	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323530	2018-01-26 14:31:09 +00:00
Daniil Fukalov	6e1dc68117	[AMDGPU] fix LDS f32 intrinsics - using qualified pointer addrspace in intrinsics class to avoid .f32 mangling - changed too common atomic mangling to ds - added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic Reviewed by: b-sumner Differential Revision: https://reviews.llvm.org/D42383 llvm-svn: 323516	2018-01-26 11:09:38 +00:00
Florian Hahn	212afb9fd9	[CallSiteSplitting] Fix infinite loop when recording conditions. Fix infinite loop when recording conditions by correctly marking basic blocks as visited. Fixes https://bugs.llvm.org/show_bug.cgi?id=36105 llvm-svn: 323515	2018-01-26 10:36:50 +00:00
Hiroshi Inoue	0909ca132f	[NFC] fix trivial typos in comments and documents "in in" -> "in", "on on" -> "on" etc. llvm-svn: 323508	2018-01-26 08:15:29 +00:00
Vedant Kumar	6394df9fc4	[Debug] LCSSA: Insert dbg.value at the first available insertion point Inserting a dbg.value instruction at the start of a basic block with a landingpad instruction triggers a verifier failure. We should be OK if we insert the instruction a bit later. Speculative fix for the bot failure described here: https://reviews.llvm.org/D42551 llvm-svn: 323482	2018-01-25 23:48:29 +00:00
Easwaran Raman	8410c37465	[SyntheticCounts] Rewrite the code using only graph traits. Summary: The intent of this is to allow the code to be used with ThinLTO. In Thinlink phase, a traditional Callgraph can not be computed even though all the necessary information (nodes and edges of a call graph) is available. This is due to the fact that CallGraph class is closely tied to the IR. This patch first extends GraphTraits to add a CallGraphTraits graph. This is then used to implement a version of counts propagation on a generic callgraph. Reviewers: davidxl Subscribers: mehdi_amini, tejohnson, llvm-commits Differential Revision: https://reviews.llvm.org/D42311 llvm-svn: 323475	2018-01-25 22:02:29 +00:00
Vedant Kumar	60f54084bf	[Debug] Add dbg.value intrinsics for PHIs created during LCSSA. This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA. I noticed a case where a value carried across a loop was reported as <optimized out>. Specifically this case: int bar(int x, int y) { return x + y; } int foo(int size) { int val = 0; for (int i = 0; i < size; ++i) { val = bar(val, i); // Both val and i are correct } return val; // <optimized out> } In the above case, after all of the interesting computation completes our value is reported as "optimized out." This change will add a dbg.value to correct this. This patch also moves the dbg.value insertion routine from LoopRotation.cpp into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA). Patch by Matt Davis! Differential Revision: https://reviews.llvm.org/D42551 llvm-svn: 323472	2018-01-25 21:37:07 +00:00
Vedant Kumar	6bfc869cf7	[Debug] Add a utility to propagate dbg.value to new PHIs, NFC This simply moves an existing utility to Utils for reuse. Split out of: https://reviews.llvm.org/D42551 Patch by Matt Davis! llvm-svn: 323471	2018-01-25 21:37:05 +00:00
Evgeniy Stepanov	31475a039a	[asan] Fix kernel callback naming in instrumentation module. Right now clang uses "_n" suffix for some user space callbacks and "N" for the matching kernel ones. There's no need for this and it actually breaks kernel build with inline instrumentation. Use the same callback names for user space and the kernel (and also make them consistent with the names GCC uses). Patch by Andrey Konovalov. Differential Revision: https://reviews.llvm.org/D42423 llvm-svn: 323470	2018-01-25 21:28:51 +00:00
Easwaran Raman	c73cec84c9	Re-land "[ThinLTO] Add call edges' relative block frequency to per-module summary." It was reverted after buildbot regressions. Original commit message: This allows relative block frequency of call edges to be passed to the thinlink stage where it will be used to compute synthetic entry counts of functions. llvm-svn: 323460	2018-01-25 19:27:17 +00:00
Alexey Bataev	102d4b59f9	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323441 to fix buildbots. llvm-svn: 323447	2018-01-25 17:28:12 +00:00
Alexey Bataev	c8cfa14b6d	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323441	2018-01-25 16:45:18 +00:00
Sanjay Patel	1d68112c4b	[InstCombine] narrow masked zexted binops (PR35792) This is guarded by shouldChangeType(), so the tests show that we don't do the fold if the narrower type is not legal. Note that there is a proposal (D42424) that would change the results for the specific cases shown in these tests. That difference is also discussed in PR35792: https://bugs.llvm.org/show_bug.cgi?id=35792 Alive proofs for the cases handled here as well as the bitwise logic binops that we should already do better on: https://rise4fun.com/Alive/c97 https://rise4fun.com/Alive/Lc5E https://rise4fun.com/Alive/kdf llvm-svn: 323437	2018-01-25 16:34:36 +00:00
Alexey Bataev	a0b2c78efc	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323430 to fix buildbots. llvm-svn: 323432	2018-01-25 15:20:29 +00:00
Alexey Bataev	ad51fe3644	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323430	2018-01-25 15:01:36 +00:00
Amjad Aboud	f1f57a3137	Another try to commit 323321 (aggressive instruction combine). llvm-svn: 323416	2018-01-25 12:06:32 +00:00
Mikael Holmen	886edf8f8a	[GlobalOpt] Emit fragments using field offsets from struct layout Summary: When creating the debug fragments for a SRA'd struct, use the fields' offsets, taken from the struct layout, as the offsets for the resulting fragments. This fixes an issue where GlobalOpt would emit fragments with incorrect offsets for padded fields. This should solve PR36016. Patch by David Stenberg. Reviewers: aprantl Reviewed By: aprantl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42489 llvm-svn: 323411	2018-01-25 10:09:26 +00:00
Alexey Bataev	0affccc8d7	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323348 because of the broken buildbots. llvm-svn: 323359	2018-01-24 18:36:51 +00:00
Nicolai Haehnle	4afb64e4c6	Revert r321751, "StructurizeCFG: Fix broken backedge detection" It causes regressions in various OpenGL test suites. Keep the test cases introduced by r321751 as XFAIL, and add a test case for the regression. Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514 Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015 llvm-svn: 323355	2018-01-24 18:02:05 +00:00
Alexey Bataev	4bd8e5332f	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323348	2018-01-24 17:50:53 +00:00
Amjad Aboud	d53504e379	Reverted 323321. llvm-svn: 323326	2018-01-24 14:48:49 +00:00
Amjad Aboud	e4453233d7	[InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive-instcombine). Combine expression patterns to form expressions with fewer, simple instructions. This pass does not modify the CFG. For example, this pass reduces the width of expressions post-dominated by TruncInst into a smaller width when applicable. It differs from the instcombine pass in that it contains pattern optimizations that require higher complexity than O(1); thus, it should run fewer times than the instcombine pass. Differential Revision: https://reviews.llvm.org/D38313 llvm-svn: 323321	2018-01-24 12:42:42 +00:00
Max Kazantsev	0f720e1296	[NFC] Remove overconfident assert from IRCE This patch removes the assert that SCEV is able to prove that a value is non-negative. In fact, SCEV can sometimes be unable to do this because its cache does not update properly. This assert will be restored once this problem is resolved. llvm-svn: 323309	2018-01-24 07:51:41 +00:00
Volkan Keles	ebf34ea316	BlockExtractor: Remove unused variable. NFC. llvm-svn: 323271	2018-01-23 22:24:34 +00:00
Volkan Keles	dc40be75f8	[llvm-extract] Support extracting basic blocks Summary: Currently, there is no way to extract a basic block from a function easily. This patch extends llvm-extract to extract the specified basic block(s). Reviewers: loladiro, rafael, bogner Reviewed By: bogner Subscribers: hintonda, mgorny, qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D41638 llvm-svn: 323266	2018-01-23 21:51:34 +00:00
Alexey Bataev	4f74a31c0e	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323246 because of the broken buildbots. llvm-svn: 323252	2018-01-23 20:11:27 +00:00
Alexey Bataev	6719e2418c	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counted as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323246	2018-01-23 19:30:26 +00:00
Ashutosh Nema	007b425b77	This change adds optimization remarks to the LoopVersioning LICM pass. Summary: This patch adds remark messages to the LoopVersioning LICM pass, which will be useful for the optimization remark emitter (ORE) infrastructure. Patch by: Deepak Porwal Reviewers: anemet, ashutosh.nema, eastig Subscribers: eastig, vivekvpandya, fhahn, llvm-commits llvm-svn: 323183	2018-01-23 09:47:28 +00:00
Dmitry Vyukov	68aab34f2d	asan: allow inline instrumentation for the kernel Currently ASan instrumentation pass forces callback instrumentation when applied to the kernel. This patch changes the current behavior to allow using inline instrumentation in this case. Authored by andreyknvl. Reviewed in: https://reviews.llvm.org/D42384 llvm-svn: 323140	2018-01-22 19:07:11 +00:00
Eugene Leviant	28d8a49f42	[ThinLTO] Re-commit of dot dumper after test fix llvm-svn: 323116	2018-01-22 13:35:40 +00:00
Serguei Katkov	f38041dc3e	Revert [SCEV] Fix isLoopEntryGuardedByCond usage It causes buildbot failures. The newly added assert fires. It seems not all usages of isLoopEntryGuardedByCond are fixed. llvm-svn: 323079	2018-01-22 07:47:02 +00:00
Serguei Katkov	50714a1cbc	[SCEV] Fix isLoopEntryGuardedByCond usage ScalarEvolution::isKnownPredicate invokes isLoopEntryGuardedByCond without checking that SCEV is available at the entry point of the loop. This is incorrect and is fixed by this patch. Reviewers: sanjoy, mkazantsev, anna, dorit Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42165 llvm-svn: 323077	2018-01-22 07:31:41 +00:00
Sanjay Patel	9530f18864	[InstCombine] (X << Y) / X -> 1 << Y ...when the shift is known to not overflow with the matching signed-ness of the division. This closes an optimization gap caused by canonicalizing mul by power-of-2 to shl as shown in PR35709: https://bugs.llvm.org/show_bug.cgi?id=35709 Patch by Anton Bikineev! Differential Revision: https://reviews.llvm.org/D42032 llvm-svn: 323068	2018-01-21 16:14:51 +00:00
Eugene Leviant	72b9bdb71a	Temporarily revert r323062 to investigate buildbot failures llvm-svn: 323065	2018-01-21 10:22:19 +00:00
Eugene Leviant	453c976a63	[ThinLTO] Implement summary visualizer Differential revision: https://reviews.llvm.org/D41297 llvm-svn: 323062	2018-01-21 07:27:32 +00:00
Philip Reames	f57714c3c7	[DSE] Factor out common code [NFC] We already had the pointer being stored to in the MemLoc, reuse that code. In merging cases, it turned out the interface of the getLocForWrite had become inconsistent with other related utilities. Fix that by making sure the input passes hasAnalyzableWrite as well. llvm-svn: 323056	2018-01-21 02:10:54 +00:00
Philip Reames	424e7a1174	[DSE] Minor rename for clarity's sake [NFC] llvm-svn: 323055	2018-01-21 01:44:33 +00:00
Akira Hatanaka	73ceb50d85	[ObjCARC] Do not turn a call to @objc_autoreleaseReturnValue into a call to @objc_autorelease if its operand is a PHI and the PHI has an equivalent value that is used by a return instruction. For example, ARC optimizer shouldn't replace the call in the following example, as doing so breaks the AutoreleaseRV/RetainRV optimization: %v1 = bitcast i32* %v0 to i8* br label %bb3 bb2: %v3 = bitcast i32* %v2 to i8* br label %bb3 bb3: %p = phi i8* [ %v1, %bb1 ], [ %v3, %bb2 ] %retval = phi i32* [ %v0, %bb1 ], [ %v2, %bb2 ] ; equivalent to %p %v4 = tail call i8* @objc_autoreleaseReturnValue(i8* %p) ret i32* %retval Also, make sure ObjCARCContract replaces @objc_autoreleaseReturnValue's operand uses with its value so that the call gets tail-called. rdar://problem/15894705 llvm-svn: 323009	2018-01-19 23:51:13 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. 
s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void 
@llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. 
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Alexey Bataev	fa80c47c6a	[SLP] Fix vectorization for tree with trunc to minimum required bit width. Summary: If the vectorized tree has truncate to minimum required bit width and the vector type of the cast operation after the truncation is the same as the vector type of the cast operands, count cost of the vector cast operation as 0, because this cast will be later removed. Also, if the vectorization tree root operations are integer cast operations, do not consider them as candidates for truncation. It will just create extra number of the same vector/scalar operations, which will be removed by instcombiner. Reviewers: RKSimon, spatel, mkuper, hfinkel, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41948 llvm-svn: 322946	2018-01-19 14:40:13 +00:00
Hiroshi Inoue	d24ddcd6c4	[NFC] fix trivial typos in comments "the the" -> "the" llvm-svn: 322934	2018-01-19 10:55:29 +00:00
John Brawn	2867bd72c0	[InstCombine] Make foldSelectOpOp able to handle two-operand getelementptr Three (or more) operand getelementptrs could plausibly also be handled, but handling only two-operand fits in easily with the existing BinaryOperator handling. Differential Revision: https://reviews.llvm.org/D39958 llvm-svn: 322930	2018-01-19 10:05:15 +00:00
Benjamin Kramer	bfc1d976ca	[HWAsan] Fix uninitialized variable. Found by msan. llvm-svn: 322847	2018-01-18 14:19:04 +00:00
Evgeniy Stepanov	5bd669dc8f	[hwasan] LLVM-level flags for linux kernel-compatible hwasan instrumentation. Summary: -hwasan-mapping-offset defines the non-zero shadow base address. -hwasan-kernel disables calls to __hwasan_init in module constructors. Unlike ASan, -hwasan-kernel does not force callback instrumentation. This is controlled separately with -hwasan-instrument-with-calls. Reviewers: kcc Subscribers: srhines, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D42141 llvm-svn: 322785	2018-01-17 23:24:38 +00:00
Easwaran Raman	e5b8de2f1f	Add a ProfileCount class to represent entry counts. Summary: The class wraps a uint64_t and an enum to represent the type of profile count (real and synthetic) with some helper methods. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41883 llvm-svn: 322771	2018-01-17 22:24:23 +00:00
Javed Absar	1e28194a40	[SCEV] Fix typo. NFC. Fix confusing typo in comment. llvm-svn: 322765	2018-01-17 21:58:35 +00:00
Zaara Syeda	c9dc7b451b	Revert [PowerPC] This reverts commit rL322721 Failing build bots. Revert the commit now. llvm-svn: 322748	2018-01-17 20:00:15 +00:00
Zaara Syeda	8e951fd2f6	[PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute. This patch adds support for the coldcc calling convention for Power. This changes the set of non-volatile registers. It includes a pass to stress test the implementation by marking all static directly called functions with the coldcc attribute through the option -enable-coldcc-stress-test. It also includes an option, -ppc-enable-coldcc, to add the coldcc attribute to functions which are cold at all call sites based on BlockFrequencyInfo when the containing function does not call any non cold functions. Differential Revision: https://reviews.llvm.org/D38413 llvm-svn: 322721	2018-01-17 18:22:55 +00:00
Sanjay Patel	aa766efd09	[InstCombine] fix demanded-bits propagation for zext/trunc I was comparing the demanded-bits implementations between InstCombine and TargetLowering as part of investigating questions in D42088 and noticed that this was wrong in IR. We were losing all of the prior known bits when we got back to the 'zext'. llvm-svn: 322662	2018-01-17 14:39:28 +00:00
Daniil Fukalov	d5fca554e2	[AMDGPU] add LDS f32 intrinsics added llvm.amdgcn.atomic.{add\|min\|max}.f32 intrinsics to allow generating the ds_{add\|min\|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656	2018-01-17 14:05:05 +00:00
Ivan A. Kosarev	4d0ff0c74d	[Transforms] Support making mutable versions of new-format TBAA access tags Differential Revision: https://reviews.llvm.org/D41565 llvm-svn: 322650	2018-01-17 13:29:54 +00:00
Javed Absar	0b05f327d6	[SCEV] fix typo llvm-svn: 322629	2018-01-17 11:03:06 +00:00
Evgeniy Stepanov	c07e0bd533	[hwasan] Rename sized load/store callbacks to be consistent with ASan. Summary: __hwasan_load is now __hwasan_loadN. Reviewers: kcc Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D42138 llvm-svn: 322601	2018-01-16 23:15:08 +00:00
Florian Hahn	c6c89bffdc	[CallSiteSplitting] Pass list of (BB, Conditions) pairs to splitCallSite. This removes some duplication from splitCallSite and makes it easier to add additional code dealing with each predecessor. It also allows us to split for more than 2 predecessors, although that is not enabled for now. Reviewers: junbuml, mcrosier, davidxl, davide Reviewed By: junbuml Differential Revision: https://reviews.llvm.org/D41858 llvm-svn: 322599	2018-01-16 22:13:15 +00:00
Alexey Bataev	6977dbcc7b	[SLP] Fix for PR32164: Improve vectorization of reverse order of extract operations. Summary: Sometimes vectorization of insertelement instructions with extractelement operands may produce an extra shuffle operation, if these operands are in the reverse order. Patch tries to improve this situation by the reordering of the operands to remove this extra shuffle operation. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D33954 llvm-svn: 322579	2018-01-16 18:17:01 +00:00
Hiroshi Inoue	99a8faa615	[SROA] fix assertion failure This patch fixes the assertion failure in SROA reported in PR35657. PR35657 reports the assertion failure due to r319522 (splitting for non-whole-alloca slices), but this problem can happen even without r319522. The problem exists in a check for reusing an existing alloca when rewriting partitions. As the original comment said, we can reuse the existing alloca if the new alloca has the same type and offset as the existing one. But the code checks only the type of the alloca and then checks the offset using an assert. In a corner case with out-of-bounds access (e.g. @PR35657 function added in unit test), it is possible that the two allocas have the same type but different offsets. This patch moves the check of the offset into the if condition, and re-enables the splitting for non-whole-alloca slices. Differential Revision: https://reviews.llvm.org/D41981 llvm-svn: 322533	2018-01-16 06:23:05 +00:00
Andrei Elovikov	7457aa0bce	[LV] Don't call recordVectorLoopValueForInductionCast for newly-created IV from a trunc. Summary: This method is supposed to be called for IVs that have casts in their use-def chains that are completely ignored after vectorization under PSE. However, for truncates of such IVs the same InductionDescriptor is used during creation/widening of both original IV based on PHINode and new IV based on TruncInst. This leads to unintended second call to recordVectorLoopValueForInductionCast with a VectorLoopVal set to the newly created IV for a trunc and causes an assert due to attempt to store new information for already existing entry in the map. This is wrong and should not be done. Fixes PR35773. Reviewers: dorit, Ayal, mssimpso Reviewed By: dorit Subscribers: RKSimon, dim, dcaballe, hsaito, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41913 llvm-svn: 322473	2018-01-15 10:56:07 +00:00
Max Kazantsev	d0fe502385	[NFC] Fix comment to adjust to reality llvm-svn: 322468	2018-01-15 05:44:43 +00:00
Evgeniy Stepanov	080e0d40b9	[hwasan] An LLVM flag to disable stack tag randomization. Summary: Necessary to achieve consistent test results. Reviewers: kcc, alekseyshl Subscribers: kubamracek, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D42023 llvm-svn: 322429	2018-01-13 01:32:15 +00:00
Daniel Neilson	2409d24201	[NFC] Change MemIntrinsicInst::setAlignment() to take an unsigned instead of a Constant Summary: In preparation for https://reviews.llvm.org/D41675 this NFC changes this prototype of MemIntrinsicInst::setAlignment() to accept an unsigned instead of a Constant. llvm-svn: 322403	2018-01-12 21:33:37 +00:00
Brian M. Rzycki	9b7ae23256	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perform the preservation was minimally altered and simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements such as threading across loop headers. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: mgorny, dmgreen, kuba, rnk, rsmith, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 322401	2018-01-12 21:06:48 +00:00
Max Kazantsev	ef0576000c	[IRCE][NFC] Make range check's End a non-null SCEV Currently, IRC contains `Begin` and `Step` as SCEVs and `End` as value. Aside from that, `End` can also be `nullptr` which can be later conditionally converted into a non-null SCEV. To make this logic more transparent, this patch makes `End` a SCEV and calculates it early, so that it is never a null. Differential Revision: https://reviews.llvm.org/D39590 llvm-svn: 322364	2018-01-12 10:00:26 +00:00
Serguei Katkov	a757d65cec	[LoopDeletion] Handle users in unreachable block This is a fix for PR35884. When we want to delete dead loop we must clean uses in unreachable blocks otherwise we'll get an assert during deletion of instructions from the loop. Reviewers: anna, davide Reviewed By: anna Subscribers: llvm-commits, lebedev.ri Differential Revision: https://reviews.llvm.org/D41943 llvm-svn: 322357	2018-01-12 07:24:43 +00:00
Evgeniy Stepanov	99fa3e774d	[hwasan] Stack instrumentation. Summary: Very basic stack instrumentation using tagged pointers. Tag for N'th alloca in a function is built as XOR of: * base tag for the function, which is just some bits of SP (poor man's random) * small constant which is a function of N. Allocas are aligned to 16 bytes. On every ReturnInst allocas are re-tagged to catch use-after-return. This implementation has a bunch of issues that will be taken care of later: 1. lifetime intrinsics referring to tagged pointers are not recognized in SDAG. This effectively disables stack coloring. 2. Generated code is quite inefficient. There is one extra instruction at each memory access that adds the base tag to the untagged alloca address. It would be better to keep tagged SP in a callee-saved register and address allocas as an offset of that XOR retag, but that needs better coordination between hwasan instrumentation pass and prologue/epilogue insertion. 3. Lifetime intrinsics are ignored and use-after-scope is not implemented. This would be harder to do than in ASan, because we need to use a differently tagged pointer depending on which lifetime.start / lifetime.end the current instruction is dominated / post-dominated. Reviewers: kcc, alekseyshl Subscribers: srhines, kubamracek, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41602 llvm-svn: 322324	2018-01-11 22:53:30 +00:00
Rafael Espindola	e4b0231c63	Make internal/private GVs implicitly dso_local. While updating clang tests for having clang set dso_local I noticed that: - There are a lot of tests to update. - Many of the updates are redundant. They are redundant because a GV is "obviously dso_local". This patch starts formalizing that a bit by requiring that internal and private GVs be dso_local too. Since they all are, we don't have to print dso_local to the textual representation, making it a bit more compact and easier to read. llvm-svn: 322317	2018-01-11 22:15:05 +00:00
Fiona Glaser	efe6a84e5b	[Sink] Really really fix predicate in legality check LoadInst isn't enough; we need to include intrinsics that perform loads too. All side-effecting intrinsics and such are already covered by the isSafe check, so we just need to care about things that read from memory. D41960, originally from D33179. llvm-svn: 322311	2018-01-11 21:28:57 +00:00
Benjamin Kramer	738e6e7cb0	[InstCombine] Apply the fix from r322284 for sin / cos -> tan too llvm-svn: 322285	2018-01-11 15:33:21 +00:00
Benjamin Kramer	44993ede60	[InstCombine] For cos/sin -> tan copy attributes from cos instead of the parent function Ideally we should merge the attributes from the functions somehow, but this is obviously an improvement over taking random attributes from the caller which will trip up the verifier if they're nonsensical for an unary intrinsic call. llvm-svn: 322284	2018-01-11 15:19:02 +00:00
Dmitry Venikov	e5fbf591a7	[InstCombine] Missed optimization in math expression: sin(x) / cos(x) => tan(x) Summary: This patch enables folding sin(x) / cos(x) -> tan(x), cos(x) / sin(x) -> 1 / tan(x) under -ffast-math flag Reviewers: hfinkel, spatel Reviewed By: spatel Subscribers: andrew.w.kaylor, efriedma, scanon, llvm-commits Differential Revision: https://reviews.llvm.org/D41286 llvm-svn: 322255	2018-01-11 06:33:00 +00:00
Marcello Maggioni	ddccd50313	[NFC] Commit to mention that r322248 is actually made by AndrewScheidecker llvm-svn: 322249	2018-01-11 02:06:28 +00:00
Marcello Maggioni	7083423f22	[SimplifyCFG] Add cut-off for InitializeUniqueCases. The function can take a significant amount of time on some complicated test cases, but for the currently only use of the function we can stop the initialization much earlier when we find out we are going to discard the result anyway in the caller of the function. Adding configurable cut-off points so that we avoid wasting time. NFCI. llvm-svn: 322248	2018-01-11 02:01:16 +00:00
Justin Lebar	9d3afd3c06	Add explanatory comment to LoadStoreVectorizer. Reviewers: arsenm Subscribers: rengolin, sanjoy, wdng, hiraditya, asbirlea Differential Revision: https://reviews.llvm.org/D41890 llvm-svn: 322157	2018-01-10 03:02:12 +00:00
Vlad Tsyrklevich	cdec22ef9a	LowerTypeTests: Add limited support for aliases Summary: LowerTypeTests moves some function definitions from individual object files to the merged module, leaving a stub to be called in the merged module's jump table. If an alias was pointing to such a function definition LowerTypeTests would fail because the alias would be left without a definition to point to. This change 1) emits information about aliases to the ThinLTO summary, 2) replaces aliases pointing to function definitions that are moved to the merged module with function declarations, and 3) re-emits those aliases in the merged module pointing to the correct function definitions. The patch does not correctly fix all possible mis-uses of aliases in LowerTypeTests. For example, it does not handle aliases with a different type from the pointed to function. The addition of alias data increases the size of Chrome build artifacts by less than 1%. Reviewers: pcc Reviewed By: pcc Subscribers: mehdi_amini, eraman, mgrang, llvm-commits, eugenis, kcc Differential Revision: https://reviews.llvm.org/D41741 llvm-svn: 322139	2018-01-10 00:00:51 +00:00
Michael Zolotukhin	1f562176e9	[LoopRotate] Detect loops with indirect branches better (we're giving up on them). llvm-svn: 322137	2018-01-09 23:54:35 +00:00
Chris Bieneman	abdea268c1	[IPSCCP] Remove calls without side effects Summary: When performing constant propagation for call instructions we have historically replaced all uses of the return from a call, but not removed the call itself. This is required for correctness if the calls have side effects, however the compiler should be able to safely remove calls that don't have side effects. This allows the compiler to completely fold away calls to functions that have no side effects if the inputs are constant and the output can be determined at compile time. Reviewers: davide, sanjoy, bruno, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38856 llvm-svn: 322125	2018-01-09 21:58:46 +00:00
Daniel Berlin	56cca7437c	NewGVN: Fix PR/33367, which was causing us to delete non-copy intrinsics accidentally in some rare cases llvm-svn: 322115	2018-01-09 20:12:42 +00:00
Easwaran Raman	bdf20261d8	Add a pass to generate synthetic function entry counts. Summary: This pass synthesizes function entry counts by traversing the callgraph and using the relative block frequencies of the callsites. The intended use of these counts is in inlining to determine hot/cold callsites in the absence of profile information. The pass is split into two files with the code that propagates the counts in a callgraph in a Utils file. I plan to add support for propagation in the thinlto link phase and the propagation code will be shared and hence this split. I did not add support to the old PM since hot callsite determination in inlining is not possible in old PM (although we could use hot callee heuristic with synthetic counts in the old PM it is not worth the effort tuning it) Reviewers: davidxl, silvas Subscribers: mgorny, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D41604 llvm-svn: 322110	2018-01-09 19:39:35 +00:00
Sanjay Patel	6fb1357c35	[InstCombine] weaken assertions for icmp folds (PR35846) Because of potential UB (known bits conflicts with an llvm.assume), we have to check rather than assert here because InstSimplify doesn't kill the compare: https://bugs.llvm.org/show_bug.cgi?id=35846 llvm-svn: 322104	2018-01-09 18:56:03 +00:00
Petar Jovanovic	1d26c7e4ff	[EarlyCSE] Salvage debug info during DCE EarlyCSE did not try to salvage debug info during erasing of instructions. This change fixes it. Patch by Djordje Todorovic. Differential Revision: https://reviews.llvm.org/D41496 llvm-svn: 322083	2018-01-09 15:08:37 +00:00
Simon Pilgrim	5d909be91b	[InstCombine] Check for out of range ashr values using APInt before calling getZExtValue Reduced from oss-fuzz #5032 test case llvm-svn: 322078	2018-01-09 14:23:46 +00:00
Justin Bogner	6f6846fc9d	AlwaysInliner: Allow setting InsertLifetime in the new-style pass llvm-svn: 322033	2018-01-08 22:07:42 +00:00
Justin Bogner	92fe563b57	ArgPromotion: Allow setting MaxElements in the new-style pass llvm-svn: 322025	2018-01-08 21:13:35 +00:00
Davide Italiano	9a60d2c157	[CVP] Replace incoming values from unreachable blocks with undef. This is an attempt of fixing PR35807. Due to the non-standard definition of dominance in LLVM, where uses in unreachable blocks are dominated by anything, you can have, in an unreachable block: %patatino = OP1 %patatino, CONSTANT When `SimplifyInstruction` receives a PHI where an incoming value is of the aforementioned form, in some cases, loops indefinitely. What I propose here instead is keeping track of the incoming values from unreachable blocks, and replacing them with undef. It fixes this case, and it seems to be good regardless (even if we can't prove that the value is constant, as it's coming from an unreachable block, we can ignore it). Differential Revision: https://reviews.llvm.org/D41812 llvm-svn: 322006	2018-01-08 16:34:06 +00:00
Sanjay Patel	31b4b76f99	[InstCombine] fold min/max tree with common operand (PR35717) There is precedence for factorization transforms in instcombine for FP ops with fast-math. We also have similar logic in foldSPFofSPF(). It would take more work to add this to reassociate because that's specialized for binops, and min/max are not binops (or even single instructions). Also, I don't have evidence that larger min/max trees than this exist in real code, but if we find that's true, we might want to reorganize where/how we do this optimization. In the motivating example from https://bugs.llvm.org/show_bug.cgi?id=35717 , we have: int test(int xc, int xm, int xy) { int xk; if (xc < xm) xk = xc < xy ? xc : xy; else xk = xm < xy ? xm : xy; return xk; } This patch solves that problem because we recognize more min/max patterns after rL321672 https://rise4fun.com/Alive/Qjne https://rise4fun.com/Alive/3yg Differential Revision: https://reviews.llvm.org/D41603 llvm-svn: 321998	2018-01-08 15:05:34 +00:00
Alexey Bataev	5b9a77d4ea	[SLP] Fix PR35777: Incorrect handling of aggregate values. Summary: Fixes the bug with incorrect handling of InsertValue|InsertElement instructions in SLP vectorizer. Currently, we may use incorrect ExtractElement instructions as the operands of the original InsertValue|InsertElement instructions. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41767 llvm-svn: 321994	2018-01-08 14:43:06 +00:00
Alexey Bataev	118a0a2c38	[SLP] Fix PR35628: Count external uses on extra reduction arguments. Summary: If the vectorized value is marked as extra reduction argument, its users are not considered as external users. Patch fixes this. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41786 llvm-svn: 321993	2018-01-08 14:33:11 +00:00
Davide Italiano	e15bffe9ea	Revert "[SCCP] Manually fold branches on undef." I thought this was responsible for PR35723, but I was wrong, the issue lies elsewhere. Revert while I debug. llvm-svn: 321975	2018-01-07 22:09:44 +00:00
Davide Italiano	4c39758a38	[SLPVectorizer] Reintroduce std::stable_sort(properlyDominates()). The approach was never discussed, I wasn't able to reproduce this non-determinism, and the original author went AWOL. After a discussion on the ML, Philip suggested to revert this. llvm-svn: 321974	2018-01-07 22:06:24 +00:00
Hal Finkel	0f1314c5ee	[LV][VPlan] NFC patch to move LoopVectorizationPlanner class out of LoopVectorize.cpp Another small step forward to move VPlan stuff outside of LoopVectorize.cpp. VPlanBuilder.h is renamed to LoopVectorizationPlanner.h LoopVectorizationPlanner class is moved from LoopVectorize.cpp to LoopVectorizationPlanner.h LoopVectorizationCostModel::VectorizationFactor class is moved to LoopVectorizationPlanner.h (used by the planner class) --- this needs further streamlining work in later patches and thus all I did was take it out of the CostModel class and moved to the header file. The callback function had to stay inside LoopVectorize.cpp since it calls an InnerLoopVectorizer member function declared in it. Next Steps: Make InnerLoopVectorizer, LoopVectorizationCostModel, and other classes more modular and more aligned with VPlan direction, in small increments. Previous step was: r320900 (https://reviews.llvm.org/D41045) Patch by Hideki Saito, thanks! Differential Revision: https://reviews.llvm.org/D41420 llvm-svn: 321962	2018-01-07 16:02:58 +00:00
Florian Hahn	55be37e7d4	[CodeExtractor] Use subset of function attributes for extracted function. In addition to target-dependent attributes, we can also preserve a white-listed subset of target independent function attributes. The white-list excludes problematic attributes, most prominently: * attributes related to memory accesses, as alloca instructions could be moved in/out of the extracted block * control-flow dependent attributes, like no_return or thunk, as the relevant instructions might or might not get extracted. Thanks @efriedma and @aemerson for providing a set of attributes that cannot be propagated. Reviewers: efriedma, davidxl, davide, silvas Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D41334 llvm-svn: 321961	2018-01-07 11:22:25 +00:00
Florian Hahn	a82eef2363	[InlineFunction] Preserve calling convention when forwarding VarArgs. Reviewers: efriedma, rnk, davide Reviewed By: rnk, davide Differential Revision: https://reviews.llvm.org/D41556 llvm-svn: 321943	2018-01-06 20:56:27 +00:00
Florian Hahn	de10e6e064	[InlineFunction] Preserve attributes when forwarding VarArgs. Reviewers: rnk, efriedma Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D41555 llvm-svn: 321942	2018-01-06 20:46:00 +00:00
Florian Hahn	80788d8088	[InlineFunction] Inline vararg functions that do not access varargs. If the varargs are not accessed by a function, we can inline the function. Reviewers: dblaikie, chandlerc, davide, efriedma, rnk, hfinkel Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D41335 llvm-svn: 321940	2018-01-06 19:45:40 +00:00
Sanjay Patel	26a6fcde83	[InstCombine] relax use constraint for min/max (~a, ~b) --> ~min/max(a, b) In the minimal case, this won't remove instructions, but it still improves uses of existing values. In the motivating example from PR35834, it does remove instructions, and sets that case up to be optimized by something like D41603: https://reviews.llvm.org/D41603 llvm-svn: 321936	2018-01-06 17:34:22 +00:00
Vedant Kumar	b2ec02ba0b	[Utils] Simplify salvageDebugInfo, NFCI Having a single call to findDbgUsers() allows salvageDebugInfo() to return earlier. Differential Revision: https://reviews.llvm.org/D41787 llvm-svn: 321915	2018-01-05 23:27:02 +00:00
Sanjay Patel	5b6aacf2c1	[InstCombine] add folds for min(~a, b) --> ~max(a, b) Besides the bug of omitting the inverse transform of max(~a, ~b) --> ~min(a, b), the use checking and operand creation were off. We were potentially creating repeated identical instructions of existing values. This led to infinite looping after I added the extra folds. By using the simpler m_Not matcher and not creating new 'not' ops for a and b, we avoid that problem. It's possible that not using IsFreeToInvert() here is more limiting than the simpler matcher, but there are no tests for anything more exotic. It's also possible that we should relax the use checking further to handle a case like PR35834: https://bugs.llvm.org/show_bug.cgi?id=35834 ...but we can make that a follow-up if it is needed. llvm-svn: 321882	2018-01-05 19:01:17 +00:00
Peter Collingbourne	9110cb456d	WholeProgramDevirt: Simplify ORE getter mechanism for old PM. NFCI. llvm-svn: 321841	2018-01-05 00:27:51 +00:00
Reid Kleckner	cd78ddc119	Revert "[JumpThreading] Preservation of DT and LVI across the pass" This reverts r321825, it causes crashes in Chromium. Reproducer forthcoming. llvm-svn: 321832	2018-01-04 23:23:46 +00:00
Brian M. Rzycki	cdad6c0b60	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perform the preservation was minimally altered and was simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements. One example is loop boundary threading. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 321825	2018-01-04 21:57:32 +00:00
Anna Thomas	9fca583757	Add assertion on DT availability during LI update in UpdateAnalysisInformation This came up during discussions in llvm-commits for rL321653: Check for unreachable preds before updating LI in UpdateAnalysisInformation The assert provides hints to passes to require both DT and LI if we plan on updating LI through this function. Tests run: make check llvm-svn: 321805	2018-01-04 17:21:15 +00:00
Sanjay Patel	c63f9014d6	[InstCombine] safely create a constant of the right type (PR35794) llvm-svn: 321801	2018-01-04 14:31:56 +00:00
Aditya Kumar	1f90cae80f	[GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load in case of a loop Reviewers: dberlin sebpop eli.friedman Differential Revision: https://reviews.llvm.org/D41453 llvm-svn: 321789	2018-01-04 07:47:24 +00:00
Matt Arsenault	8070882b4e	StructurizeCFG: Fix broken backedge detection The work order was changed in r228186 from SCC order to RPO with an arbitrary sorting function. The sorting function attempted to move inner loop nodes earlier. This was apparently relying on an assumption that every block in a given loop / the same loop depth would be seen before visiting another loop. In the broken testcase, a block outside of the loop was encountered before moving onto another block in the same loop. The testcase would then structurize such that one block's unconditional successor could never be reached. Revert to plain RPO for the analysis phase. This fixes detecting edges as backedges that aren't really. The processing phase does use another visited set, and I'm unclear on whether the order there is as important. An arbitrary order doesn't work, and triggers some infinite loops. The reversed RPO list seems to work and is closer to the order that was used before, minus the arbitrary custom sorting. A few of the changed tests now produce smaller code, and a few are slightly worse looking. llvm-svn: 321751	2018-01-03 18:45:37 +00:00
Simon Pilgrim	3bf2d64589	[InstCombine] Check for out of range shift values using APInt before calling getZExtValue Reduced from oss-fuzz #4871 test case llvm-svn: 321748	2018-01-03 18:28:20 +00:00
Anna Thomas	bdb9430917	[BasicBlockUtils] Check for unreachable preds before updating LI in UpdateAnalysisInformation Summary: We are incorrectly updating the LI when loop-simplify generates dedicated exit blocks for a loop. The issue is that there's an implicit assumption that the Preds passed into UpdateAnalysisInformation are reachable. However, this is not true and breaks LI by incorrectly updating the header of a loop. One such case is when we generate dedicated exits when the exit block is a landing pad (through SplitLandingPadPredecessors). There may be other cases as well, since we do not guarantee that Preds passed in are reachable basic blocks. The added test case shows how loop-simplify breaks LI for the outer loop (and DT in turn) after we try to generate the LoopSimplifyForm. Reviewers: davide, chandlerc, sanjoy Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41519 llvm-svn: 321653	2018-01-02 16:25:50 +00:00
Dmitry Venikov	a58d8deb3a	[InstCombine] Missed optimization in math expression: squashing sqrt functions Summary: This patch enables folding under -ffast-math flag sqrt(a) * sqrt(b) -> sqrt(a*b) Reviewers: hfinkel, spatel, davide Reviewed By: spatel, davide Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D41322 llvm-svn: 321637	2018-01-02 05:58:11 +00:00
Davide Italiano	86b7949f62	[SimplifyCFG] Return to the pass manager the correct value. I wanted to commit this with r321603, but I failed to squash the two commits. llvm-svn: 321606	2017-12-31 16:54:03 +00:00
Davide Italiano	0512bf5af2	[Utils/Local] Use `auto` when the type is obvious. NFCI. llvm-svn: 321605	2017-12-31 16:51:50 +00:00
Davide Italiano	5dd1c587e7	[Utils] Remove commented debug message. NFCI. llvm-svn: 321604	2017-12-31 16:48:44 +00:00
Davide Italiano	9f074fe915	[SimplifyCFG] Stop hoisting musttail calls incorrectly. PR35774. llvm-svn: 321603	2017-12-31 16:47:16 +00:00
Benjamin Kramer	c7fc81e659	Use phi ranges to simplify code. No functionality change intended. llvm-svn: 321585	2017-12-30 15:27:33 +00:00
Matt Arsenault	8dcfa137f3	StructurizeCFG: Use phi iterator range llvm-svn: 321568	2017-12-29 19:25:57 +00:00
Benjamin Kramer	24cb28bb54	Remove superfluous copies in sample profiling. No functionality change intended. llvm-svn: 321530	2017-12-28 18:10:41 +00:00
Guozhi Wei	29697c13bc	Revert r321377, it causes regression to https://reviews.llvm.org/P8055 . llvm-svn: 321528	2017-12-28 17:02:34 +00:00
Benjamin Kramer	3a13ed60ba	Avoid int to string conversion in Twine or raw_ostream contexts. Some output changes from uppercase hex to lowercase hex, no other functionality change intended. llvm-svn: 321526	2017-12-28 16:58:54 +00:00
Max Kazantsev	a13e163a27	[RewriteStatepoints] Fix incorrect assertion `RewriteStatepointsForGC` iterates over function blocks and their predecessors in order of declaration. One of outcomes of this is that callsites are placed in arbitrary order which has nothing to do with traversal order. On the other hand, function `recomputeLiveInValues` asserts that bases are added to `Info.PointerToBase` before their derived pointers are updated. But if call sites are processed in order different from RPOT, this is not necessarily true. We cannot guarantee that the base was placed there before every pointer derived from it. All we can guarantee is that this base was marked as known base by this point. This patch replaces the fact that we assert from checking that the base was added to the map with assert that the base was marked as known base. Differential Revision: https://reviews.llvm.org/D41593 llvm-svn: 321517	2017-12-28 12:03:12 +00:00
Simon Pilgrim	472689a159	[InstCombine] Check for isa<Instruction> before using cast<> Protects against casts from constexpr etc. Reduced from oss-fuzz #4788 test case llvm-svn: 321515	2017-12-28 09:35:35 +00:00
Reid Kleckner	6d31001cd6	Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks" This reverts r321138. It seems there are still underlying issues with memdep. PR35519 seems to still be present if debug info is enabled. We end up losing a memcpy. Somehow during store to memset merging, we insert the memset after the memcpy or fail to update the memdep analysis to account for the newly inserted memset of a pair. Reduced test case: #include <assert.h> #include <stdio.h> #include <string> #include <utility> #include <vector> void do_push_back( std::vector<std::pair<std::string, std::vector<std::string>>>* crls) { crls->push_back(std::make_pair(std::string(), std::vector<std::string>())); } int __attribute__((optnone)) main() { // Put some data in the vector and then remove it so we take the push_back // fast path. std::vector<std::pair<std::string, std::vector<std::string>>> crl_set; crl_set.push_back({"asdf", {}}); crl_set.pop_back(); printf("first word in vector storage: %p\n", (void*)crl_set.data()); // Do the push_back which may fail to initialize the data. do_push_back(&crl_set); auto first = &crl_set.back().first; printf("first word in vector storage (should be zero): %p\n", (void*)crl_set.data()); assert(first->empty()); puts("ok"); } Compile with libc++, enable optimizations, and enable debug info: $ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib This program will assert with this change. llvm-svn: 321510	2017-12-28 05:10:33 +00:00
Simon Pilgrim	e7d032f1d8	[InstCombine] Gracefully handle out of range extractelement indices InstSimplify is responsible for handling these, but we shouldn't just assert here. Reduced from oss-fuzz #4808 test case llvm-svn: 321489	2017-12-27 12:00:18 +00:00
Philip Reames	cd13a66381	[instcombine] add powi(x, 2) -> x * x llvm-svn: 321468	2017-12-27 01:30:12 +00:00
Philip Reames	5000ba69d7	Sink a couple of transforms from instcombine into instsimplify. llvm-svn: 321467	2017-12-27 01:14:30 +00:00
Philip Reames	7a6db4fc4f	[NFC] Extract out a helper function for SimplifyCall(CS, Q) This simplifies code, but the real motivation is that it lets me clean up some downstream code. llvm-svn: 321466	2017-12-27 00:16:12 +00:00
Zhaoshi Zheng	8af1e1cb78	[Unroll][DebugInfo] Propagate loop body's debug location to epilog preheader NewExit and epilog PreHeader should have the same debug loc as the original loop body, instead of original loop exit. llvm-svn: 321465	2017-12-26 23:31:21 +00:00
Sanjay Patel	14adbacd8a	[InstCombine] fix miscompile of frem with 0.0 operand (PR34870) We might want to select NAN here or do this transform with fast-math, but this should at least fix the miscompile. llvm-svn: 321461	2017-12-26 22:12:20 +00:00
Benjamin Kramer	802e6255b2	Make helpers static. No functionality change. llvm-svn: 321425	2017-12-24 12:46:22 +00:00
Florian Hahn	7e9328906b	[CallSiteSplitting] Remove isOrHeader restriction. By following the single predecessors of the predecessors of the call site, we do not need to restrict the control flow. Reviewed By: junbuml, davide Differential Revision: https://reviews.llvm.org/D40729 llvm-svn: 321413	2017-12-23 20:02:26 +00:00
Davide Italiano	55b663431e	[SCCP] Manually fold branches on undef. This code was originally removed and replaced with an assertion because it was believed unnecessary. It turns out there was simply no test coverage for this case, and the constant folder doesn't yet know about patterns like `br undef %label1, %label2`. Presumably at some point the constant folder might learn about these patterns, but it's a broader change. A testcase will be added to make sure this doesn't regress again in the future. Fixes PR35723. llvm-svn: 321402	2017-12-23 15:06:30 +00:00
Guozhi Wei	33250340f4	[SimplifyCFG] Don't do if-conversion if there is a long dependence chain If after if-conversion, most of the instructions in this new BB construct a long and slow dependence chain, it may be slower than cmp/branch, even if the branch has a high miss rate, because the control dependence is transformed into data dependence, and control dependence can be speculated, and thus, the second part can execute in parallel with the first part on modern OOO processor. This patch checks for the long dependence chain, and gives up on if-conversion if it finds one. Differential Revision: https://reviews.llvm.org/D39352 llvm-svn: 321377	2017-12-22 18:54:04 +00:00
Easwaran Raman	a17f220590	Add hasProfileData() to check if a function has profile data. NFC. Summary: This replaces calls to getEntryCount().hasValue() with hasProfileData that does the same thing. This refactoring is useful to do before adding synthetic function entry counts but also a useful cleanup IMO even otherwise. I have used hasProfileData instead of hasRealProfileData as David had earlier suggested since I think profile implies "real" and I use the phrase "synthetic entry count" and not "synthetic profile count" but I am fine calling it hasRealProfileData if you prefer. Reviewers: davidxl, silvas Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41461 llvm-svn: 321331	2017-12-22 01:33:52 +00:00
Michael Zolotukhin	ad371e0caa	[SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking. If a block has N predecessors, then the current algorithm will try to sink common code to this block N times (whenever we visit a predecessor). Every attempt to sink the common code includes going through all predecessors, so the complexity of the algorithm becomes O(N^2). With this patch we try to sink common code only when we visit the block itself. With this, the complexity goes down to O(N). As a side effect, the moment the code is sunk is slightly different than before (the order of simplifications has been changed), that's why I had to adjust two tests (note that neither of the tests is supposed to test SimplifyCFG): * test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic the changes that previous implementation of SimplifyCFG would do. * test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common code sinking by a command line flag. llvm-svn: 321236	2017-12-21 01:22:13 +00:00
Matthew Simpson	cb35c5d5c2	[ICP] Expose unconditional call promotion interface This patch modifies the indirect call promotion utilities by exposing and using an unconditional call promotion interface. The unconditional promotion interface (i.e., call promotion without creating an if-then-else) can be used if it's known that an indirect call has only one possible callee. The existing conditional promotion interface uses this unconditional interface to promote an indirect call after it has been versioned and placed within the "then" block. A consequence of unconditional promotion is that the fix-up operations for phi nodes in the normal destination of invoke instructions are changed. This is necessary because the existing implementation assumed that an invoke had been versioned, creating a "merge" block where a return value bitcast could be placed. In the new implementation, the edge between a promoted invoke's parent block and its normal destination is split if needed to add a bitcast for the return value. If the invoke is also versioned, the phi node merging the return value of the promoted and original invoke instructions is placed in the "merge" block. Differential Revision: https://reviews.llvm.org/D40751 llvm-svn: 321210	2017-12-20 19:26:37 +00:00
Evgeniy Stepanov	3fd1b1a764	[hwasan] Implement -fsanitize-recover=hwaddress. Summary: Very similar to AddressSanitizer, with the exception of the error type encoding. Reviewers: kcc, alekseyshl Subscribers: cfe-commits, kubamracek, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41417 llvm-svn: 321203	2017-12-20 19:05:44 +00:00
Florian Hahn	012c8f97b2	[InstCombine] Add debug location to new caller. Reviewers: rnk, aprantl, majnemer Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D414 llvm-svn: 321191	2017-12-20 17:16:59 +00:00
Mohammad Shahid	3a934d6ab9	Revert r320548:[SLP] Vectorize jumbled memory loads llvm-svn: 321181	2017-12-20 15:26:59 +00:00
Florian Hahn	467abe3e4f	[LV] Remove unnecessary DoExtraAnalysis guard (silent bug) canVectorize is only checking if the loop has a normalized pre-header if DoExtraAnalysis is true. This doesn't make sense to me because reporting analysis information shouldn't alter legality checks. This is probably the result of a last minute minor change before committing (?). Patch by Diego Caballero. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D40973 llvm-svn: 321172	2017-12-20 13:28:38 +00:00
Dan Gohman	aa3922819e	[memcpyopt] Teach memcpyopt to optimize across basic blocks This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. This is r319482 and r319483, along with fixes for PR35519: fix the optimization that merges stores into memsets to preserve cached memdep info, and fix memdep's non-local caching strategy to not assume that larger queries are always more conservative than smaller ones. Fixes PR28958 and PR35519. Differential Revision: https://reviews.llvm.org/D40802 llvm-svn: 321138	2017-12-20 01:36:25 +00:00
Adrian Prantl	0e6694d111	Silence a bunch of implicit fallthrough warnings llvm-svn: 321114	2017-12-19 22:05:25 +00:00
Haicheng Wu	5b106ef92e	[SeparateConstOffsetFromGEP] Fix a typo. NFC. do CSE for to => do CSE to llvm-svn: 321098	2017-12-19 18:49:21 +00:00
Max Kazantsev	fd95ee0c9a	[JumpThreading] Restrict PRE across instructions that don't pass control to successors PRE in JumpThreading should not be able to hoist copy of non-speculable loads across instructions that don't always transfer execution to their successors, otherwise they may introduce an unsafe load which otherwise would not be executed. The same problem for GVN was fixed as rL316975. Differential Revision: https://reviews.llvm.org/D40347 llvm-svn: 321063	2017-12-19 09:10:21 +00:00
Teresa Johnson	915897e21b	[PGO] Fix handling of cold entry count for instrumented PGO Summary: In r277849, getEntryCount was changed to return None when the entry count was 0, specifically for SamplePGO where it means no samples were recorded. However, for instrumentation PGO a 0 entry count should be returned directly, since it does mean that the function was completely cold. Otherwise we end up treating these functions conservatively in isFunctionEntryCold() and isColdBB(). Instead, for SamplePGO use -1 when there are no samples, and change getEntryCount to return None when the value is -1. Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41307 llvm-svn: 321018	2017-12-18 20:02:43 +00:00
Dimitry Andric	e4f5d01033	Fix more inconsistent line endings. NFC. llvm-svn: 321016	2017-12-18 19:46:56 +00:00
Matt Arsenault	d89d0b6494	Removed unused DominanceFrontier llvm-svn: 321001	2017-12-18 18:01:13 +00:00
Xinliang David Li	19fb5b467b	[PGO] add MST min edge selection heuristic to ensure non-zero entry count Differential Revision: http://reviews.llvm.org/D41059 llvm-svn: 320998	2017-12-18 17:56:19 +00:00
Sean Fertile	5fb624a3b8	[Memcpy Loop Lowering] Remove the fixed int8 lowering. Switch over to the lowering that uses target supplied operand types. Differential Revision: https://reviews.llvm.org/D41201 llvm-svn: 320989	2017-12-18 15:31:14 +00:00
Eugene Leviant	c95b49603e	[ThinLTO] Remove unused code This is a re-commit of r320464, after patch for gold plugin was landed. llvm-svn: 320968	2017-12-18 10:53:45 +00:00
Hiroshi Inoue	c6faf15459	[SROA] Disable non-whole-alloca splits by default This patch introduces a switch to control splitting of non-whole-alloca slices with default off. The switch will be default on again after fixing an issue reported in PR35657. llvm-svn: 320958	2017-12-18 06:47:37 +00:00
Sean Fertile	68d7f9da76	[Memcpy Loop Lowering] Only calculate residual size/bytes copied when needed. If the loop operand type is int8 then there will be no residual loop for the unknown size expansion. Don't create the residual-size and bytes-copied values when they are not needed. llvm-svn: 320929	2017-12-16 22:41:39 +00:00
Sanjay Patel	5a0cdac174	[InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel We want to do this for 2 reasons: 1. Value tracking does not recognize the ashr variant, so it would fail to match for cases like D39766. 2. DAGCombiner does better at producing optimal codegen when we have the cmp+sel pattern. More detail about what happens in the backend: 1. DAGCombiner has a generic transform for all targets to convert the scalar cmp+sel variant of abs into the shift variant. That is the opposite of this IR canonicalization. 2. DAGCombiner has a generic transform for all targets to convert the vector cmp+sel variant of abs into either an ABS node or the shift variant. That is again the opposite of this IR canonicalization. 3. DAGCombiner has a generic transform for all targets to convert the exact shift variants produced by #1 or #2 into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node when that's legal/custom. 4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a variety of ways. a. For #2, the vector path is missing the case for setlt with a '1' constant. b. For #3, we are missing a match for commuted versions of the shift variants. 5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The version of cmp+sel produced by this patch will be recognized in the DAG and converted to an ABS node when possible or the shift sequence when not. 6. In the following examples with this patch applied, we may get conditional moves rather than the shift produced by the generic DAGCombiner transforms. The conditional move is created using a target-specific decision for any given target. Whether it is optimal or not for a particular subtarget may be up for debate. 
define i32 @abs_shifty(i32 %x) { %signbit = ashr i32 %x, 31 %add = add i32 %signbit, %x %abs = xor i32 %signbit, %add ret i32 %abs } define i32 @abs_cmpsubsel(i32 %x) { %cmp = icmp slt i32 %x, zeroinitializer %sub = sub i32 zeroinitializer, %x %abs = select i1 %cmp, i32 %sub, i32 %x ret i32 %abs } define <4 x i32> @abs_shifty_vec(<4 x i32> %x) { %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> %add = add <4 x i32> %signbit, %x %abs = xor <4 x i32> %signbit, %add ret <4 x i32> %abs } define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) { %cmp = icmp slt <4 x i32> %x, zeroinitializer %sub = sub <4 x i32> zeroinitializer, %x %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x ret <4 x i32> %abs } > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx > abs_shifty: > movl %edi, %eax > negl %eax > cmovll %edi, %eax > retq > > abs_cmpsubsel: > movl %edi, %eax > negl %eax > cmovll %edi, %eax > retq > > abs_shifty_vec: > vpabsd %xmm0, %xmm0 > retq > > abs_cmpsubsel_vec: > vpabsd %xmm0, %xmm0 > retq > > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64 > abs_shifty: > cmp w0, #0 // =0 > cneg w0, w0, mi > ret > > abs_cmpsubsel: > cmp w0, #0 // =0 > cneg w0, w0, mi > ret > > abs_shifty_vec: > abs v0.4s, v0.4s > ret > > abs_cmpsubsel_vec: > abs v0.4s, v0.4s > ret > > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le > abs_shifty: > srawi 4, 3, 31 > add 3, 3, 4 > xor 3, 3, 4 > blr > > abs_cmpsubsel: > srawi 4, 3, 31 > add 3, 3, 4 > xor 3, 3, 4 > blr > > abs_shifty_vec: > vspltisw 3, -16 > vspltisw 4, 15 > vsubuwm 3, 4, 3 > vsraw 3, 2, 3 > vadduwm 2, 2, 3 > xxlxor 34, 34, 35 > blr > > abs_cmpsubsel_vec: > vspltisw 3, -16 > vspltisw 4, 15 > vsubuwm 3, 4, 3 > vsraw 3, 2, 3 > vadduwm 2, 2, 3 > xxlxor 34, 34, 35 > blr > Differential Revision: https://reviews.llvm.org/D40984 llvm-svn: 320921	2017-12-16 16:41:17 +00:00
Hal Finkel	5444f40965	[LV] Extend InstWidening with CM_Widen_Recursive Changes to the original scalar loop during LV code gen cause the return value of Legal->isConsecutivePtr() to be inconsistent with the return value during legal/cost phases (further analysis and information of the bug is in D39346). This patch is an alternative fix to PR34965 following the CM_Widen approach proposed by Ayal and Gil in D39346. It extends InstWidening enum with CM_Widen_Reverse to properly record the widening decision for consecutive reverse memory accesses and, consequently, get rid of the Legal->isConsecutivePtr() call in LV code gen. I think this is a simpler/cleaner solution to PR34965 than the one in D39346. Fixes PR34965. Patch by Diego Caballero, thanks! Differential Revision: https://reviews.llvm.org/D40742 llvm-svn: 320913	2017-12-16 02:55:24 +00:00
Hal Finkel	2ff24731bb	[SimplifyLibCalls] Inline calls to cabs when it's safe to do so When unsafe algebra is allowed calls to cabs(r) can be replaced by: sqrt(creal(r)*creal(r) + cimag(r)*cimag(r)) Patch by Paul Walker, thanks! Differential Revision: https://reviews.llvm.org/D40069 llvm-svn: 320901	2017-12-16 01:26:25 +00:00
Hal Finkel	7333aa9f16	[LV] NFC patch for moving VPRecipe class definitions from LoopVectorize.cpp to VPlan.h This is a small step forward to move VPlan stuff to where it should belong (i.e., VPlan.): 1. VPRecipe classes in LoopVectorize.cpp are moved to VPlan.h. 2. Many of VPRecipe::print() and execute() definitions are still left in LoopVectorize.cpp since they refer to things declared in LoopVectorize.cpp. To be moved to VPlan.cpp at a later time. 3. InterleaveGroup class is moved from anonymous namespace to llvm namespace. Referencing it in anonymous namespace from VPlan.h ended up in warning. Patch by Hideki Saito, thanks! Differential Revision: https://reviews.llvm.org/D41045 llvm-svn: 320900	2017-12-16 01:12:50 +00:00
Teresa Johnson	69b2de8466	Fix NDEBUG build problem in r320895 Fix incorrect placement of #endif causing NDEBUG build failures. llvm-svn: 320897	2017-12-16 00:29:31 +00:00
Teresa Johnson	81bbf74265	[ThinLTO] Enable importing of aliases as copy of aliasee Summary: This implements a missing feature to allow importing of aliases, which was previously disabled because alias cannot be available_externally. We instead import an alias as a copy of its aliasee. Some additional work was required in the IndexBitcodeWriter for the distributed build case, to ensure that the aliasee has a value id in the distributed index file (i.e. even when it is not being imported directly). This is a performance win in codes that have many aliases, e.g. C++ applications that have many constructor and destructor aliases. Reviewers: pcc Subscribers: mehdi_amini, inglorion, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D40747 llvm-svn: 320895	2017-12-16 00:18:12 +00:00
Jun Bum Lim	44c58d35c1	Re-commit : [LICM] Allow sinking when foldable in loop This recommits r320823 reverted due to the test failure in sink-foldable.ll and an unused variable. Added "REQUIRES: aarch64-registered-target" in the test and removed unused variable. Original commit message: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320858	2017-12-15 20:33:24 +00:00
Sean Fertile	42b13343fd	[Memcpy Loop Lowering] Insert loop BB inbetween the split BB. The original memcpy expansion inserted the loop basic block inbetween the 2 new basic blocks created by splitting the original block the memcpy call was in. This commit makes the new memcpy expansion do the same to keep the layout of the IR matching between the old and new implementations. Differential Review: https://reviews.llvm.org/D41197 llvm-svn: 320848	2017-12-15 19:29:12 +00:00
Sanjay Patel	c722e26549	fix typo in comment and remove inaccurate comment; NFC llvm-svn: 320838	2017-12-15 18:25:13 +00:00
Jun Bum Lim	5efd4d8b5e	Revert "Re-commit : [LICM] Allow sinking when foldable in loop" This reverts commit r320833. llvm-svn: 320836	2017-12-15 18:12:49 +00:00
Jun Bum Lim	83ccad6684	Re-commit : [LICM] Allow sinking when foldable in loop This recommits r320823 after fixing a test failure. Original commit message: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320833	2017-12-15 17:58:59 +00:00
Jun Bum Lim	6136d87f5d	Revert "[LICM] Allow sinking when foldable in loop" This reverts commit r320823. llvm-svn: 320828	2017-12-15 16:35:09 +00:00
Jun Bum Lim	22855c26a5	[LICM] Allow sinking when foldable in loop Summary: Continue trying to sink an instruction if its users in the loop is foldable. This will allow the instruction to be folded in the loop by decoupling it from the user outside of the loop. Reviewers: hfinkel, majnemer, davidxl, efriedma, danielcdh, bmakam, mcrosier Reviewed By: hfinkel Subscribers: javed.absar, bmakam, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37076 llvm-svn: 320823	2017-12-15 16:09:54 +00:00
Fedor Sergeev	4b86d79048	[PM] port Rewrite Statepoints For GC to the new pass manager. Summary: The port is nearly straightforward. The only complication is related to the analyses handling, since one of the analyses used in this module pass is domtree, which is a function analysis. That requires asking for the results of each function and disallows a single interface for run-on-module pass action. Decided to copy-paste the main body of this pass. Most of its code is requesting analyses anyway, so not that much of a copy-paste. The rest of the code movement is to transform all the implementation helper functions like stripNonValidData into non-member statics. Extended all the related LLVM tests with new-pass-manager use. No failures. Reviewers: sanjoy, anna, reames Reviewed By: anna Subscribers: skatkov, llvm-commits Differential Revision: https://reviews.llvm.org/D41162 llvm-svn: 320796	2017-12-15 09:32:11 +00:00
Sanjay Patel	0ab0c1a201	[SimplifyCFG] don't sink common insts too soon (PR34603) This should solve: https://bugs.llvm.org/show_bug.cgi?id=34603 ...by preventing SimplifyCFG from altering redundant instructions before early-cse has a chance to run. It changes the default (canonical-forming) behavior of SimplifyCFG, so we're only doing the sinking transform later in the optimization pipeline. Differential Revision: https://reviews.llvm.org/D38566 llvm-svn: 320749	2017-12-14 22:05:20 +00:00
Guozhi Wei	d22d1b953d	[SLPVectorizer] Don't ignore scalar extraction instructions of aggregate value In SLPVectorizer, the vector build instructions (insertvalue for aggregate type) are passed to BoUpSLP.buildTree and treated as UserIgnoreList, so later in cost estimation, the cost of these instructions is not counted. For aggregate values, later uses are more likely to be done in scalar registers, either as individual scalars or as a whole for a function call or return value. Ignoring scalar extraction instructions may cause too aggressive vectorization for aggregate values, and slow down performance. So for vectorization of aggregate values, the scalar extraction instructions are required in cost estimation. Differential Revision: https://reviews.llvm.org/D41139 llvm-svn: 320736	2017-12-14 19:35:43 +00:00
Fedor Sergeev	83bcc68afa	[PM][InstCombine] fixing omission of AliasAnalysis in new-pass-manager's version of InstCombine Summary: Passing AliasAnalysis results instead of nullptr appears to work just fine. A couple new-pass-manager tests updated to align with new order of analyses. Reviewers: chandlerc, spatel, craig.topper Reviewed By: chandlerc Subscribers: mehdi_amini, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D41203 llvm-svn: 320687	2017-12-14 10:36:31 +00:00
Dorit Nuzman	4750c785b3	[LV] Support efficient vectorization of an induction with redundant casts D30041 extended SCEVPredicateRewriter to improve handling of Phi nodes whose update chain involves casts; PSCEV can now build an AddRecurrence for some forms of such phi nodes, under the proper runtime overflow test. This means that we can identify such phi nodes as an induction, and the loop-vectorizer can now vectorize such inductions, however inefficiently. The vectorizer doesn't know that it can ignore the casts, and so it vectorizes them. This patch records the casts in the InductionDescriptor, so that they could be marked to be ignored for cost calculation (we use VecValuesToIgnore for that) and ignored for vectorization/widening/scalarization (i.e. treated as TriviallyDead). In addition to marking all these casts to be ignored, we also need to make sure that each cast is mapped to the right vector value in the vector loop body (be it a widened, vectorized, or scalarized induction). So whenever an induction phi is mapped to a vector value (during vectorization/widening/ scalarization), we also map the respective cast instruction (if exists) to that vector value. (If the phi-update sequence of an induction involves more than one cast, then the above mapping to vector value is relevant only for the last cast of the sequence as we allow only the "last cast" to be used outside the induction update chain itself). This is the last step in addressing PR30654. llvm-svn: 320672	2017-12-14 07:56:31 +00:00
Sanjay Patel	558a465473	[EarlyCSE] recognize swapped variants of abs/nabs as equivalent Extends https://reviews.llvm.org/rL320640 Differential Revision: https://reviews.llvm.org/D41136 llvm-svn: 320653	2017-12-13 22:57:35 +00:00
Brian M. Rzycki	580bc3c8fa	Reverting [JumpThreading] Preservation of DT and LVI across the pass Stage 2 bootstrap failed: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/14434 llvm-svn: 320641	2017-12-13 22:01:17 +00:00
Sanjay Patel	3c7a35de7f	[EarlyCSE] recognize commuted and swapped variants of min/max as equivalent (PR35642) As shown in: https://bugs.llvm.org/show_bug.cgi?id=35642 ...we can have different forms of min/max, so we should recognize those here in EarlyCSE similar to how we already handle binops and compares that can commute. Differential Revision: https://reviews.llvm.org/D41136 llvm-svn: 320640	2017-12-13 21:58:15 +00:00
Michael Zolotukhin	6af4f232b5	Remove redundant includes from lib/Transforms. llvm-svn: 320628	2017-12-13 21:31:01 +00:00
Brian M. Rzycki	d989af98b3	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perform the preservation was minimally altered and was simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements. One example is loop boundary threading. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 320612	2017-12-13 20:52:26 +00:00
Aditya Kumar	49c03b11df	[GVNHoist] Fix: PR35222 gvn-hoist incorrectly erases load w.r.t. the paper "A Practical Improvement to the Partial Redundancy Elimination in SSA Form" (https://sites.google.com/site/jongsoopark/home/ssapre.pdf) Proper dominance check was missing here, so having a loopinfo should not be required. Committing this diff as this fixes the bug, if there are further concerns, I'll be happy to work on them. Differential Revision: https://reviews.llvm.org/D39781 llvm-svn: 320607	2017-12-13 19:40:07 +00:00
Igor Laevsky	e0edb66475	Reintroduce r320049, r320014 and r319894. OpenGL issues should be fixed by now. llvm-svn: 320568	2017-12-13 11:21:18 +00:00
Mohammad Shahid	dbd30edb7f	[SLP] Vectorize jumbled memory loads. Summary: This patch tries to vectorize loads of consecutive memory accesses, accessed in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' of jumbled accesses. This patch fixes the mask representation by recording the 'use mask' in the usertree entry. Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh Reviewed By: Ayal Subscribers: mgrang, dcaballe, hans, mzolotukhin Differential Revision: https://reviews.llvm.org/D36130 llvm-svn: 320548	2017-12-13 03:08:29 +00:00
Florian Hahn	beda7d517d	[CallSiteSplitting] Refactor creating callsites. Summary: This change makes the call site creation more general if any of the arguments is predicated on a condition in the call site's predecessors. If we find a callsite, that potentially can be split, we collect the set of conditions for the call site's predecessors (currently only 2 predecessors are allowed). To do that, we traverse each predecessor's predecessors as long as it only has single predecessors and record the condition, if it is relevant to the call site. For each condition, we also check if the condition is taken or not. In case it is not taken, we record the inverse predicate. We use the recorded conditions to create the new call sites and split the basic block. This has 2 benefits: (1) it is slightly easier to see what is going on (IMO) and (2) we can easily extend it to handle more complex control flow. Reviewers: davidxl, junbuml Reviewed By: junbuml Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40728 llvm-svn: 320547	2017-12-13 03:05:20 +00:00
Evgeniy Stepanov	ecb48e523e	[hwasan] Inline instrumentation & fixed shadow. Summary: This brings CPU overhead on bzip2 down from 5.5x to 2x. Reviewers: kcc, alekseyshl Subscribers: kubamracek, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41137 llvm-svn: 320538	2017-12-13 01:16:34 +00:00
Alexey Bataev	83c15b1363	[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast. Summary: If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1, &V2)))), bitcast)`, but the load is used in other instructions, it leads to looping in InstCombiner. Patch adds additional check that all users of the load instructions are stores and then replaces all uses of load instruction by the new one with new type. Reviewers: RKSimon, spatel, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41072 llvm-svn: 320525	2017-12-12 20:28:46 +00:00
Fiona Glaser	b8a330c42a	Reassociate: add global reassociation algorithm This algorithm (explained more in the source code) takes into account global redundancies by building a "pair map" to find common subexprs. The primary motivation of this is to handle situations like foo = (a * b) * c bar = (a * d) * c where we currently don't identify that "a * c" is redundant. Accordingly, it prioritizes the emission of a * c so that CSE can remove the redundant calculation later. Does not change the actual reassociation algorithm -- only the order in which the reassociated operand chain is reconstructed. Gives ~1.5% floating point math instruction count reduction on a large offline suite of graphics shaders. llvm-svn: 320515	2017-12-12 19:18:02 +00:00
Alexey Bataev	fa0a76dbcc	Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast." This reverts commit r320510 - again sanitizers bbots. llvm-svn: 320513	2017-12-12 19:12:34 +00:00
Hiroshi Yamauchi	f3bda1daa2	Split IndirectBr critical edges before PGO gen/use passes. Summary: The PGO gen/use passes currently fail with an assert failure if there's a critical edge whose source is an IndirectBr instruction and that edge needs to be instrumented. To avoid this in certain cases, split IndirectBr critical edges in the PGO gen/use passes. This works for blocks with single indirectbr predecessors, but not for those with multiple indirectbr predecessors (splitting an IndirectBr critical edge isn't always possible.) Reviewers: davidxl, xur Reviewed By: davidxl Subscribers: efriedma, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D40699 llvm-svn: 320511	2017-12-12 19:07:43 +00:00
Alexey Bataev	195c97e220	[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast. Summary: If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1, &V2)))), bitcast)`, but the load is used in other instructions, it leads to looping in InstCombiner. Patch adds additional check that all users of the load instructions are stores and then replaces all uses of load instruction by the new one with new type. Reviewers: RKSimon, spatel, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41072 llvm-svn: 320510	2017-12-12 18:47:00 +00:00
Alexey Bataev	6132a50d2a	Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast." This reverts commit r320499 again to resolve the problem with the sanitizers bbots. llvm-svn: 320501	2017-12-12 17:35:29 +00:00
Alexey Bataev	ca4c9a5246	[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast. Summary: If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1, &V2)))), bitcast)`, but the load is used in other instructions, it leads to looping in InstCombiner. Patch adds additional check that all users of the load instructions are stores and then replaces all uses of load instruction by the new one with new type. Reviewers: RKSimon, spatel, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41072 llvm-svn: 320499	2017-12-12 17:19:15 +00:00
Alexey Bataev	d19dbe6791	Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast." This reverts commit r320496 to solve the problems with sanitizer buildbots. llvm-svn: 320498	2017-12-12 17:08:48 +00:00
Alexey Bataev	d0c3aeb200	[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast. Summary: If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1, &V2)))), bitcast)`, but the load is used in other instructions, it leads to looping in InstCombiner. Patch adds additional check that all users of the load instructions are stores and then replaces all uses of load instruction by the new one with new type. Reviewers: RKSimon, spatel, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41072 llvm-svn: 320496	2017-12-12 16:58:48 +00:00
Alexey Bataev	c9f1d2e4a0	Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast." This reverts commit r320488 because of the failed asan buildbots. llvm-svn: 320490	2017-12-12 16:05:52 +00:00
Alexey Bataev	fb68c48a82	[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast. Summary: If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1, &V2)))), bitcast)`, but the load is used in other instructions, it leads to looping in InstCombiner. Patch adds additional check that all users of the load instructions are stores and then replaces all uses of load instruction by the new one with new type. Reviewers: RKSimon, spatel, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41072 llvm-svn: 320488	2017-12-12 15:54:49 +00:00
Alexey Bataev	ca2a8cea2f	Revert "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast." This reverts commit r320483 because of the failed Windows buildbots. llvm-svn: 320485	2017-12-12 15:24:17 +00:00
Alexey Bataev	1daef8a667	[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast. If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1, &V2)))), bitcast)`, but the load is used in other instructions, it leads to looping in InstCombiner. Patch adds additional check that all users of the load instructions are stores and then replaces all uses of load instruction by the new one with new type. Reviewers: RKSimon, spatel, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41072 llvm-svn: 320483	2017-12-12 15:03:17 +00:00
Anna Thomas	2dd9835f35	[InstComineLoadStoreAlloca] Optimize stores to GEP off null base Summary: Currently, in InstCombineLoadStoreAlloca, we have simplification rules for the following cases: 1. load off a null 2. load off a GEP with null base 3. store to a null This patch adds support for the fourth case which is store into a GEP with null base. Since this is UB as well (and directly analogous to the load off a GEP with null base), we can substitute the stored val with undef in instcombine, so that SimplifyCFG can optimize this code into unreachable code. Note: Right now, simplifyCFG hasn't been taught about optimizing this to unreachable and adding an llvm.trap (this is already done for the above 3 cases). Reviewers: majnemer, hfinkel, sanjoy, davide Reviewed by: sanjoy, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41026 llvm-svn: 320480	2017-12-12 14:12:33 +00:00
Eugene Leviant	d53f3da772	Revert r320464 as it breaks gold plugin tests llvm-svn: 320467	2017-12-12 10:12:46 +00:00
Igor Laevsky	d63560b817	Revert r320049, r320014 and r319894 They were causing failures of the piglit OpenGL tests with AMD GPUs using the Mesa radeonsi driver. llvm-svn: 320466	2017-12-12 10:03:39 +00:00
Eugene Leviant	3695183395	[ThinLTO] Remove unused code from thinLTOInternalizeModule Differential revision: https://reviews.llvm.org/D40970 llvm-svn: 320464	2017-12-12 09:12:32 +00:00
Dorit Nuzman	927b31600e	[LV] Ignore the cost of values that will not appear in the vectorized loop VecValuesToIgnore holds values that will not appear in the vectorized loop. We should therefore ignore their cost when VF > 1. Differential Revision: https://reviews.llvm.org/D40883 llvm-svn: 320463	2017-12-12 08:57:43 +00:00
Mikael Holmen	66cf383761	[CallSiteSplitting] Don't let debug intrinsics affect optimizations Summary: This solves PR35616. We don't want the compiler to generate different code when we compile with/without -g, so we now ignore debug intrinsics when determining if the optimization can trigger or not. Reviewers: junbuml Subscribers: davide, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D41068 llvm-svn: 320460	2017-12-12 07:29:57 +00:00
Matt Arsenault	3e268cc0dd	LSR: Check more intrinsic pointer operands llvm-svn: 320424	2017-12-11 21:38:43 +00:00
Hans Wennborg	27d1c00c01	Revert r320407 "[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast." The tests fail (opt asserts) on Windows. > Summary: > If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1, > &V2)))), bitcast)`, but the load is used in other instructions, it leads > to looping in InstCombiner. Patch adds additional check that all users > of the load instructions are stores and then replaces all uses of load > instruction by the new one with new type. > > Reviewers: RKSimon, spatel, majnemer > > Subscribers: llvm-commits > > Differential Revision: https://reviews.llvm.org/D41072 llvm-svn: 320421	2017-12-11 21:15:27 +00:00
Adrian Prantl	3c6c14d14b	ASAN: Provide reliable debug info for local variables at -O0. The function stack poisoner conditionally stores local variables either in an alloca or in malloc'ated memory, which has the unfortunate side-effect that the actual address of the variable is only materialized when the variable is accessed, which means that those variables are mostly invisible to the debugger even when compiling without optimizations. This patch stores the address of the local stack base into an alloca, which can be referred to by the debug info and is available throughout the function. This adds one extra pointer-sized alloca to each stack frame (but mem2reg can optimize it away again when optimizations are enabled, yielding roughly the same debug info quality as before in optimized code). rdar://problem/30433661 Differential Revision: https://reviews.llvm.org/D41034 llvm-svn: 320415	2017-12-11 20:43:21 +00:00
Alexey Bataev	ec128ace8a	[InstCombine] Fix PR35618: Instcombine hangs on single minmax load bitcast. Summary: If we have pattern `store (load(bitcast(select (cmp(V1, V2), &V1, &V2)))), bitcast)`, but the load is used in other instructions, it leads to looping in InstCombiner. Patch adds additional check that all users of the load instructions are stores and then replaces all uses of load instruction by the new one with new type. Reviewers: RKSimon, spatel, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41072 llvm-svn: 320407	2017-12-11 19:11:16 +00:00
Alexander Potapenko	3c934e4864	[MSan] Hotfix compilation For some reason the override directives got removed in r320373. I suspect this to be an unwanted effect of clang-format. llvm-svn: 320381	2017-12-11 15:48:56 +00:00
Alexander Potapenko	c07e6a0eff	[MSan] introduce getShadowOriginPtr(). NFC. This patch introduces getShadowOriginPtr(), a method that obtains both the shadow and origin pointers for an address as a Value pair. The existing callers of getShadowPtr() and getOriginPtr() are updated to use getShadowOriginPtr(). The rationale for this change is to simplify KMSAN instrumentation implementation. In KMSAN origins tracking is always enabled, and there's no direct mapping between the app memory and the shadow/origin pages. Both the shadow and the origin pointer for a given address are obtained by calling a single runtime hook from the instrumentation, therefore it's easier to work with those pointers together. Reviewed at https://reviews.llvm.org/D40835. llvm-svn: 320373	2017-12-11 15:05:22 +00:00
Sanjay Patel	b23e148114	[SimplifyLibCalls] propagate FMF when folding pow(x, -1.0) call Follow-up for a bug that's similar to: https://bugs.llvm.org/show_bug.cgi?id=35601 llvm-svn: 320312	2017-12-10 17:25:54 +00:00
Sanjay Patel	09ec34349a	[SimplifyLibCalls] propagate FMF when folding pow(x, 2.0) call (PR35601) This should fix the larger problem with sqrt shown in: https://bugs.llvm.org/show_bug.cgi?id=35601 llvm-svn: 320310	2017-12-10 16:52:26 +00:00
Xinliang David Li	fa3f1a15b2	[PGO] change arg type to uint64_t to match member field type llvm-svn: 320285	2017-12-10 07:39:53 +00:00
Simon Pilgrim	a42a54258e	[InstCombine] Fix SimplifyDemandedUseBits SHL handling (PR35515) Don't assume that the pattern matched SRL can be cast to an Instruction (might be ConstExpr etc.) llvm-svn: 320270	2017-12-09 23:42:56 +00:00
Florian Hahn	c5bebffe4f	[InlineFunction] Set debug loc for call to forward varargs. Reviewers: aprantl, dblaikie, rnk Reviewed By: rnk Subscribers: eraman, llvm-commits, JDevlieghere Differential Revision: https://reviews.llvm.org/D40432 llvm-svn: 320252	2017-12-09 14:25:33 +00:00
Kamil Rytarowski	3d3f91e832	Register NetBSD/x86_64 in MemorySanitizer.cpp Summary: Reuse the Linux new mapping as it is. Sponsored by <The NetBSD Foundation> Reviewers: joerg, eugenis, vitalybuka Reviewed By: vitalybuka Subscribers: llvm-commits, #sanitizers Tags: #sanitizers Differential Revision: https://reviews.llvm.org/D41022 llvm-svn: 320219	2017-12-09 00:32:09 +00:00
Evgeniy Stepanov	c667c1f47a	Hardware-assisted AddressSanitizer (llvm part). Summary: This is LLVM instrumentation for the new HWASan tool. It is basically a stripped down copy of ASan at this point, w/o stack or global support. Instrumentation adds a global constructor + runtime callbacks for every load and store. HWASan comes with its own IR attribute. A brief design document can be found in clang/docs/HardwareAssistedAddressSanitizerDesign.rst (submitted earlier). Reviewers: kcc, pcc, alekseyshl Subscribers: srhines, mehdi_amini, mgorny, javed.absar, eraman, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D40932 llvm-svn: 320217	2017-12-09 00:21:41 +00:00
Adrian Prantl	d13170174c	Generalize llvm::replaceDbgDeclare and actually support the use-case that is mentioned in the documentation (inserting a deref before the plus_uconst). llvm-svn: 320203	2017-12-08 21:58:18 +00:00
Florian Hahn	e5089e2e94	[CodeExtractor] Add debug locations for new call and branch instrs. Summary: If a partially inlined function has debug info, we have to add debug locations to the call instruction calling the outlined function. We use the debug location of the first instruction in the outlined function, as the introduced call transfers control to this statement and there is no other equivalent line in the source code. We also use the same debug location for the branch instruction added to jump from artificial entry block for the outlined function, which just jumps to the first actual basic block of the outlined function. Reviewers: davide, aprantl, rriddle, dblaikie, danielcdh, wmi Reviewed By: aprantl, rriddle, danielcdh Subscribers: eraman, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D40413 llvm-svn: 320199	2017-12-08 21:49:03 +00:00
Xinliang David Li	d91057bf52	Revert r320104: infinite loop profiling bug fix Causes unexpected memory issue with New PM this time. The new PM invalidates BPI but not BFI, leaving the reference to BPI from BFI invalid. Abandon this patch. There is a more general solution which also handles runtime infinite loop (but not statically). llvm-svn: 320180	2017-12-08 19:38:07 +00:00
Brian M. Rzycki	0eae123d9e	[JumpThreading] Minor comment cleanup. NFC. (test commit) llvm-svn: 320179	2017-12-08 19:36:32 +00:00
Alexey Bataev	ec95c6cc0a	[InstCombine] PR35354: Convert store(bitcast, load bitcast (select (Cond, &V1, &V2)) --> store (, load (select(Cond, load &V1, load &V2))) Summary: If we have code like this: ``` float a, b; a = std::max(a ,b); ``` it is converted into something like this: ``` %call = call dereferenceable(4) float* @_ZSt3maxIfERKT_S2_S2_(float* nonnull dereferenceable(4) %a.addr, float* nonnull dereferenceable(4) %b.addr) %1 = bitcast float* %call to i32* %2 = load i32, i32* %1, align 4 %3 = bitcast float* %a.addr to i32* store i32 %2, i32* %3, align 4 ``` After inlining this code is converted to the following: ``` %1 = load float, float* %a.addr %2 = load float, float* %b.addr %cmp.i = fcmp fast olt float %1, %2 %__b.__a.i = select i1 %cmp.i, float* %a.addr, float* %b.addr %3 = bitcast float* %__b.__a.i to i32* %4 = load i32, i32* %3, align 4 %5 = bitcast float* %arrayidx to i32* store i32 %4, i32* %5, align 4 ``` This pattern is not recognized as a minmax pattern. The patch solves this problem by converting the sequence ``` store (bitcast, (load bitcast (select ((cmp V1, V2), &V1, &V2)))) ``` to the sequence ``` store (,load (select((cmp V1, V2), &V1, &V2))) ``` After this the code is recognized as a minmax pattern. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40304 llvm-svn: 320157	2017-12-08 15:32:10 +00:00
Bill Seurer	957a076cce	[PowerPC][asan] Update asan to handle changed memory layouts in newer kernels In more recent Linux kernels with 47 bit VMAs the layout of virtual memory for powerpc64 changed causing the address sanitizer to not work properly. This patch adds support for 47 bit VMA kernels for powerpc64 and fixes up test cases. https://reviews.llvm.org/D40907 There is an associated patch for compiler-rt. Tested on several 4.x and 3.x kernel releases. llvm-svn: 320109	2017-12-07 22:53:33 +00:00
Alina Sbirlea	193429f0c8	[ModRefInfo] Make enum ModRefInfo an enum class [NFC]. Summary: Make enum ModRefInfo an enum class. Changes to ModRefInfo values should be done using inline wrappers. This should prevent future bit-wise operations from being added, which can be more error-prone. Reviewers: sanjoy, dberlin, hfinkel, george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40933 llvm-svn: 320107	2017-12-07 22:41:34 +00:00
Xinliang David Li	4b0027f671	[PGO] detect infinite loop and form MST properly Differential Revision: http://reviews.llvm.org/D40873 llvm-svn: 320104	2017-12-07 22:23:28 +00:00
Igor Laevsky	4a4f2e8c67	[InstCombine] Don't crash on out of bounds index in the insertelement Differential Revision: https://reviews.llvm.org/D40390 llvm-svn: 320049	2017-12-07 15:00:52 +00:00
Adam Nemet	a502ee73c4	[LV] Interleaved access vectorization: fix computing new alias info As a new access is generated spanning across multiple fields, we need to propagate alias info from all the fields to form the most generic alias info. rdar://35602528 Differential Revision: https://reviews.llvm.org/D40617 llvm-svn: 319979	2017-12-06 22:42:24 +00:00
Sanjay Patel	b6404a8ca6	[InstCombine] canonicalize constant-minus-boolean to select-of-constants This restores the half of: https://reviews.llvm.org/rL75531 that was reverted at: https://reviews.llvm.org/rL159230 For the x86 case mentioned there, we now produce: leal 1(%rdi), %eax subl %esi, %eax We have target hooks to invert this in DAGCombiner (and x86 is enabled) with: https://reviews.llvm.org/rL296977 https://reviews.llvm.org/rL311731 AArch64 and possibly other targets would probably benefit from enabling those hooks too. See PR30327: https://bugs.llvm.org/show_bug.cgi?id=30327#c2 Differential Revision: https://reviews.llvm.org/D40612 llvm-svn: 319964	2017-12-06 21:22:57 +00:00
Matthew Simpson	e363d2cebb	[PGO] Make indirect call promotion a utility This patch factors out the main code transformation utilities in the pgo-driven indirect call promotion pass and places them in Transforms/Utils. The change is intended to be a non-functional change, letting non-pgo-driven passes share a common implementation with the existing pgo-driven pass. The common utilities are used to conditionally promote indirect call sites to direct call sites. They perform the underlying transformation, and do not consider profile information. The pgo-specific details (e.g., the computation of branch weight metadata) have been left in the indirect call promotion pass. Differential Revision: https://reviews.llvm.org/D40658 llvm-svn: 319963	2017-12-06 21:22:54 +00:00
Alina Sbirlea	18fea013de	[ModRefInfo] Do not use ModRefInfo result in if conditions as this makes assumptions about the values in the enum. Replace with wrapper returning bool [NFC]. llvm-svn: 319949	2017-12-06 19:56:37 +00:00
Florian Hahn	115d99162c	[InlineFunction] Only replace call if there are VarArgs to forward. Summary: There is no need to replace the original call instruction if no VarArgs need to be forwarded. Reviewers: davide, rnk, majnemer, efriedma Reviewed By: efriedma Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D40412 llvm-svn: 319947	2017-12-06 19:47:24 +00:00
Sanjay Patel	3e069f5724	[LoopUtils] simplify createTargetReduction(); NFCI llvm-svn: 319946	2017-12-06 19:37:00 +00:00
Sanjay Patel	1ea7b6f7a1	[LoopUtils] fix variable name to match FMF vocabulary; NFC llvm-svn: 319928	2017-12-06 19:11:23 +00:00
Hans Wennborg	146a9c3e51	Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks" This caused PR35519. > [memcpyopt] Teach memcpyopt to optimize across basic blocks > > This teaches memcpyopt to make a non-local memdep query when a local query > indicates that the dependency is non-local. This notably allows it to > eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. > > Fixes PR28958. > > Differential Revision: https://reviews.llvm.org/D38374 > > [memcpyopt] Commit file missed in r319482. > > This change was meant to be included with r319482 but was accidentally > omitted. llvm-svn: 319873	2017-12-06 01:47:55 +00:00
Xinliang David Li	45c819063a	Revert r319794: [PGO] detect infinite loop and form MST properly: memory leak problem llvm-svn: 319841	2017-12-05 21:54:01 +00:00
Alina Sbirlea	63d2250a42	Modify ModRefInfo values using static inline method abstractions [NFC]. Summary: The aim is to make ModRefInfo checks and changes more intuitive and less error prone using inline methods that abstract the bit operations. Ideally ModRefInfo would become an enum class, but that change will require a wider set of changes into FunctionModRefBehavior. Reviewers: sanjoy, george.burgess.iv, dberlin, hfinkel Subscribers: nlopes, llvm-commits Differential Revision: https://reviews.llvm.org/D40749 llvm-svn: 319821	2017-12-05 20:12:23 +00:00
Joel Galenson	ea0bafda8a	[CVP] Remove some {s|u}sub.with.overflow checks. This uses ConstantRange::makeGuaranteedNoWrapRegion's newly-added handling for subtraction to allow CVP to remove some subtraction overflow checks. Differential Revision: https://reviews.llvm.org/D40039 llvm-svn: 319807	2017-12-05 18:14:24 +00:00
Joel Galenson	d9500bc533	Test commit. I removed a space at the end of a comment. NFC. llvm-svn: 319803	2017-12-05 17:59:07 +00:00
Xinliang David Li	cc35bc9efc	[PGO] detect infinite loop and form MST properly Differential Revision: http://reviews.llvm.org/D40702 llvm-svn: 319794	2017-12-05 17:19:41 +00:00
Mikael Holmen	0a3e98062f	Bail out of a SimplifyCFG switch table opt at undef values. Summary: A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef. The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization. Patch by JesperAntonsson. Reviewers: spatel, eeckstein, hans Reviewed By: hans Subscribers: uabelho, llvm-commits Differential Revision: https://reviews.llvm.org/D40639 llvm-svn: 319768	2017-12-05 14:14:00 +00:00
Evgeniy Stepanov	4a8d151986	[msan] Add a fixme note for a minor deficiency. llvm-svn: 319708	2017-12-04 22:50:39 +00:00
Hiroshi Yamauchi	9364fa3434	Move splitIndirectCriticalEdges() to BasicBlockUtils.h. Summary: Move splitIndirectCriticalEdges() from CodeGenPrepare to BasicBlockUtils.h so that it can be called from other places. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40750 llvm-svn: 319689	2017-12-04 20:36:01 +00:00
Sanjoy Das	aa92cae14e	[BypassSlowDivision] Improve our handling of divisions by constants (This reapplies r314253. r314253 was reverted on r314482 because of a correctness regression on P100, but that regression was identified to be something else.) Summary: Don't bail out on constant divisors for divisions that can be narrowed without introducing control flow . This gives us a 32 bit multiply instead of an emulated 64 bit multiply in the generated PTX assembly. Reviewers: jlebar Subscribers: jholewinski, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38265 llvm-svn: 319677	2017-12-04 19:21:58 +00:00
Anna Thomas	7b360434ff	[Loop Predication] Teach LP about reverse loops Summary: Currently, we only support predication for forward loops with step of 1. This patch enables loop predication for reverse or countdownLoops, which satisfy the following conditions: 1. The step of the IV is -1. 2. The loop has a single latch as B(X) = X <pred> latchLimit with pred as s> or u> 3. The IV of the guard is the decrement IV of the latch condition (Guard is: G(X) = X-1 u< guardLimit). This patch was downstream for a while and is the last series of patches that's from our LP implementation downstream. Reviewers: apilipenko, mkazantsev, sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40353 llvm-svn: 319659	2017-12-04 15:11:48 +00:00
Philip Reames	6260cf71d3	[IndVars] Fix a bug introduced in r317012 Turns out we can have comparisons which are indirect users of the induction variable that we can make invariant. In this case, there is no loop invariant value contributing and we'd fail an assert. The test case was found by a java fuzzer and reduced. It's a real cornercase. You have to have a static loop which we've already proven only executes once, but haven't broken the backedge on, and an inner phi whose result can be constant folded by SCEV using exit count reasoning but not proven by isKnownPredicate. To my knowledge, only the fuzzer has hit this case. llvm-svn: 319583	2017-12-01 20:57:19 +00:00
Hans Wennborg	e2470b95da	Revert r319531 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops." It causes builds to fail with "Instruction does not dominate all uses" (PR35497). > Patch tries to improve vectorization of the following code: > > void add1(int * __restrict dst, const int * __restrict src) { > *dst++ = *src++; > *dst++ = *src++ + 1; > *dst++ = *src++ + 2; > *dst++ = *src++ + 3; > } > Allows to vectorize even if the very first operation is not a binary add, but just a load. > > Fixed issues related to previous commit. > > Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev > > Reviewed By: ABataev, RKSimon > > Subscribers: llvm-commits, RKSimon > > Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 319550	2017-12-01 16:17:24 +00:00
Mikael Holmen	9c13c8b6ec	Revert r319537: Bail out of a SimplifyCFG switch table opt at undef values. Broke build bots so reverting. llvm-svn: 319539	2017-12-01 13:11:39 +00:00
Mikael Holmen	9f047795fb	Bail out of a SimplifyCFG switch table opt at undef values. Summary: A true or false result is expected from a comparison, but it seems the possibility of undef was overlooked, which could lead to a failed assert. This is fixed by this patch by bailing out if we encounter undef. The bug is old and the assert has been there since the end of 2014, so it seems this is unusual enough to forego optimization. Patch by: JesperAntonsson Reviewers: spatel, eeckstein, hans Reviewed By: hans Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40639 llvm-svn: 319537	2017-12-01 12:30:49 +00:00
Dinar Temirbulatov	29e86584c6	[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. Patch tries to improve vectorization of the following code: void add1(int * __restrict dst, const int * __restrict src) { *dst++ = *src++; *dst++ = *src++ + 1; *dst++ = *src++ + 2; *dst++ = *src++ + 3; } Allows to vectorize even if the very first operation is not a binary add, but just a load. Fixed issues related to previous commit. Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev Reviewed By: ABataev, RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 319531	2017-12-01 11:10:47 +00:00
Hiroshi Inoue	48e4c7aae6	Recommit rL319407: [SROA] enable splitting for non-whole-alloca loads and stores Recommitting the once-reverted patch rL319407 after adding a check for bit vector size to avoid failures in some build bots. llvm-svn: 319522	2017-12-01 06:05:05 +00:00
Zachary Turner	8065f0b975	Mark all library options as hidden. These command line options are not intended for public use, and often don't even make sense in the context of a particular tool anyway. About 90% of them are already hidden, but when people add new options they forget to hide them, so if you were to make a brand new tool today, link against one of LLVM's libraries, and run tool -help you would get a bunch of junk that doesn't make sense for the tool you're writing. This patch hides these options. The real solution is to not have libraries defining command line options, but that's a much larger effort and not something I'm prepared to take on. Differential Revision: https://reviews.llvm.org/D40674 llvm-svn: 319505	2017-12-01 00:53:10 +00:00
Peter Collingbourne	1f03422610	ThinLTOBitcodeWriter: Try harder to discard unused references to the merged module. If the thin module has no references to an internal global in the merged module, we need to make sure to preserve that property if the global is a member of a comdat group, as otherwise promotion can end up adding global symbols to the comdat, which is not allowed. This situation can arise if the external global in the thin module has dead constant users, which would cause use_empty() to return false and would cause us to try to promote it. To prevent this from happening, discard the dead constant users before asking whether a global is empty. Differential Revision: https://reviews.llvm.org/D40593 llvm-svn: 319494	2017-11-30 23:05:52 +00:00
Dan Gohman	59e4c0b938	[memcpyopt] Teach memcpyopt to optimize across basic blocks This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. Fixes PR28958. Differential Revision: https://reviews.llvm.org/D38374 llvm-svn: 319482	2017-11-30 22:10:53 +00:00
Xinliang David Li	c23d2c6883	[PGO] Skip counter promotion for infinite loops Differential Revision: http://reviews.llvm.org/D40662 llvm-svn: 319462	2017-11-30 19:16:25 +00:00
Hiroshi Inoue	21e8ded4d2	Revert rL319407: [SROA] enable splitting for non-whole-alloca loads and stores This reverts commit rL319407 due to failures in some buildbots. llvm-svn: 319410	2017-11-30 08:29:51 +00:00
Hiroshi Inoue	422e80aee2	[SROA] enable splitting for non-whole-alloca loads and stores Currently, SROA splits loads and stores only when they are accessing the whole alloca. This patch relaxes this limitation to allow splitting a load/store if all other loads and stores to the alloca are disjoint to or fully included in the current load/store. If there is no other load or store that crosses the boundary of the current load/store, the current splitting implementation works as is. The whole-alloca loads and stores meet this new condition and so they are still splittable. Here is a simplified motivating example. struct record { long long a; int b; int c; }; int func(struct record r) { for (int i = 0; i < r.c; i++) r.b++; return r.b; } When updating r.b (or r.c as well), LLVM generates redundant instructions on some platforms (such as x86_64, ppc64); here, r.b and r.c are packed into one 64-bit GPR when the struct is passed as a method argument. With this patch, the above example is compiled into only a few instructions without a loop. Without the patch, an unnecessary loop-carried dependency is introduced by SROA and the loop cannot be eliminated by the later optimizers. Differential Revision: https://reviews.llvm.org/D32998 llvm-svn: 319407	2017-11-30 07:44:46 +00:00
Graham Yiu	70293fa27a	- Removed unused lambda (IsReturnBlock) causing build bots to fail for r319398 - Added lit testcases that were supposed to be part of r319398 llvm-svn: 319399	2017-11-30 03:36:57 +00:00
Graham Yiu	8b1882c186	With PGO information, we can do more aggressive outlining of cold regions in the inline candidate function. This contrasts with the scheme of keeping only the 'early return' portion of the inline candidate and outlining the rest of the function as a single function call. Support for outlining multiple regions of each function is added, as well as some basic heuristics to determine which regions are good to outline. Outline candidates limited to regions that are single-entry & single-exit. We also avoid outlining regions that produce live-exit variables, which may inhibit some forms of code motion (like commoning). Fallback to the regular partial inlining scheme is retained when either i) no regions are identified for outlining in the function, or ii) the outlined function could not be inlined in any of its callers. Differential Revision: https://reviews.llvm.org/D38190 llvm-svn: 319398	2017-11-30 02:41:36 +00:00
Peter Collingbourne	9e3175bb6b	LowerTypeTests: Deduplicate code. NFC. llvm-svn: 319390	2017-11-30 00:27:08 +00:00
Peter Collingbourne	943aca3c27	LowerTypeTests: Remove unnecessary cast. NFC. llvm-svn: 319387	2017-11-30 00:02:55 +00:00
Adam Nemet	2e92289014	Demote this opt remark to DEBUG. From a random opt-stat output: Top 10 remarks: tailcallelim/tailcall 53% inline/AlwaysInline 13% gvn/LoadClobbered 13% inline/Inlined 8% inline/TooCostly 2% inline/NoDefinition 2% licm/LoadWithLoopInvariantAddressInvalidated 2% licm/Hoisted 1% asm-printer/InstructionCount 1% prologepilog/StackSize 1% llvm-svn: 319235	2017-11-28 22:11:00 +00:00
Adrian Prantl	77d90b0c39	SROA: Don't create variable fragments that are outside of the variable. An alloca may be larger than a variable that is described to be stored there. Don't create a dbg.value for fragments that are outside of the variable. This fixes PR35447. https://bugs.llvm.org/show_bug.cgi?id=35447 llvm-svn: 319230	2017-11-28 21:30:38 +00:00
Hans Wennborg	ca46db957d	EntryExitInstrumenter: set DebugLocs on the inserted call instructions (PR35412) Apparently the verifier requires that inlineable calls in a function with debug info have debug locations. llvm-svn: 319199	2017-11-28 18:44:26 +00:00
Jonas Paulsson	f0ff20f1f0	Use getStoreSize() in various places instead of 'BitSize >> 3'. This is needed for cases when the memory access is not as big as the width of the data type. For instance, storing i1 (1 bit) would be done in a byte (8 bits). Using 'BitSize >> 3' (or '/ 8') would e.g. give the memory access of an i1 a size of 0, which for instance makes alias analysis return NoAlias even when it shouldn't. There are no tests as this was done as a follow-up to the bugfix for the case where this was discovered (r318824). This handles more similar cases. Review: Björn Petterson https://reviews.llvm.org/D40339 llvm-svn: 319173	2017-11-28 14:44:32 +00:00
Chandler Carruth	c34f789e38	Add a new pass to speculate around PHI nodes with constant (integer) operands when profitable. The core idea is to (re-)introduce some redundancies where their cost is hidden by the cost of materializing immediates for constant operands of PHI nodes. When the cost of the redundancies is covered by this, avoiding materializing the immediate has numerous benefits: 1) Less register pressure 2) Potential for further folding / combining 3) Potential for more efficient instructions due to immediate operand As a motivating example, consider the remarkably different cost on x86 of a SHL instruction with an immediate operand versus a register operand. This pattern turns up surprisingly frequently, but is somewhat rarely obvious as a significant performance problem. The pass is entirely target independent, but it does rely on the target cost model in TTI to decide when to speculate things around the PHI node. I've included x86-focused tests, but any target that sets up its immediate cost model should benefit from this pass. There is probably more that can be done in this space, but the pass as-is is enough to get some important performance on our internal benchmarks, and should be generally performance neutral, but help with more extensive benchmarking is always welcome. One awkward part is that this pass has to be scheduled after everything that can eliminate these kinds of redundancies. This includes SimplifyCFG, GVN, etc. I'm open to suggestions about better places to put this. We could in theory make it part of the codegen pass pipeline, but there doesn't really seem to be a good reason for that -- it isn't "lowering" in any sense and only relies on pretty standard cost model based TTI queries, so it seems to fit well with the "optimization" pipeline model. Still, further thoughts on the pipeline position are welcome. I've also only implemented this in the new pass manager. If folks are very interested, I can try to add it to the old PM as well, but I didn't really see much point (my use case is already switched over to the new PM). I've tested this pretty heavily without issue. A wide range of benchmarks internally show no change outside the noise, and I don't see any significant changes in SPEC either. However, the size class computation in tcmalloc is substantially improved by this, which turns into a 2% to 4% win on the hottest path through tcmalloc for us, so there are definitely important cases where this is going to make a substantial difference. Differential revision: https://reviews.llvm.org/D37467 llvm-svn: 319164	2017-11-28 11:32:31 +00:00
Florian Hahn	25ea91a838	[TailRecursionElimination] Skip debug intrinsics. Summary: I think we do not need to analyze debug intrinsics here, as they should not impact codegen. This has 2 benefits: 1) slightly less work to do and 2) avoiding generating optimization remarks for converting calls to debug intrinsics to tail calls, which are not really helpful for users. Based on work by Sander de Smalen. Reviewers: davide, trentxintong, aprantl Reviewed By: aprantl Subscribers: llvm-commits, JDevlieghere Tags: #debug-info Differential Revision: https://reviews.llvm.org/D40440 llvm-svn: 319158	2017-11-28 09:32:25 +00:00
Max Kazantsev	115607226a	[GVN] Prevent ScalarPRE from hoisting across instructions that don't pass control flow to successors This is to address a problem similar to those in D37460 for Scalar PRE. We should not PRE across an instruction that may not pass execution to its successor unless it is safe to speculatively execute it. Differential Revision: https://reviews.llvm.org/D38619 llvm-svn: 319147	2017-11-28 07:07:55 +00:00
Rafael Espindola	c06f55e1e8	This reverts commit r319096 and r319097. Revert "[SROA] Propagate !range metadata when moving loads." Revert "[Mem2Reg] Clang-format unformatted parts of this file. NFCI." Davide says they broke a bot. llvm-svn: 319131	2017-11-28 01:25:38 +00:00
Adrian Prantl	d7f6f1636d	SROA: Avoid creating a fragment expression that covers the entire variable. Fixes PR35416. https://bugs.llvm.org/show_bug.cgi?id=35416 llvm-svn: 319126	2017-11-28 00:57:53 +00:00
Davide Italiano	824d71a9c5	[Mem2Reg] Clang-format unformatted parts of this file. NFCI. llvm-svn: 319097	2017-11-27 21:25:52 +00:00
Davide Italiano	b5d59e73ee	[SROA] Propagate !range metadata when moving loads. This tries to propagate !range metadata to a pre-existing load when a load is optimized out. This is done instead of adding an assume because converting loads to and from assumes creates a lot of IR. Patch by Ariel Ben-Yehuda. Differential Revision: https://reviews.llvm.org/D37216 llvm-svn: 319096	2017-11-27 21:25:13 +00:00
Sanjay Patel	0de1a4bc2d	[PartiallyInlineLibCalls][x86] add TTI hook to allow sqrt inlining to depend on arg rather than result This should fix PR31455: https://bugs.llvm.org/show_bug.cgi?id=31455 Differential Revision: https://reviews.llvm.org/D28314 llvm-svn: 319094	2017-11-27 21:15:43 +00:00
Arnold Schwaighofer	d9e710984d	Inliner: Don't mark notail calls with the 'tail' attribute enum TailCallKind { TCK_None = 0, TCK_Tail = 1, TCK_MustTail = 2, TCK_NoTail = 3 }; TCK_NoTail is greater than TCK_Tail so taking the min does not do the correct thing. rdar://35639547 llvm-svn: 319075	2017-11-27 19:03:40 +00:00
Sanjay Patel	863d494730	[InstCombine] use 'auto' with 'dyn_cast'; NFC llvm-svn: 319067	2017-11-27 18:19:32 +00:00
Benjamin Kramer	51ebcaaf25	Make helpers static. NFC. llvm-svn: 318953	2017-11-24 14:55:41 +00:00
Alexander Potapenko	9e5477f473	MSan: remove an unnecessary cast. NFC for userspace instrumentation. llvm-svn: 318923	2017-11-23 15:06:51 +00:00
Alexander Potapenko	391804f54b	[MSan] Move the access address check before the shadow access for that address MSan used to insert the shadow check of the store pointer operand _after_ the shadow of the value operand has been written. This happens to work in the userspace, as the whole shadow range is always mapped. However in the kernel the shadow page may not exist, so the bug may cause a crash. This patch moves the address check in front of the shadow access. llvm-svn: 318901	2017-11-23 08:34:32 +00:00
Max Kazantsev	716e647d74	[IRCE][NFC] Add no wrap flags to no-wrapping SCEV calculation In a lambda where we expect the result to be within bounds, add the respective `nsw/nuw` flags to help SCEV in case it fails to figure them out on its own. Differential Revision: https://reviews.llvm.org/D40168 llvm-svn: 318898	2017-11-23 06:14:39 +00:00
Davide Italiano	b480b5c2ee	[SCCP] Pick the right lattice value for constants. After the dataflow algorithm proves that an argument is constant, it replaces its value with the integer constant and drops the lattice value associated to the DEF. e.g. in the example we have @f() that's called twice: call @f(undef, ...) call @f(2, ...) `undef` MEET 2 = 2 so we replace the argument and all its uses with the constant 2. Shortly after, tryToReplaceWithConstantRange() tries to get the lattice value for the argument we just replaced, causing an assertion. This function is a little peculiar as it runs when we're doing replacement and not as part of the solver but still queries the solver. The fix is to check whether we already replaced the value and, if so, get a temporary lattice value for the constant. Thanks to Zhendong Su for the report! Fixes PR35357. llvm-svn: 318817	2017-11-22 03:04:55 +00:00
Hans Wennborg	37cbf28e79	EntryExitInstrumenter: support __cyg_profile_func_enter_bare It works just like __cyg_profile_func_enter but takes no arguments. llvm-svn: 318783	2017-11-21 17:22:19 +00:00
Alina Sbirlea	ff8b8aea2e	Add MemorySSA as loop dependency, disabled by default [NFC]. Summary: First step in adding MemorySSA as dependency for loop pass manager. Adding the dependency under a flag. New pass manager: MSSA pointer in LoopStandardAnalysisResults can be null. Legacy and new pass manager: Use cl::opt EnableMSSALoopDependency. Disabled by default. Reviewers: sanjoy, davide, gberry Subscribers: mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D40274 llvm-svn: 318772	2017-11-21 15:45:46 +00:00
NAKAMURA Takumi	519ea284af	SLPVectorizer.cpp: Avoid std::stable_sort(properlyDominates()). properlyDominates() shouldn't be used as sort key. It causes different output between libstdc++ and libc++. Instead, I introduced RPOT. In most cases, it works for CSE. llvm-svn: 318743	2017-11-21 09:41:01 +00:00
Davide Italiano	5df8080011	[SCCP] If we replace with a constant, we can't replace with a range. This microoptimization is NFC. llvm-svn: 318711	2017-11-21 00:21:52 +00:00
Vitaly Buka	8000f228b3	[msan] Don't sanitize "nosanitize" instructions Reviewers: eugenis Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40205 llvm-svn: 318708	2017-11-20 23:37:56 +00:00
Hiroshi Yamauchi	c94d4d70d8	Add heuristics for irreducible loop metadata under PGO Summary: Add the following heuristics for irreducible loop metadata: - When an irreducible loop header is missing the loop header weight metadata, give it the minimum weight seen among other headers. - Annotate indirectbr targets with the loop header weight metadata (as they are likely to become irreducible loop headers after indirectbr tail duplication.) These greatly improve the accuracy of the block frequency info of the Python interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to better register allocation under PGO. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39980 llvm-svn: 318693	2017-11-20 21:03:38 +00:00
Teresa Johnson	3309002a86	[SROA] Correctly invalidate analyses when dead instructions deleted Summary: SROA can fail in rewriting alloca but still rewrite a phi resulting in dead instruction elimination. The Changed flag was not being set correctly, resulting in downstream passes using stale analyses. The included test case will assert during the second BDCE pass as a result. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39921 llvm-svn: 318677	2017-11-20 18:33:38 +00:00
Evgeniy Stepanov	8e7018d92f	[asan] Use dynamic shadow on 32-bit Android, try 2. Summary: This change reverts r318575 and changes FindDynamicShadowStart() to keep the memory range it found mapped PROT_NONE to make sure it is not reused. We also skip MemoryRangeIsAvailable() check, because it is (a) unnecessary, and (b) would fail anyway. Reviewers: pcc, vitalybuka, kcc Subscribers: srhines, kubamracek, mgorny, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D40203 llvm-svn: 318666	2017-11-20 17:41:57 +00:00
Gil Rapaport	8b9d1f3c5b	[LV] Model masking in VPlan, introducing VPInstructions This patch adds a new abstraction layer to VPlan and leverages it to model the planned instructions that manipulate masks (AND, OR, NOT), introduced during predication. The new VPValue and VPUser classes model how data flows into, through and out of a VPlan, forming the vertices of a planned Def-Use graph. The new VPInstruction class is a generic single-instruction Recipe that models a planned instruction along with its opcode, operands and users. See VectorizationPlan.rst for more details. Differential Revision: https://reviews.llvm.org/D38676 llvm-svn: 318645	2017-11-20 12:01:47 +00:00
Max Kazantsev	268467869b	[IRCE] Smart range intersection In rL316552, we ban intersection of unsigned latch range with signed range check and vice versa, unless the entire range check iteration space is known positive. It was a correct functional fix that saved us from dealing with ambiguous values, but it also appeared to be a very restrictive limitation. In particular, in the following case: loop: %iv = phi i32 [ 0, %preheader ], [ %iv.next, %latch] %iv.offset = add i32 %iv, 10 %rc = icmp slt i32 %iv.offset, %len br i1 %rc, label %latch, label %deopt latch: %iv.next = add i32 %iv, 11 %cond = icmp ult i32 %iv.next, 100 br i1 %cond, label %loop, label %exit Here, the unsigned iteration range is `[0, 100)`, and the safe range for range check is `[-10, %len - 10)`. For unsigned iteration spaces, we use unsigned min/max functions for range intersection. Given this, we wanted to avoid dealing with `-10` because it is interpreted as a very big unsigned value. Semantically, range check's safe range goes through unsigned border, so in fact it is two disjoint ranges in IV's iteration space. Intersection of such ranges is not trivial, so we prohibited this case saying that we are not allowed to intersect such ranges. What the semantics of this safe range actually mean is that we can start from `-10` and go up increasing the `%iv` by one until we reach `%len - 10` (for simplicity let's assume that `%len - 10` is a reasonably big positive value). In particular, this safe iteration space includes `0, 1, 2, ..., %len - 11`. So if we were able to return safe iteration space `[0, %len - 10)`, we could safely intersect it with IV's iteration space. All values in this range are non-negative, so using signed/unsigned min/max for them is unambiguous. In this patch, we alter the algorithm of safe range calculation so that it returns a subset of the original safe space which is represented by one continuous range that does not go through wrap.
In order to reach this, we use modified SCEV subtraction function. It can be imagined as a function that subtracts by `1` (or `-1`) as long as the further subtraction does not cause a wrap in IV iteration space. This allows us to perform IRCE in many situations when we deal with IV space and range check of different types (in terms of signed/unsigned). We apply this approach for both matching and not matching types of IV iteration space and the range check. One implication of this is that now IRCE became smarter in detection of empty safe ranges. For example, in this case: loop: %iv = phi i32 [ %begin, %preheader ], [ %iv.next, %latch] %iv.offset = sub i32 %iv, 10 %rc = icmp ult i32 %iv.offset, %len br i1 %rc, label %latch, label %deopt latch: %iv.next = add i32 %iv, 11 %cond = icmp ult i32 %iv.next, 100 br i1 %cond, label %loop, label %exit If `%len` was less than 10 but SCEV failed to trivially prove that `%begin - 10 >u %len - 10`, we could end up executing entire loop in safe preloop while the main loop was still generated, but never executed. Now we cut the ranges so that if both `%begin - 10` and `%len - 10` overflow, we have a trivially empty range of `[0, 0)`. This in some cases prevents us from meaningless optimization. Differential Revision: https://reviews.llvm.org/D39954 llvm-svn: 318639	2017-11-20 06:07:57 +00:00
Sanjay Patel	9771a96f6e	[LibCallSimplifier] allow splat vectors for pow(x, 0.5) -> sqrt() transforms llvm-svn: 318629	2017-11-19 16:42:27 +00:00
Sanjay Patel	fbd3e66b9a	[LibCallSimplifier] partly fix pow(x, 0.5) -> sqrt() transforms As the first test shows, we could transform an llvm intrinsic which never sets errno into a libcall which could set errno (even though it's marked readnone?), so that's not ideal. It's possible that we can also transform a libcall which could set errno to an intrinsic given the fast-math-flags constraint, but that's deferred to determine exactly which set of FMF are needed. Differential Revision: https://reviews.llvm.org/D40150 llvm-svn: 318628	2017-11-19 16:13:14 +00:00
Florian Hahn	2a266a343f	[CallSiteSplitting] Remove some indirection (NFC). Summary: With this patch I tried to reduce the complexity of the code slightly, by removing some indirection. Please let me know what you think. Reviewers: junbuml, mcrosier, davidxl Reviewed By: junbuml Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40037 llvm-svn: 318593	2017-11-18 18:14:13 +00:00
Walter Lee	9abeecc07c	[asan] Add a full redzone after every stack variable We were not doing that for large shadow granularity. Also add more stack frame layout tests for large shadow granularity. Differential Revision: https://reviews.llvm.org/D39475 llvm-svn: 318581	2017-11-18 01:13:18 +00:00
Evgeniy Stepanov	9d564cdcb0	Revert "[asan] Use dynamic shadow on 32-bit Android" and 3 more. Revert the following commits: r318369 [asan] Fallback to non-ifunc dynamic shadow on android<22. r318235 [asan] Prevent rematerialization of &__asan_shadow. r317948 [sanitizer] Remove unnecessary attribute hidden. r317943 [asan] Use dynamic shadow on 32-bit Android. MemoryRangeIsAvailable() reads /proc/$PID/maps into an mmap-ed buffer that may overlap with the address range that we plan to use for the dynamic shadow mapping. This is causing random startup crashes. llvm-svn: 318575	2017-11-18 00:22:34 +00:00
Jun Bum Lim	0f90672ae9	[LICM] Fix PR35342 Summary: This change fixes PR35342 by replacing only the current use with undef in unreachable blocks. Reviewers: efriedma, mcrosier, igor-laevsky Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40184 llvm-svn: 318551	2017-11-17 20:38:25 +00:00
Chandler Carruth	693eedb138	[PM/Unswitch] Teach SimpleLoopUnswitch to do non-trivial unswitching, making it no longer even remotely simple. The pass will now be more of a "full loop unswitching" pass rather than anything substantively simpler than any other approach. I plan to rename it accordingly once the dust settles. The key ideas of the new loop unswitcher are carried over for non-trivial unswitching: 1) Fully unswitch a branch or switch instruction from inside of a loop to outside of it. 2) Update the CFG and IR. This avoids needing to "remember" the unswitched branches as well as avoiding excessive cloning and reliance on complex parts of simplify-cfg to clean up the CFG. 3) Update the analyses (where we can) rather than just blowing them away or relying on something else updating them. Sadly, #3 is somewhat compromised here as the dominator tree updates were too complex for me to want to reason about. I will need to make another attempt to do this now that we have a nice dynamic update API for dominators. However, we do adhere to #3 w.r.t. LoopInfo. This approach also adds an important principle specific to non-trivial unswitching: not all of the loop will be duplicated when unswitching. This fact allows us to compute the cost in terms of how much duplicate code is inserted rather than just on raw size. Unswitching conditions which essentially partition loops will work regardless of the total loop size. Some remaining issues that I will be addressing in subsequent commits: - Handling unstructured control flow. - Unswitching 'switch' cases instead of just branches. - Moving to the dynamic update API for dominators. Some high-level, interesting limitations that folks might want to push on as follow-ups but that I don't have any immediate plans around: - We could be much more clever about not cloning things that will be deleted. In fact, we should be able to delete nothing and do a minimal number of clones.
- There are many more interesting selection criteria for which branch to unswitch that we might want to look at. One that I'm particularly interested in is a set of conditions which all exit the loop and which can be merged into a single unswitched test of them. Differential revision: https://reviews.llvm.org/D34200 llvm-svn: 318549	2017-11-17 19:58:36 +00:00
Max Kazantsev	1ac6e8ae61	[IRCE] Remove folding of two range checks into RANGE_CHECK_BOTH The logic of replacing of a couple `RANGE_CHECK_LOWER + RANGE_CHECK_UPPER` into `RANGE_CHECK_BOTH` in fact duplicates the logic of range intersection which happens when we calculate safe iteration space. Effectively, the result of intersection of these ranges doesn't differ from the range of merged range check. We chose to remove duplicating logic in favor of code simplicity. Differential Revision: https://reviews.llvm.org/D39589 llvm-svn: 318508	2017-11-17 06:49:26 +00:00
David Blaikie	b3bde2ea50	Fix a bunch more layering of CodeGen headers that are in Target All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490	2017-11-17 01:07:10 +00:00
Mandeep Singh Grang	e6bb66357c	[PredicateInfo] Add comment about why we require stable sort llvm-svn: 318487	2017-11-17 00:43:24 +00:00
Walter Lee	8f1545c629	[asan] Fix small X86_64 ShadowOffset for non-default shadow scale The requirement is that shadow memory must be aligned to page boundaries (4k in this case). Use a closed form equation that always satisfies this requirement. Differential Revision: https://reviews.llvm.org/D39471 llvm-svn: 318421	2017-11-16 17:03:00 +00:00
Sanjay Patel	b3fa94586f	[InstCombine] include 'sub' in the list of narrow-able binops // trunc (binop X, C) --> binop (trunc X, C') // trunc (binop (ext X), Y) --> binop X, (trunc Y) I'm grouping sub with the other binops because that makes the code simpler and the transforms are valid: https://rise4fun.com/Alive/UeF ...so even though we don't expect a sub with constant Op1 or any of the other opcodes with constant Op0 due to canonicalization rules, we might as well handle those situations if non-canonical code somehow reaches this point (it should just make instcombine more efficient in reaching its end goal). This should solve the problem that later manifests in the vectorizers in PR35295: https://bugs.llvm.org/show_bug.cgi?id=35295 llvm-svn: 318404	2017-11-16 14:40:51 +00:00
Walter Lee	2a2b69e9c7	[asan] Fix size/alignment issues with non-default shadow scale Fix a couple places where the minimum alignment/size should be a function of the shadow granularity: - alignment of AllGlobals - the minimum left redzone size on the stack Added a test to verify that the metadata_array is properly aligned for shadow scale of 5, to be enabled when we add build support for testing shadow scale of 5. Differential Revision: https://reviews.llvm.org/D39470 llvm-svn: 318395	2017-11-16 12:57:19 +00:00
Max Kazantsev	b1b8aff2e7	[IRCE] Fix SCEVExpander's usage in IRCE When expanding exit conditions for pre- and postloops, we may end up expanding a recurrence from the loop in its loop's preheader. This produces incorrect IR. This patch ensures that IRCE uses SCEVExpander correctly and only expands code which is safe to expand in this particular location. Differential Revision: https://reviews.llvm.org/D39234 llvm-svn: 318381	2017-11-16 06:06:27 +00:00
Evgeniy Stepanov	396ed67950	[asan] Fallback to non-ifunc dynamic shadow on android<22. Summary: Android < 22 does not support ifunc. Reviewers: pcc Subscribers: srhines, kubamracek, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40116 llvm-svn: 318369	2017-11-16 02:52:19 +00:00
Craig Topper	062bcf30b1	[GVNHoist] Fix a signed/unsigned comparison warning that occurs in 32-bit builds with gcc. std::distance returns ptrdiff_t which is signed. 64-bit builds don't notice because type promotion widens the unsigned first. llvm-svn: 318354	2017-11-16 00:19:59 +00:00
Sanjay Patel	03d0cd6a81	[InstCombine] trunc (binop X, C) --> binop (trunc X, C') Note that one-use and shouldChangeType() are checked ahead of the switch. Without the narrowing folds, we can produce inferior vector code as shown in PR35299: https://bugs.llvm.org/show_bug.cgi?id=35299 llvm-svn: 318323	2017-11-15 19:12:01 +00:00
Reid Kleckner	72b819b8ee	[InstCombine] Salvage debug info during initial DCE InstCombine salvages debug info for every instruction it erases from its worklist, but it wasn't doing it during its initial DCE when populating its worklist. This fixes that. This should help improve availability of 'this' in optimized debug info when casts are necessary. llvm-svn: 318320	2017-11-15 18:51:12 +00:00
Adam Nemet	572a87c76f	[SLP] Added more missed optimization remarks Summary: Added more remarks to SLP pass, in particular "missed" optimization remarks. Also proposed several tests for new functionality. Patch by Vladimir Miloserdov! For reference you may look at: https://reviews.llvm.org/rL302811 Reviewers: anemet, fhahn Reviewed By: anemet Subscribers: javed.absar, lattner, petecoup, yakush, llvm-commits Differential Revision: https://reviews.llvm.org/D38367 llvm-svn: 318307	2017-11-15 17:04:53 +00:00
Sanjay Patel	d1becd082a	[Reassociate] simplify code; NFCI llvm-svn: 318298	2017-11-15 16:19:17 +00:00
Craig Topper	f7b86728fa	[InstCombine] Simplify binops that are only used by a select and are fed by a select with the same condition. Summary: This patch optimizes a binop sandwiched between 2 selects with the same condition. Since we know it's only used by the select we can propagate the appropriate input value from the earlier select. As I'm writing this I realize I may need to avoid doing this for division in case the select was protecting a divide by zero? Reviewers: spatel, majnemer Reviewed By: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39999 llvm-svn: 318267	2017-11-15 05:23:02 +00:00
Hans Wennborg	45cabacd2f	Revert r318193 "[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops." It crashes building sqlite; see reply on the llvm-commits thread. > [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. > > Patch tries to improve vectorization of the following code: > > void add1(int * __restrict dst, const int * __restrict src) { > *dst++ = *src++; > *dst++ = *src++ + 1; > *dst++ = *src++ + 2; > *dst++ = *src++ + 3; > } > Allows to vectorize even if the very first operation is not a binary add, but just a load. > > Fixed issues related to previous commit. > > Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev > > Reviewed By: ABataev, RKSimon > > Subscribers: llvm-commits, RKSimon > > Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 318239	2017-11-15 00:38:13 +00:00
Craig Topper	bf6495fbcb	[LoopRotate] processLoop should return true even if it just simplified the loop latch without making any other changes Simplifying a loop latch changes the IR and we need to make sure the pass manager knows to invalidate analysis passes if that happened. PR35210 discovered a case where we failed to invalidate the post dominator tree after this simplification because we made no changes other than simplifying the loop latch. Fixes PR35210. Differential Revision: https://reviews.llvm.org/D40035 llvm-svn: 318237	2017-11-15 00:22:42 +00:00
Evgeniy Stepanov	cff19ee233	[asan] Prevent rematerialization of &__asan_shadow. Summary: In the mode when ASan shadow base is computed as the address of an external global (__asan_shadow, currently on android/arm32 only), regalloc prefers to rematerialize this value to save register spills. Even in -Os. On arm32 it is rather expensive (2 loads + 1 constant pool entry). This change adds an inline asm in the function prologue to suppress this behavior. It reduces AsanTest binary size by 7%. Reviewers: pcc, vitalybuka Subscribers: aemerson, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40048 llvm-svn: 318235	2017-11-15 00:11:51 +00:00
Davide Italiano	1380cb8055	[EntryExitInstrumenter] Placate GCC, the semicolon is redundant. NFCI. llvm-svn: 318217	2017-11-14 23:13:38 +00:00
Sanjay Patel	64fd333304	[Reassociate] use dyn_cast instead of isa+cast; NFCI llvm-svn: 318212	2017-11-14 23:03:56 +00:00
Reid Kleckner	29a5c03cc2	Make salvageDebugInfo of casts work for dbg.declare and dbg.addr Summary: Instcombine (and probably other passes) sometimes want to change the type of an alloca. To do this, they generally create a new alloca with the desired type, create a bitcast to make the new pointer type match the old pointer type, replace all uses with the cast, and then simplify the casts. We already knew how to salvage dbg.value instructions when removing casts, but we can extend it to cover dbg.addr and dbg.declare. Fixes a debug info quality issue uncovered in Chromium in http://crbug.com/784609 Reviewers: aprantl, vsk Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40042 llvm-svn: 318203	2017-11-14 21:49:06 +00:00
Hans Wennborg	e1ecd61b98	Rename CountingFunctionInserter and use for both mcount and cygprofile calls, before and after inlining Clang implements the -finstrument-functions flag inherited from GCC, which inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit. This is useful for getting a trace of how the functions in a program are executed. Normally, the calls remain even if a function is inlined into another function, but it is useful to be able to turn this off for users who are interested in a lower-level trace, i.e. one that reflects what functions are called post-inlining. (We use this to generate link order files for Chromium.) LLVM already has a pass for inserting similar instrumentation calls to mcount(), which it does after inlining. This patch renames and extends that pass to handle calls both to mcount and the cygprofile functions, before and/or after inlining as controlled by function attributes. Differential Revision: https://reviews.llvm.org/D39287 llvm-svn: 318195	2017-11-14 21:09:45 +00:00
Dinar Temirbulatov	2bd1836520	[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. Patch tries to improve vectorization of the following code: void add1(int * __restrict dst, const int * __restrict src) { *dst++ = *src++; *dst++ = *src++ + 1; *dst++ = *src++ + 2; *dst++ = *src++ + 3; } Allows to vectorize even if the very first operation is not a binary add, but just a load. Fixed issues related to previous commit. Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev Reviewed By: ABataev, RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 318193	2017-11-14 20:55:08 +00:00
Mandeep Singh Grang	b8a11bbcf1	[PredicateInfo] Stable sort ValueDFS to remove non-deterministic ordering Summary: This fixes failure in Transforms/Util/PredicateInfo/testandor.ll uncovered by D39245. Reviewers: dberlin Reviewed By: dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39630 llvm-svn: 318165	2017-11-14 18:22:50 +00:00
Gil Rapaport	848581cadb	[LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipe This patch is part of D38676. The patch introduces two new Recipes to handle instructions whose vectorization involves masking. These Recipes take VPlan-level masks in D38676, but still rely on ILV's existing createEdgeMask(), createBlockInMask() in this patch. VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(), which now handles only loop-header phi nodes. VPWidenMemoryInstructionRecipe handles load/store which are to be widened (but are not part of an Interleave Group). In this patch it simply calls ILV::vectorizeMemoryInstruction on execute(). Differential Revision: https://reviews.llvm.org/D39068 llvm-svn: 318149	2017-11-14 12:09:30 +00:00
Chandler Carruth	00a301d568	[PM] Port BoundsChecking to the new PM. Registers it and everything, updates all the references, etc. Next patch will add support to Clang's `-fexperimental-new-pass-manager` path to actually enable BoundsChecking correctly. Differential Revision: https://reviews.llvm.org/D39084 llvm-svn: 318128	2017-11-14 01:30:04 +00:00
Chandler Carruth	1594feea94	[PM] Refactor BoundsChecking further to prepare it to be exposed both as a legacy and new PM pass. This essentially moves the class state to parameters and re-shuffles the code to make that reasonable. It also does some minor cleanups along the way and leaves some comments. Differential Revision: https://reviews.llvm.org/D39081 llvm-svn: 318124	2017-11-14 01:13:59 +00:00
Hans Wennborg	08b34a017a	Update some code.google.com links llvm-svn: 318115	2017-11-13 23:47:58 +00:00
Jatin Bhateja	c61ade1ca0	[SCEV] Handling for ICmp occurring in the evolution chain. Summary: If a compare instruction is the same as or the inverse of the compare in the branch of the loop latch, then return a constant evolution node. This shall facilitate computations of loop exit counts in cases where compare appears in the evolution chain of induction variables. Will fix PR 34538 Reviewers: sanjoy, hfinkel, junryoungju Reviewed By: sanjoy, junryoungju Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38494 llvm-svn: 318050	2017-11-13 16:43:24 +00:00
Bill Seurer	44156a0efb	[PowerPC][msan] Update msan to handle changed memory layouts in newer kernels In more recent Linux kernels (including those with 47 bit VMAs) the layout of virtual memory for powerpc64 changed causing the memory sanitizer to not work properly. This patch adjusts a bit mask in the memory sanitizer to work on the newer kernels while continuing to work on the older ones as well. This is the non-runtime part of the patch and finishes it. ref: r317802 Tested on several 4.x and 3.x kernel releases. llvm-svn: 318045	2017-11-13 15:43:19 +00:00
Florian Hahn	7114755913	[CodeExtractor] Add missing AllowVarArgs initialization. llvm-svn: 318029	2017-11-13 11:08:47 +00:00
Florian Hahn	0e9dec672d	[PartialInliner] Inline vararg functions that forward varargs. Summary: This patch extends the partial inliner to support inlining parts of vararg functions, if the vararg handling is done in the outlined part. It adds a `ForwardVarArgsTo` argument to InlineFunction. If it is non-null, all varargs passed to the inlined function will be added to all calls to `ForwardVarArgsTo`. The partial inliner takes care to only pass `ForwardVarArgsTo` if the varargs handling is done in the outlined function. It checks that vastart is not part of the function to be inlined. `test/Transforms/CodeExtractor/PartialInlineNoInline.ll` (already part of the repo) checks we do not do partial inlining if vastart is used in a basic block that will be inlined. Reviewers: davide, davidxl, grosser Reviewed By: davide, davidxl, grosser Subscribers: gyiu, grosser, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D39607 llvm-svn: 318028	2017-11-13 10:35:52 +00:00
Craig Topper	d3e5781e53	[InstCombine] Teach visitICmpInst to not break integer absolute value idioms Summary: This patch adds an early out to visitICmpInst if we are looking at a compare as part of an integer absolute value idiom. The same is already done for min/max. In the particular case I observed in a benchmark we had an absolute value of a load from an indexed global. We simplified the compare using foldCmpLoadFromIndexedGlobal into a magic bit vector, a shift, and an and. But the load result was still used for the select and the negate part of the absolute value idiom. So we overcomplicated the code and lost the ability to recognize it as an absolute value. I've chosen a simpler case for the test here. Reviewers: spatel, davide, majnemer Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39766 llvm-svn: 317994	2017-11-12 02:28:21 +00:00
Evgeniy Stepanov	989299c42b	[asan] Use dynamic shadow on 32-bit Android. Summary: The following kernel change has moved ET_DYN base to 0x4000000 on arm32: https://marc.info/?l=linux-kernel&m=149825162606848&w=2 Switch to dynamic shadow base to avoid such conflicts in the future. Reserve shadow memory in an ifunc resolver, but don't use it in the instrumentation until PR35221 is fixed. This will eventually let us save one load per function. Reviewers: kcc Subscribers: aemerson, srhines, kubamracek, kristof.beyls, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D39393 llvm-svn: 317943	2017-11-10 22:27:48 +00:00
Davide Italiano	acf6065183	[SimplifyCFG] Use auto * when the type is obvious. NFCI. llvm-svn: 317923	2017-11-10 20:46:21 +00:00
Daniel Neilson	6e4aa1e481	Expand IRBuilder interface for atomic memcpy to require pointer alignments. (NFC) Summary: The specification of the @llvm.memcpy.element.unordered.atomic intrinsic requires that the pointer arguments have alignments of at least the element size. The existing IRBuilder interface to create a call to this intrinsic does not allow for providing the alignment of these pointer args. Having an interface that makes it easy to construct invalid intrinsic calls doesn't seem sensible, so this patch simply adds the requirement that one provide the argument alignments when using IRBuilder to create atomic memcpy calls. llvm-svn: 317918	2017-11-10 19:38:12 +00:00
Sanjoy Das	6fabb90765	[CVP] Remove some {s|u}add.with.overflow checks. Summary: This adds logic to CVP to remove some overflow checks. It uses LVI to remove operations with at least one constant. Specifically, this can remove many overflow intrinsics immediately following an overflow check in the source code, such as: if (x < INT_MAX) ... x + 1 ... Patch by Joel Galenson! Reviewers: sanjoy, regehr Reviewed By: sanjoy Subscribers: fhahn, pirama, srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D39483 llvm-svn: 317911	2017-11-10 19:13:35 +00:00
Easwaran Raman	0a0913def2	Add a wrapper function to set branch weights metadata. Summary: This wrapper checks if there is at least one non-zero weight before setting the metadata. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39872 llvm-svn: 317845	2017-11-09 22:52:20 +00:00
Paul Robinson	b46256b0b4	Fix out-of-order stepping behavior in programs with hoisted constants. When the Constant Hoisting pass moves expensive constants into a common block, it would assign a debug location equal to the last use of that constant. While this is certainly intuitive, it places the constant in an out-of-order location, according to the debug location information. This produces out-of-order stepping when debugging programs affected by this pass. This patch creates in-order stepping behavior by merging the debug locations for hoisted constants, and the new insertion point. Patch by Matthew Voss! Differential Revision: https://reviews.llvm.org/D38088 llvm-svn: 317827	2017-11-09 20:01:31 +00:00
Alexey Bataev	0bd9004425	[SLP] Fix PR23510: Try to find best possible vectorizable stores. Summary: The analysis of the store sequence goes in straight order - from the first store to the last. But the best opportunity for vectorization will happen if we're going to use reverse order - from last store to the first. It may be best because usually users have some initialization part + further processing and this first initialization may confuse SLP vectorizer. Reviewers: RKSimon, hfinkel, mkuper, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39606 llvm-svn: 317821	2017-11-09 19:07:16 +00:00
Sanjay Patel	0d66010454	[Reassociate] don't name values "tmp"; NFCI The toxic stew of created values named 'tmp' and tests that already have values named 'tmp' and CHECK lines looking for values named 'tmp' causes bad things to happen in our test line auto-generation scripts because it wants to use 'TMP' as a prefix for unnamed values. Use less 'tmp' to avoid that. llvm-svn: 317818	2017-11-09 18:14:24 +00:00
Serguei Katkov	722339e405	[GVN PRE] Patch the source for Phi node in PRE We must patch all existing incoming values of Phi node, otherwise it is possible that we can see poison where program does not expect to see it. This is similar to what GVN does. The added test test/Transforms/GVN/PRE/pre-jt-add.ll shows an example of wrong optimization done by jump threading because GVN PRE did not patch the existing incoming value. Reviewers: mkazantsev, wmi, dberlin, davide Reviewed By: dberlin Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D39637 llvm-svn: 317768	2017-11-09 06:02:18 +00:00
Dan Gohman	2c74fe977d	Add an @llvm.sideeffect intrinsic This patch implements Chandler's idea [0] for supporting languages that require support for infinite loops with side effects, such as Rust, providing part of a solution to bug 965 [1]. Specifically, it adds an `llvm.sideeffect()` intrinsic, which has no actual effect, but which appears to optimization passes to have obscure side effects, such that they don't optimize away loops containing it. It also teaches several optimization passes to ignore this intrinsic, so that it doesn't significantly impact optimization in most cases. As discussed on llvm-dev [2], this patch is the first of two major parts. The second part, to change LLVM's semantics to have defined behavior on infinite loops by default, with a function attribute for opting into potential-undefined-behavior, will be implemented and posted for review in a separate patch. [0] http://lists.llvm.org/pipermail/llvm-dev/2015-July/088103.html [1] https://bugs.llvm.org/show_bug.cgi?id=965 [2] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118632.html Differential Revision: https://reviews.llvm.org/D38336 llvm-svn: 317729	2017-11-08 21:59:51 +00:00
Teresa Johnson	07ec7d59c2	[ThinLTO] Ensure sanitizer passes are run Summary: In ThinLTO compilation, we exit populateModulePassManager early and were not adding PM extension passes meant to run at the end of the pipeline. This includes sanitizer passes. Add these passes before the early exit. A test will be added to projects/compiler-rt. Reviewers: pcc Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D39565 llvm-svn: 317714	2017-11-08 19:45:52 +00:00
Mitch Phillips	0222224da6	Revert rL317618 The implemented pass fails and is breaking a large number of unit tests. Example: http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/5777/steps/build-stage3-compiler/logs/stdio This reverts commit rL317618 llvm-svn: 317641	2017-11-08 00:20:53 +00:00
Dinar Temirbulatov	b9a2832874	[SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in integer binary ops. Patch tries to improve vectorization of the following code: void add1(int * __restrict dst, const int * __restrict src) { *dst++ = *src++; *dst++ = *src++ + 1; *dst++ = *src++ + 2; *dst++ = *src++ + 3; } This allows vectorization even if the very first operation is not a binary add, but just a load. Fixed PR34619 and other issues related to previous commit. Reviewers: spatel, mzolotukhin, mkuper, hfinkel, RKSimon, filcab, ABataev Reviewed By: ABataev, RKSimon Subscribers: llvm-commits, RKSimon Differential Revision: https://reviews.llvm.org/D28907 llvm-svn: 317618	2017-11-07 21:25:34 +00:00
Craig Topper	7dd4d32431	Recommit r317510 "[InstCombine] Pull shifts through a select plus binop with constant" The hexagon test should be fixed now. Original commit message: This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select. This can allow us to get the select closer to other selects to enable removing one. Differential Revision: https://reviews.llvm.org/D39222 llvm-svn: 317600	2017-11-07 18:47:24 +00:00
Craig Topper	386fc2516c	[InstCombine] Update stale comment. NFC Datalayout is no longer optional so the comment didn't match what the code currently does. llvm-svn: 317594	2017-11-07 17:37:32 +00:00
Adrian Prantl	25a09dd408	Make DIExpression::createFragmentExpression() return an Optional. We can't safely split arithmetic into multiple fragments because we can't express carry-over between fragments. llvm-svn: 317534	2017-11-07 00:45:34 +00:00
Davide Italiano	1a46affb45	[IPO/LowerTypesTest] Skip blockaddress(es) when replacing uses. Blockaddresses refer to the function itself, therefore replacing them would cause an assertion in doRAUW. Fixes https://bugs.llvm.org/show_bug.cgi?id=35201 This was found when trying CFI on a proprietary kernel by Dmitry Mikulin. Differential Revision: https://reviews.llvm.org/D39695 llvm-svn: 317527	2017-11-07 00:09:25 +00:00
Adrian Prantl	182f9fea37	InstCombine: salvage the debug info of DCE'ed add instructions. rdar://problem/31209283 llvm-svn: 317522	2017-11-06 22:49:39 +00:00
Hans Wennborg	8c4b10e84a	Revert r317510 "[InstCombine] Pull shifts through a select plus binop with constant" This broke the CodeGen/Hexagon/loop-idiom/pmpy-mod.ll test on a bunch of buildbots. > This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select. > > This can allow us to get the select closer to other selects to enable removing one. > > Differential Revision: https://reviews.llvm.org/D39222 > > git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317510 91177308-0d34-0410-b5e6-96231b3b80d8 llvm-svn: 317518	2017-11-06 22:28:02 +00:00
Xinliang David Li	a531f189fc	Fix comment /NFC llvm-svn: 317514	2017-11-06 21:57:51 +00:00
Craig Topper	8917647333	[InstCombine] Pull shifts through a select plus binop with constant This pulls shifts through a select+binop with a constant where the select conditionally executes the binop. We already do this for just the binop, but not with the select. This can allow us to get the select closer to other selects to enable removing one. Differential Revision: https://reviews.llvm.org/D39222 llvm-svn: 317510	2017-11-06 21:07:22 +00:00
Dehao Chen	5d2a1a5045	Include already promoted counts when computing SUM for VP. Summary: When computing the SUM for indirect call promotion, if the callsite is already promoted in the profile, it will be promoted before ICP. In the current implementation, ICP only sees remaining counts in SUM. This may cause extra indirect call targets being promoted. This patch updates the SUM to include the counts already promoted earlier. This way we do not end up promoting too many indirect call targets. Reviewers: tejohnson Reviewed By: tejohnson Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D38763 llvm-svn: 317502	2017-11-06 19:52:49 +00:00
Sanjay Patel	629c411538	[IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html and again more recently: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html ...this is a step in cleaning up our fast-math-flags implementation in IR to better match the capabilities of both clang's user-visible flags and the backend's flags for SDNode. As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'. We're also adding a bit to allow approximations for library functions called 'ApproxFunc' (this was initially proposed as 'libm' or similar). ...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day. We keep the 'fast' keyword. I thought about removing that, but seeing IR like this: %f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2 ...made me think we want to keep the shortcut synonym. Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. This statement: "Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility ) ...provides the flexibility we want to make this change without requiring a new IR version. Ie, we're not loosening the FP strictness of existing IR. At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'. Note: an inter-dependent clang commit to use the new API name should closely follow commit. Differential Revision: https://reviews.llvm.org/D39304 llvm-svn: 317488	2017-11-06 16:27:15 +00:00

19645 Commits