llvm-project

Commit Graph

Author	SHA1	Message	Date
David Callahan	ebcf916c5a	[ADCE] Add code to remove dead branches Summary: This is the last in a series of patches to evolve ADCE.cpp to support removing unnecessary control flow. This patch adds the code to update the control and data flow graphs to remove the dead control flow. Also update unit tests to test the capability to remove a dead, may-be-infinite loop, which is enabled by the switch -adce-remove-loops. Previous patches: D23824 [ADCE] Add handling of PHI nodes when removing control flow D23559 [ADCE] Add control dependence computation D23225 [ADCE] Modify data structures to support removing control flow D23065 [ADCE] Refactor anticipating new functionality (NFC) D23102 [ADCE] Refactoring for new functionality (NFC) Reviewers: dberlin, majnemer, nadav, mehdi_amini Subscribers: llvm-commits, david2050, freik, twoh Differential Revision: https://reviews.llvm.org/D24918 llvm-svn: 289548	2016-12-13 16:42:18 +00:00
Craig Topper	ac75bca1eb	[X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar intrinsics correctly. Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed. Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support. llvm-svn: 289523	2016-12-13 07:45:45 +00:00
Rong Xu	51a1e3c430	[PGO] Fix insane counts due to noreturn calls Summary: Since we don't break BBs for function calls, we might get some insane counts (wrap of unsigned) in the presence of noreturn calls. This patch sets these counts to zero instead of the wrapped number. Reviewers: davidxl Subscribers: xur, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D27602 llvm-svn: 289521	2016-12-13 06:41:14 +00:00
Davide Italiano	463bebc319	[SCCP] Debug diagnostic goes under DEBUG(). NFCI. llvm-svn: 289519	2016-12-13 05:56:04 +00:00
Matthew Simpson	92ce0230b5	[SLP] Fix sign-extends for type-shrinking This patch ensures the correct minimum bit width during type-shrinking. Previously when type-shrinking, we always sign-extended values back to their original width. However, if we are going to sign-extend, and the sign bit is unknown, we have to increase the minimum bit width by one bit so the sign-extend will fill the upper bits correctly. If the sign bit is known to be zero, we can perform a zero-extend instead. This should fix PR31243. Reference: https://llvm.org/bugs/show_bug.cgi?id=31243 Differential Revision: https://reviews.llvm.org/D27466 llvm-svn: 289470	2016-12-12 21:11:04 +00:00
Teresa Johnson	a29bd6ffcc	[ThinLTO] Remove useless code (NFC) Should have been removed in r288446. llvm-svn: 289466	2016-12-12 20:34:28 +00:00
Sanjay Patel	e730ce87a5	[InstCombine] fix bug when offsetting case values of a switch (PR31260) We could truncate the condition and then try to fold the add into the original condition value causing wrong case constants to be used. Move the offset transform ahead of the truncate transform and return after each transform, so there's no chance of getting confused values. Fix for: https://llvm.org/bugs/show_bug.cgi?id=31260 llvm-svn: 289442	2016-12-12 16:13:52 +00:00
Sanjay Patel	87e2f677d7	[InstCombine] clean up range-for-loops in visitSwitchInst(); NFCI llvm-svn: 289439	2016-12-12 15:52:56 +00:00
Craig Topper	7fc6d34ed1	[InstCombine][XOP] The instructions for the scalar frcz intrinsics are defined to put 0 in the upper bits, not pass bits through like other intrinsics. So we should return a zero vector instead. llvm-svn: 289411	2016-12-11 22:32:38 +00:00
Davide Italiano	0a1476c756	[SCCP] Use the appropriate helper function. NFCI. llvm-svn: 289406	2016-12-11 21:19:03 +00:00
Craig Topper	23ebd9564f	[X86][InstCombine] Add support for scalar FMA intrinsics to SimplifyDemandedVectorElts. This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them. llvm-svn: 289377	2016-12-11 08:54:52 +00:00
Craig Topper	61b280e7b0	[X86][InstCombine] Teach InstCombineCalls to simplify demanded elements for scalar FMA intrinsics. These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them. llvm-svn: 289372	2016-12-11 07:42:06 +00:00
Craig Topper	d96395365a	[AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded elements for scalar cmp intrinsics with masking and rounding. These intrinsics don't read the upper elements of their first and second input. These are slightly different from the SSE version, which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead. llvm-svn: 289371	2016-12-11 07:42:04 +00:00
Craig Topper	790d0fa569	[AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding. These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well. llvm-svn: 289370	2016-12-11 07:42:01 +00:00
Craig Topper	58917f3508	[AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls to match 128 and 256-bit. llvm-svn: 289354	2016-12-11 01:59:36 +00:00
Craig Topper	9a63d7ade5	[X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a shufflevector if the indices are constant. llvm-svn: 289348	2016-12-11 00:23:50 +00:00
Sanjay Patel	4c48bbe94d	[InstCombine] add helper for shift-by-shift folds; NFCI These are currently limited to integer types, but we should be able to extend to splat vectors and possibly general vectors. llvm-svn: 289343	2016-12-10 22:16:29 +00:00
Davide Italiano	824d695231	[SCCP] Teach the pass about `mul %x 0` even if %x is overdefined. The motivating example is: extern int patatino; int goo() { int x = 0; for (int i = 0; i < 1000000; ++i) { x *= patatino; } return x; } Currently SCCP will not realize that this function returns always zero, therefore will try to unroll and vectorize the loop at -O3 producing an awful lot of (useless) code. With this change, it will just produce: 0000000000000000 <g>: xor %eax,%eax retq llvm-svn: 289175	2016-12-09 03:08:42 +00:00
Peter Collingbourne	8786754cc3	WholeProgramDevirt: Teach the pass to handle structs of arrays. This will become necessary in some cases once D22296 lands. llvm-svn: 289165	2016-12-09 01:10:11 +00:00
Peter Collingbourne	7a1e5bbe4e	Make WholeProgramDevirt understand ConstStruct vtables. Based on a patch by LemonBoy! Differential Revision: https://reviews.llvm.org/D26581 llvm-svn: 289162	2016-12-09 00:33:27 +00:00
Davide Italiano	54c683f9e7	[SCCP] Make sure SCCP and ConstantFolding agree on undef >> a. Currently SCCP folds the value to -1, while ConstantProp folds to 0. This changes SCCP to do what ConstantFolding does. llvm-svn: 289147	2016-12-08 22:28:53 +00:00
Alexey Bataev	4f0d469d45	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043	2016-12-08 11:57:51 +00:00
Evgeniy Stepanov	0c8957c198	CFI-icall on Thumb Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb. Use b.w branch instruction. Use .thumb_function and .thumb_set for proper arm/thumb interwork. This way jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect CFI check math, because the address of the jumptable start also has that bit set. This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes) but that needs unwinding instructions, etc. Differential Revision: https://reviews.llvm.org/D27499 llvm-svn: 289008	2016-12-08 00:32:26 +00:00
Davide Italiano	1ed5396304	[BDCE] Skip metadata while replacing uses. The fix committed in r288851 doesn't cover all the cases. In particular, if we have an instruction with side effects which has no non-dbg use not depending on the bits, we still perform RAUW, destroying the dbg.value's first argument. Prevent metadata from being replaced here to avoid the issue. Differential Revision: https://reviews.llvm.org/D27534 llvm-svn: 288987	2016-12-07 21:47:32 +00:00
Eli Friedman	c6885fc369	[GVNHoist] Invalidate MemDep when an instruction is moved. See also r279907. Fixes https://llvm.org/bugs/show_bug.cgi?id=30991 . Differential Revision: https://reviews.llvm.org/D27493 llvm-svn: 288968	2016-12-07 19:55:59 +00:00
Matthew Simpson	364da7e527	[LV] Scalarize operands of predicated instructions This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands. The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions. Differential Revision: https://reviews.llvm.org/D26083 llvm-svn: 288909	2016-12-07 15:03:32 +00:00
Benjamin Kramer	b1332d8bf6	Try unbreaking the MSVC build. llvm-svn: 288907	2016-12-07 13:35:11 +00:00
Benjamin Kramer	926ab5b00b	[LowerTypeTests] Use the TrailingObjects infrastructure for trailing objects. Also avoid allocating ~3x as much memory as needed. llvm-svn: 288904	2016-12-07 12:31:45 +00:00
Andrea Di Biagio	ae5780104f	When GVN removes a redundant load, it should not modify the debug location of the dominating load. In the case of a fully redundant load LI dominated by an equivalent load V, GVN should always preserve the original debug location of V. Otherwise, we risk introducing incorrect stepping. If V has debug info, then clearly it should not be modified. If V has a null debugloc, then it is still potentially incorrect to propagate LI's debugloc because LI may not post-dominate V. Differential Revision: https://reviews.llvm.org/D27468 llvm-svn: 288903	2016-12-07 12:31:36 +00:00
Andrea Di Biagio	eff22832c0	[InlineFunction] Refactor code in function `fixupLineNumbers' as suggested by David in D27462. NFC llvm-svn: 288901	2016-12-07 12:01:45 +00:00
Andrea Di Biagio	32d5aedd5b	[InlineFunction] Do not propagate the callsite debug location to instructions inlined from functions with debug info. When a function F is inlined, InlineFunction extends the debug location of every instruction inlined from F by adding an InlinedAt. However, if an instruction has a 'null' debug location, InlineFunction would propagate the callsite debug location to it. This behavior existed since revision 210459. Revision 210459 was originally committed specifically to workaround the lack of debug information for instructions inlined from intrinsic functions (which are usually declared with attributes `__always_inline__, __nodebug__`). The problem with revision 210459 is that it doesn't make any sort of distinction between instructions inlined from a 'nodebug' function and instructions which are inlined from a function built with debug info. This issue may lead to incorrect stepping in the debugger. This patch works under the assumption that a nodebug function does not have a DISubprogram. When a function F is inlined into another function G, InlineFunction checks if F has debug info associated with it. For nodebug functions, the InlineFunction logic is unchanged (i.e. it would still propagate the callsite debugloc to the inlined instructions). Otherwise, InlineFunction no longer propagates the callsite debug location. Differential Revision: https://reviews.llvm.org/D27462 llvm-svn: 288895	2016-12-07 10:37:26 +00:00
Peter Collingbourne	7357b2ad62	LowerTypeTests: Improve performance by optimising type metadata queries. Requesting metadata for a global is a relatively expensive operation as it involves a map lookup, but it's one that we need to do relatively frequently in this pass to collect the list of type metadata nodes associated with a global. This change improves the performance of type metadata queries by prebuilding data structures that keep the global together with its list of type metadata, and changing the pass to use that data structure wherever we were previously passing global references around. This change also eliminates some O(N^2) behavior by collecting the list of globals associated with each type identifier during the first pass over the list of globals rather than visiting each global to compute that list every time we add a new type identifier. Reduces pass runtime on a module containing Chrome's vtables from over 60s to 0.9s. Differential Revision: https://reviews.llvm.org/D27484 llvm-svn: 288859	2016-12-06 23:02:13 +00:00
Davide Italiano	043e66137c	[BDCE/DebugInfo] Preserve llvm.dbg.value's argument. BDCE has two phases: 1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so, replaces all its uses with the constant zero. 2. Then, it asks SimplifyDemandedBits again if the instruction is really dead (no side effects etc.) and if so, eliminates it. Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use: %call = tail call i32 (...) @g() #4, !dbg !15 tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17 -> %call = tail call i32 (...) @g() #4, !dbg !15 tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17 but not eliminating the call because it may have arbitrary side effects. In other words, we lose some debug information. This patch fixes the problem by making sure that BDCE does nothing with the instruction if it has side effects and no non-dbg uses. Differential Revision: https://reviews.llvm.org/D27471 llvm-svn: 288851	2016-12-06 21:52:47 +00:00
Davide Italiano	df670a1984	Revert "[SCCP] Remove manual folding of terminator instructions." This reverts commit r288725 as it broke a bot. llvm-svn: 288759	2016-12-06 02:26:50 +00:00
Davide Italiano	3dad93d9ef	[SCCP] Remove manual folding of terminator instructions. There are two cases handled here: 1) a branch on undef 2) a switch with an undef condition. Both cases are currently handled by ResolvedUndefsIn. If we have a branch on undef, we force its value to false (which is trivially foldable). If we have a switch on undef, we force to the first constant (which is also foldable). llvm-svn: 288725	2016-12-05 23:04:21 +00:00
Adrian Prantl	941fa7588b	[DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation so we can stop using DW_OP_bit_piece with the wrong semantics. The entire back story can be found here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's offset field to mean the offset into the source variable rather than the offset into the location at the top of the DWARF expression stack. In order to be able to fix this in a subsequent patch, this patch introduces a dedicated DW_OP_LLVM_fragment operation with the semantics that we used to apply to DW_OP_bit_piece, which is what we actually need while inside of LLVM. This patch is complete with a bitcode upgrade for expressions using the old format. It does not yet fix the DWARF backend to use DW_OP_bit_piece correctly. Implementation note: We discussed several options for implementing this, including reserving a dedicated field in DIExpression for the fragment size and offset, but using a custom operator at the end of the expression works just fine and is more efficient because we then only pay for it when we need it. Differential Revision: https://reviews.llvm.org/D27361 rdar://problem/29335809 llvm-svn: 288683	2016-12-05 18:04:47 +00:00
Sanjay Patel	b7f8cb698c	[InstCombine] change select type to eliminate bitcasts This solves a secondary problem seen in PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137#c6 This is similar to the bitwise logic op fold added with: https://reviews.llvm.org/rL287707 And like that patch, I'm artificially restricting the transform from vector <-> scalar types until we're sure that the backend can handle that. llvm-svn: 288584	2016-12-03 15:25:16 +00:00
Michael Kuperstein	997dac8709	Remove stale comment. NFC. llvm-svn: 288572	2016-12-03 01:59:13 +00:00
Kostya Serebryany	520753a321	[sanitizer-coverage] use IRB.SetCurrentDebugLocation after IRB.SetInsertPoint llvm-svn: 288568	2016-12-03 01:43:30 +00:00
Rong Xu	a5b5745a62	[PGO] Fix PGO use ICE when there are unreachable BBs For -O0 there might be unreachable BBs, which breaks the assumption that all the BBs have an auxiliary data structure. In this patch, we add another interface called findBBInfo() so that a nullptr can be returned for the unreachable BBs (and the callers can ignore those BBs). This fixes the bug reported https://llvm.org/bugs/show_bug.cgi?id=31209 Differential Revision: https://reviews.llvm.org/D27280 llvm-svn: 288528	2016-12-02 19:10:29 +00:00
Renato Golin	5b8e7ecdb3	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's builtins (twice: once in r288412 and once in r288497). We should investigate this offline. llvm-svn: 288508	2016-12-02 16:56:26 +00:00
Alexey Bataev	e8e94a7176	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288497	2016-12-02 12:20:22 +00:00
Peter Collingbourne	bc0705240e	IR: Move NumElements field from {Array,Vector}Type to SequentialType. Now that PointerType is no longer a SequentialType, all SequentialTypes have an associated number of elements, so we can move that information to the base class, allowing for a number of simplifications. Differential Revision: https://reviews.llvm.org/D27122 llvm-svn: 288464	2016-12-02 03:20:58 +00:00
Dehao Chen	c3be225895	Change LoopUnrollPass cost from int to unsigned to make it consistent. (NFC) llvm-svn: 288463	2016-12-02 03:17:07 +00:00
Peter Collingbourne	4568158c4d	IR: Change PointerType to derive from Type rather than SequentialType. As proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html This is for a couple of reasons: - Values of type PointerType are unlike the other SequentialTypes (arrays and vectors) in that they do not hold values of the element type. By moving PointerType we can unify certain aspects of how the other SequentialTypes are handled. - PointerType will have no place in the SequentialType hierarchy once pointee types are removed, so this is a necessary step towards removing pointee types. Differential Revision: https://reviews.llvm.org/D26595 llvm-svn: 288462	2016-12-02 03:05:41 +00:00
Peter Collingbourne	ab85225be4	IR: Change the gep_type_iterator API to avoid always exposing the "current" type. Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458	2016-12-02 02:24:42 +00:00
Teresa Johnson	185b4ab6d4	[ThinLTO] Stop importing constant global vars as copies in the backend Summary: We were doing an optimization in the ThinLTO backends of importing constant unnamed_addr globals unconditionally as a local copy (regardless of whether the thin link decided to import them). This should be done in the thin link instead, so that resulting exported references are marked and promoted appropriately, but will need a summary enhancement to mark these variables as constant unnamed_addr. The function import logic during the thin link was trying to handle this proactively, by conservatively marking all values referenced in the initializer lists of exported global variables as also exported. However, this only handled values referenced directly from the initializer list of an exported global variable. If the value is itself a constant unnamed_addr variable, we could end up exporting its references as well. This caused multiple issues. The first is that the transitively exported references weren't promoted. Secondly, some could not be promoted/renamed (e.g. they had a section or other constraint). recursively, instead of just adding the first level of initializer list references to the ExportList directly. Remove this optimization and the associated handling in the function import backend. SPEC measurements indicate we weren't getting much from it in any case. Fixes PR31052. Reviewers: mehdi_amini Subscribers: krasin, llvm-commits Differential Revision: https://reviews.llvm.org/D26880 llvm-svn: 288446	2016-12-02 01:02:30 +00:00
Artem Belevich	704395a25a	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts r288412 which causes severe compile-time regression. llvm-svn: 288431	2016-12-01 22:52:15 +00:00
Philip Reames	89e92d21b4	[PR29121] Don't fold if it would produce atomic vector loads or stores The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal. Differential Revision: https://reviews.llvm.org/D24365 llvm-svn: 288415	2016-12-01 20:17:06 +00:00
Philip Reames	4d00af1bde	Factor out common parts of LVI and Float2Int into ConstantRange [NFCI] This just extracts out the transfer rules for constant ranges into a single shared point. As it happens, neither bit of code actually overlaps in terms of the handled operators, but with this change that could easily be tweaked in the future. I also want to have this separated out to make it easier to experiment with an eager value info implementation and possibly a ValueTracking-like fixed depth recursion peephole version. There's no reason all four of these can't share a common implementation, which reduces the chances of bugs. Differential Revision: https://reviews.llvm.org/D27294 llvm-svn: 288413	2016-12-01 20:08:47 +00:00
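
The [SLP] type-shrinking fix in 92ce0230b5 rests on a small piece of arithmetic that is easy to check outside the compiler. The sketch below is not the SLP vectorizer's code; `signExtend`, `zeroExtend`, and the sample value 200 are hypothetical names and data chosen for illustration. It shows why a value whose top retained bit may be set needs one extra bit before a sign-extend can restore it, and why a zero-extend is enough when the sign bit is known to be zero.

```cpp
#include <cstdint>

// Hypothetical helper: keep the low `bits` bits of `v` and sign-extend them
// to 64 bits. Assumes 1 <= bits <= 63.
int64_t signExtend(uint64_t v, unsigned bits) {
    const uint64_t mask = (1ULL << bits) - 1;   // low `bits` bits set
    const uint64_t sign = 1ULL << (bits - 1);   // position of the narrow sign bit
    const uint64_t x = v & mask;                // the "truncated" value
    // Flipping and subtracting the sign bit copies it into the upper bits.
    return static_cast<int64_t>((x ^ sign) - sign);
}

// Hypothetical helper: keep the low `bits` bits of `v` and zero-extend them.
// Assumes 1 <= bits <= 63.
uint64_t zeroExtend(uint64_t v, unsigned bits) {
    return v & ((1ULL << bits) - 1);
}
```

With the value 200, which occupies 8 bits, `signExtend(200, 8)` yields -56 because bit 7 is read as a sign bit, while `signExtend(200, 9)` and `zeroExtend(200, 8)` both give back 200.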

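The [SCCP] change in 824d695231 (`mul %x 0` with an overdefined `%x`) can be illustrated with a toy three-state lattice. This is only a sketch under simplifying assumptions: `LatticeVal`, `isZero`, and `evalMul` are hypothetical names, not LLVM's classes, and the real pass has to deal with undef and other details that are ignored here.

```cpp
#include <cstdint>

// Hypothetical lattice value, loosely modelled on what a sparse conditional
// constant propagation solver tracks for each SSA value.
struct LatticeVal {
    enum Kind { Unknown, Constant, Overdefined } kind;
    int64_t value;  // meaningful only when kind == Constant
};

inline bool isZero(const LatticeVal &v) {
    return v.kind == LatticeVal::Constant && v.value == 0;
}

// Transfer function for multiplication.
LatticeVal evalMul(const LatticeVal &a, const LatticeVal &b) {
    // A constant zero operand forces the product to zero, even when the
    // other operand is overdefined. This is the rule the commit describes.
    if (isZero(a) || isZero(b))
        return {LatticeVal::Constant, 0};
    if (a.kind == LatticeVal::Overdefined || b.kind == LatticeVal::Overdefined)
        return {LatticeVal::Overdefined, 0};
    if (a.kind == LatticeVal::Unknown || b.kind == LatticeVal::Unknown)
        return {LatticeVal::Unknown, 0};
    // Multiply in unsigned arithmetic so that overflow wraps instead of
    // being undefined behaviour.
    const uint64_t product =
        static_cast<uint64_t>(a.value) * static_cast<uint64_t>(b.value);
    return {LatticeVal::Constant, static_cast<int64_t>(product)};
}
```

In the commit's motivating loop, `x` starts as the constant 0 and `patatino` is overdefined, so under this rule `x *= patatino` stays at the constant 0 on every iteration instead of becoming overdefined.
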

16655 Commits