Commit Graph

Hal Finkel b54579fab6 [PowerPC] Don't apply the PPC64 address-formation peephole for offsets greater than 7
When applying our address-formation PPC64 peephole, we are reusing the @ha TOC
addis value with the low parts associated with different offsets (i.e.
different effective symbol addends). We were assuming this was okay so long as
the offsets were less than the alignment of the global variable being accessed.
This ignored the fact, however, that the TOC base pointer itself need only be
8-byte aligned. As a result, the transformation is legal only for offsets less
than 8, regardless of the alignment of the object being accessed.
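
To illustrate (a hypothetical IR sketch; names invented), consider two accesses
that are both below the global's 16-byte alignment but whose addends differ by
more than 7:

    @g = external global [4 x i32], align 16

    define i32 @f() {
      ; The old peephole reused a single addis @ha value for the loads at
      ; offsets 0 and 12 because both are below @g's 16-byte alignment.
      ; Since the TOC base pointer is only guaranteed 8-byte alignment,
      ; only offsets below 8 may safely share the @ha value.
      %p0 = getelementptr inbounds [4 x i32], [4 x i32]* @g, i64 0, i64 0
      %p3 = getelementptr inbounds [4 x i32], [4 x i32]* @g, i64 0, i64 3
      %a = load i32, i32* %p0
      %b = load i32, i32* %p3
      %s = add i32 %a, %b
      ret i32 %s
    }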

Fixes PR28727.

llvm-svn: 280441
2016-09-02 00:28:20 +00:00
Hal Finkel 1e8218cc09 [PowerPC] Don't consider fusion in PPC64 address-formation peephole
The logic in this function assumes that the P8 supports fusion of addis/addi,
but it does not. As a result, there is no advantage to restricting application
of our peephole (which merges addi instructions into dependent memory accesses)
even when the addi has multiple users, regardless of whether or not we're
optimizing for size.

We might need something like this again for the P9; I suspect we'll revisit
this code when we work on P9 tuning.

llvm-svn: 280440
2016-09-02 00:27:50 +00:00
Michael Kuperstein 7bc54cebea [Legalizer] Don't throw away false low half when expanding GT/LT SETCC
When expanding a SETCC for which the low half is known to evaluate to false,
we can only throw it away for LT/GT comparisons, not LE/GE.
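
As a sketch of why the predicate matters (the hi/lo expansion is shown
schematically in the comments; the details are assumptions):

    ; On a 32-bit target, "icmp sle i64 %a, %b" expands roughly to
    ;   select (hi_a == hi_b), (lo_a u<= lo_b), (hi_a s< hi_b)
    ; If the lo compare is known false, this reduces to the *strict*
    ; compare hi_a s< hi_b; reusing sle on the hi words would wrongly
    ; return true when the hi words are equal.
    define i1 @cmp_sle(i64 %a, i64 %b) {
      %c = icmp sle i64 %a, %b
      ret i1 %c
    }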

This fixes PR29170.

Differential Revision: https://reviews.llvm.org/D24151

llvm-svn: 280424
2016-09-01 23:02:32 +00:00
Michael Kuperstein 5f17d08f49 [SelectionDAG] Generate vector_shuffle nodes for undersized result vector sizes
Prior to this, we could generate a vector_shuffle from an IR shuffle when the
size of the result was exactly the sum of the sizes of the input vectors.
If the output vector was narrower - e.g. a <12 x i8> being formed by a shuffle
with two <8 x i8> inputs - we would lower the shuffle to a sequence of extracts
and inserts.

Instead, we can form a larger vector_shuffle, and then extract a subvector
of the right size - e.g. shuffle the two <8 x i8> inputs into a <16 x i8>
and then extract a <12 x i8>.
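
For example, a sketch of the kind of IR shuffle now handled directly:

    define <12 x i8> @f(<8 x i8> %a, <8 x i8> %b) {
      ; Previously lowered as per-element extracts and inserts; now formed
      ; as a <16 x i8> vector_shuffle plus a <12 x i8> subvector extract.
      %s = shufflevector <8 x i8> %a, <8 x i8> %b, <12 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13>
      ret <12 x i8> %s
    }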

This also includes a target-specific X86 combine that in the presence of
AVX2 combines:
(vector_shuffle <mask> (concat_vectors t1, undef)
                       (concat_vectors t2, undef))
into:
(vector_shuffle <mask> (concat_vectors t1, t2), undef)
in cases where this allows us to form VPERMD/VPERMQ.

(This is not a separate commit, as that pattern does not appear without
the DAGBuilder change.)

llvm-svn: 280418
2016-09-01 21:32:09 +00:00
Heejin Ahn c0f18172f5 [WebAssembly] Add asm.js-style setjmp/longjmp handling for wasm (reland r280302)
Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly, using JavaScript's try and catch mechanism.

Reviewers: jpp, dschuff

Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D24121

llvm-svn: 280415
2016-09-01 21:05:15 +00:00
Tim Northover 8d8812c5d7 GlobalISel: add a G_PHI instruction to give phis a type.
They're another source of generic vregs, which are going to need a type on the
definition when we remove the register width from MachineRegisterInfo.

llvm-svn: 280412
2016-09-01 20:45:41 +00:00
Sanjay Patel 199a8e6a92 [InstCombine] add tests to show potential shuffle+insert folds
llvm-svn: 280403
2016-09-01 19:14:19 +00:00
Andrey Turetskiy cde38b6a99 [X86] Loosen memory folding requirements for cvtdq2pd and cvtps2pd instructions.
According to the spec, the cvtdq2pd and cvtps2pd instructions don't require the
memory operand to be aligned to 16 bytes. This patch removes this requirement
from the memory folding table.
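
For instance (a sketch assuming the SSE2 intrinsic form; the folding claim is
my reading of the change):

    declare <2 x double> @llvm.x86.sse2.cvtdq2pd(<4 x i32>)

    define <2 x double> @f(<4 x i32>* %p) {
      ; The under-aligned load can now be folded into cvtdq2pd: the
      ; instruction reads only 64 bits of memory and the spec does not
      ; demand 16-byte alignment.
      %v = load <4 x i32>, <4 x i32>* %p, align 4
      %r = call <2 x double> @llvm.x86.sse2.cvtdq2pd(<4 x i32> %v)
      ret <2 x double> %r
    }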

Differential Revision: https://reviews.llvm.org/D23919

llvm-svn: 280402
2016-09-01 18:50:02 +00:00
Yaxun Liu add05a8d95 AMDGPU: Add runtime metadata for pointee alignment of argument.
Add runtime metadata for the pointee alignment of pointer-type kernel arguments. The key is KeyArgPointeeAlign and the value is a 32-bit unsigned integer.

Differential Revision: https://reviews.llvm.org/D24145

llvm-svn: 280399
2016-09-01 18:46:49 +00:00
Michael Kuperstein 65bc3c89ff [DAGCombine] Don't fold a trunc if it feeds an anyext
Legalization tends to create anyext(trunc) patterns. This should always be
combined - into either a single trunc, a single ext, or nothing if the
types match exactly. But if we happen to combine the trunc first, we may pull
the trunc away from the anyext or make it implicit (e.g. the truncate(extract)
-> extract(bitcast) fold).

To prevent this, we can avoid doing the fold, similarly to how we already handle
fpround(fpextend).

Differential Revision: https://reviews.llvm.org/D23893

llvm-svn: 280386
2016-09-01 17:59:24 +00:00
Simon Dardis bd27154757 [mips] interAptiv-based generic schedule model
This schedule model describes a generic processor covering all MIPS ISAs,
based on the interAptiv and P5600 timings.

Reviewers: vkalintiris, dsanders

Differential Revision: https://reviews.llvm.org/D23551

llvm-svn: 280374
2016-09-01 14:53:53 +00:00
Sanjay Patel dd861964d1 [InstCombine] remove fold of an icmp pattern that should never happen
While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger
this code path.

The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic()
or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast().

Differential Revision: https://reviews.llvm.org/D24031 

llvm-svn: 280370
2016-09-01 14:20:43 +00:00
Krzysztof Parzyszek 07d9f53b51 [Hexagon] Deal with undefs when extending live intervals
Reapply r280275, since MSVC accepts r280358.

llvm-svn: 280369
2016-09-01 13:59:35 +00:00
Elena Demikhovsky 4d7738dfde Optimized FMA intrinsic + FNEG, like -(a*b+c), and FNEG + FMA, like a*b-c
or (-a)*b+c.

The bug description is here: https://llvm.org/bugs/show_bug.cgi?id=28892
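
A minimal sketch of the first pattern (FNEG written as fsub from -0.0, as
usual at the IR level):

    declare double @llvm.fma.f64(double, double, double)

    define double @fneg_fma(double %a, double %b, double %c) {
      ; -(a*b+c): the negation now combines with the FMA intrinsic into a
      ; single fused negated multiply-add node.
      %t = call double @llvm.fma.f64(double %a, double %b, double %c)
      %r = fsub double -0.000000e+00, %t
      ret double %r
    }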

Differential revision: https://reviews.llvm.org/D23313

llvm-svn: 280368
2016-09-01 13:58:53 +00:00
James Molloy 88cad7e5cf [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now that it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume for the sake of argument that x() has side effects; if something is safe to speculate we could indeed sink it anyway, but that cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280364
2016-09-01 12:58:13 +00:00
James Molloy eec6df3193 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions with non-identical but trivially equivalent operands needed extra logic before we could sink them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However, it's just not clever enough.

Instead, we throw areValuesTriviallySame() away, use pointer equality to check operand equivalence, and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280351
2016-09-01 10:44:35 +00:00
Hal Finkel 5081ac27c7 Add ISD::EH_DWARF_CFA, simplify @llvm.eh.dwarf.cfa on Mips, fix on PowerPC
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:

  ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)

where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics; we could, since it is currently used only for @llvm.eh.dwarf.cfa
lowering, but it is better to directly lower the CFA construct itself, since it
can be easily represented as a fixed-offset FrameIndex). Mips currently does
this, but by using a custom lowering for ADD that specifically recognizes the
(FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.

This change introduces an ISD::EH_DWARF_CFA node, which by default expands
using the existing logic, but can be directly lowered by the target. Mips is
updated to use this method (which simplifies its implementation, and I suspect
makes it more robust), and PowerPC is updated to do the same.
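
For reference, the intrinsic whose lowering changes here:

    declare i8* @llvm.eh.dwarf.cfa(i32)

    define i8* @cfa() {
      ; Previously expanded as ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET); now
      ; selected through the new ISD::EH_DWARF_CFA node, which targets
      ; such as PowerPC and Mips can lower directly.
      %p = call i8* @llvm.eh.dwarf.cfa(i32 0)
      ret i8* %p
    }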

Fixes PR26761.

Differential Revision: https://reviews.llvm.org/D24038

llvm-svn: 280350
2016-09-01 10:28:47 +00:00
Hal Finkel 40d7f5c277 Add a counter-function insertion pass
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.

Because we don't want the presence of these functions to affect optimization,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.
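
A sketch of the intended use; the attribute key shown is my reading of the
patch and should be treated as illustrative:

    ; The pass inserts a call to the named counting function at function
    ; entry, after inlining has already happened.
    define void @f() #0 {
    entry:
      ret void
    }

    attributes #0 = { "counting-function"="mcount" }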

Differential Revision: https://reviews.llvm.org/D22825

llvm-svn: 280347
2016-09-01 09:42:39 +00:00
Dean Michael Berris e8ae5baaf7 [XRay] Detect and emit sleds for sibling/tail calls
Summary:
This change promotes the 'isTailCall(...)' member function to
TargetInstrInfo as a query interface for determining on a per-target
basis whether a given MachineInstr is a tail call instruction. We build
upon this in the XRay instrumentation pass to emit special sleds for
tail call optimisations, where we emit the correct kind of sled.

The tail call sleds look like a mix between the function entry and
function exit sleds. Form-wise, the sled comes before the "jmp"
instruction that implements the tail call similar to how we do it for
the function entry sled. Functionally, because we know this is a tail
call, it behaves much like an exit sled -- i.e. at runtime we may use
the exit trampolines instead of a different kind of trampoline.

A follow-up change to recognise these sleds will be done in compiler-rt,
so that we can start intercepting these initially as exits, but also
have the option to have different log entries to more accurately reflect
that this is actually a tail call.

Reviewers: echristo, rSerge, majnemer

Subscribers: mehdi_amini, dberris, llvm-commits

Differential Revision: https://reviews.llvm.org/D23986

llvm-svn: 280334
2016-09-01 01:29:13 +00:00
Heejin Ahn 10a7086700 Revert "Add asm.js-style setjmp/longjmp handling for wasm"
This reverts commit r280302, it broke the integration tests.

llvm-svn: 280329
2016-09-01 00:44:37 +00:00
Nick Lewycky 97e49ac59e Add -fprofile-dir= to clang.
-fprofile-dir=path allows the user to specify where .gcda files should be
emitted when the program is run. In particular, this is the first flag that
causes the .gcno and .o files to have different paths, LLVM is extended to
support this. -fprofile-dir= does not change the file name in the .gcno (and
thus where lcov looks for the source) but it does change the name in the .gcda
(and thus where the runtime library writes the .gcda file). It's different from
a GCOV_PREFIX because a user can observe that the GCOV_PREFIX_STRIP will strip
paths off of -fprofile-dir= but not off of a supplied GCOV_PREFIX.

To implement this we split -coverage-file into -coverage-data-file and
-coverage-notes-file to specify the two different names. The !llvm.gcov
metadata node grows from a 2-element form {string coverage-file, node dbg.cu}
to 3-elements, {string coverage-notes-file, string coverage-data-file, node
dbg.cu}. In the 3-element form, the file name is already "mangled" with
.gcno/.gcda suffixes, while the 2-element form left that to the middle end
pass.
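
A sketch of the new 3-element form (here !1 merely stands in for the real
dbg.cu node):

    !llvm.gcov = !{!0}
    ; {coverage-notes-file, coverage-data-file, dbg.cu}
    !0 = !{!"foo.gcno", !"/prof/dir/foo.gcda", !1}
    !1 = !{} ; placeholder for the debug compile unit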

llvm-svn: 280306
2016-08-31 23:04:32 +00:00
Heejin Ahn 23d57103a4 Add asm.js-style setjmp/longjmp handling for wasm
Summary: This patch adds asm.js-style setjmp/longjmp handling support for WebAssembly, using JavaScript's try and catch mechanism.

Reviewers: jpp, dschuff

Subscribers: jfb, dschuff

Differential Revision: https://reviews.llvm.org/D23928

llvm-svn: 280302
2016-08-31 22:40:34 +00:00
Reid Kleckner 109448ee81 Revert "Add an optional parameter with a list of undefs to extendToIndices"
This reverts commit r280268; it causes MSVC 2013 to ICE. This
appears to have been fixed in a later MSVC 2013 update, because I cannot
reproduce it locally. That said, all upstream LLVM bots are broken right
now, so I am reverting.

Also reverts dependent change r280275, "[Hexagon] Deal with undefs when
extending live intervals".

llvm-svn: 280301
2016-08-31 22:36:02 +00:00
Sanjay Patel 0d70831d73 [InstCombine] allow icmp (shr exact X, C2), C fold for splat constant vectors
The enhancement to foldICmpDivConstant ( http://llvm.org/viewvc/llvm-project?view=revision&revision=280299 )
allows us to remove the ConstantInt check; no other changes needed.
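
For example, a splat-vector pattern that now folds like its scalar
counterpart (sketch):

    define <2 x i1> @f(<2 x i32> %x) {
      ; With the exact flag, this folds to icmp slt %x, <i32 32, i32 32>,
      ; just as the scalar version does.
      %s = ashr exact <2 x i32> %x, <i32 3, i32 3>
      %c = icmp slt <2 x i32> %s, <i32 4, i32 4>
      ret <2 x i1> %c
    }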

llvm-svn: 280300
2016-08-31 22:18:43 +00:00
Sanjay Patel 541aef4661 [InstCombine] allow icmp (div X, Y), C folds for splat constant vectors
Converting all of the overflow ops to APInt looked risky, so I've left that as a TODO.

llvm-svn: 280299
2016-08-31 21:57:21 +00:00
Matt Arsenault b50eb8dc2b AMDGPU: Fix introducing stack access on unaligned v16i8
llvm-svn: 280298
2016-08-31 21:52:27 +00:00
Tim Northover 11a2354670 GlobalISel: use G_TYPE to annotate physregs with a type.
More preparation for dropping source types from MachineInstrs: registers coming
out of already-selected code (i.e. non-generic instructions) don't have a type,
but that information is needed, so we must add it manually.

This is done via a new G_TYPE instruction.

llvm-svn: 280292
2016-08-31 21:24:02 +00:00
Derek Schuff 1b258d313c [WebAssembly] Disable folding of GA+reg into load/store constant offsets
Summary:
If the register has a negative value then unsigned overflow will occur;
this case is sometimes even created intentionally by LSR. For now
disable GA+reg folding. Fixes PR29127
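
A sketch of the pattern in question:

    @arr = external global [0 x i32]

    define i32 @f(i32 %i) {
      ; Folding @arr's address into the load's constant-offset field, with
      ; %i left as the register operand, wraps the unsigned 32-bit address
      ; computation whenever %i is negative, so the fold is disabled.
      %p = getelementptr [0 x i32], [0 x i32]* @arr, i32 0, i32 %i
      %v = load i32, i32* %p
      ret i32 %v
    }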

Differential Revision: https://reviews.llvm.org/D24053

llvm-svn: 280285
2016-08-31 20:27:20 +00:00
Geoff Berry 8d84605f25 [EarlyCSE] Optionally use MemorySSA. NFC.
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.

This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.

Reviewers: dberlin, sanjoy, reames, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19821

llvm-svn: 280279
2016-08-31 19:24:10 +00:00
Quentin Colombet bd850f4185 Actually check for the diagnostic to be emitted!
This makes the test case in r280273 actually useful!

llvm-svn: 280276
2016-08-31 18:53:32 +00:00
Krzysztof Parzyszek e21a0b3b9f [Hexagon] Deal with undefs when extending live intervals
llvm-svn: 280275
2016-08-31 18:52:09 +00:00
Tom Stellard ba5730884b AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte aligned
Summary: This fixes some OpenCV tests that were broken by libclc commit r276443.

Reviewers: arsenm, jvesely

Subscribers: arsenm, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24051

llvm-svn: 280274
2016-08-31 18:46:07 +00:00
Quentin Colombet 1c06a73a7c [TargetPassConfig] Add a hook to tell whether GlobalISel should warn on fallback.
Thanks to this patch, we now have a way to easily see if GlobalISel
failed.

llvm-svn: 280273
2016-08-31 18:43:04 +00:00
Kevin Enderby 9d0c945ad6 Next set of additional error checks for invalid Mach-O files for bad load commands
that use the Mach::linkedit_data_command type, for the load commands that are
currently used in the MachOObjectFile constructor.

This contains the missing checks for LC_DATA_IN_CODE and
LC_LINKER_OPTIMIZATION_HINT load commands and the fields for the
Mach::linkedit_data_command type.  Checking for other load commands that
use this type will be added later.

Also fixed a couple of places that were using sizeof(MachOObjectFile::LoadCommandInfo)
when they should have been using sizeof(MachO::load_command).

llvm-svn: 280267
2016-08-31 17:57:46 +00:00
Geoff Berry 64f5ed172a [EarlyCSE] Allow forwarding a non-invariant load into an invariant load.
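
For illustration, a minimal case this enables (sketch):

    define i32 @f(i32* %p) {
      ; The earlier ordinary load can now satisfy the later invariant
      ; load; %b is simply replaced by %a.
      %a = load i32, i32* %p
      %b = load i32, i32* %p, !invariant.load !0
      %s = add i32 %a, %b
      ret i32 %s
    }

    !0 = !{}
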
Reviewers: sanjoy

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23935

llvm-svn: 280265
2016-08-31 17:45:31 +00:00
Teresa Johnson 8068cc68f7 [LTO] Fix common test to reflect r279911 and move to X86 subdirectory
Adjust the test to reflect the changes to common handling in r279911.
This test wasn't running due to an incorrect REQUIRES and thus missed
being modified for r279911 before. It was changed to XFAIL when the
bad REQUIRES was discovered.

Remove the XFAIL and move to a new X86 subdirectory that will properly
disable on non-X86.

llvm-svn: 280256
2016-08-31 16:15:39 +00:00
Reid Kleckner 9dac47319d [codeview] Emit vtable shape information
The shape of the vtable is passed down as the size of the
__vtbl_ptr_type. This special pointer type appears both as the pointee
type of the vptr type, and by itself in every dynamic class. For classes
with multiple vtables, only the shape of the primary vftable is
included, as the shape of all secondary vftables will be the same as in
the base class.

Fixes PR28150

llvm-svn: 280254
2016-08-31 15:59:30 +00:00
Philip Reames 2b1084ac93 [statepoints][experimental] Add support for live-in semantics of values in deopt bundles
This is a first step towards supporting deopt value lowering and reporting entirely with the register allocator. I hope to build on this in the near future to support live-on-return semantics, but I have a use case which allows me to test and investigate code quality with just the live-in semantics, so I've chosen to start there. For those curious, my use case is our implementation of the "__llvm_deoptimize" function we bind to @llvm.deoptimize. I'm choosing not to hard code that fact in the patch and instead make it configurable via function attributes.

The basic approach here is modelled on what is done for the "Live In" values on stackmaps and patchpoints. (A secondary goal here is to remove one of the last barriers to merging the pseudo instructions.) We start by adding the operands directly to the STATEPOINT SDNode. Once we've lowered to MI, we extend the remat logic used by the register allocator to fold virtual register uses into StackMap::Indirect entries as needed. This does rely on the fact that the register allocator rematerializes. If it didn't along some code path, we could end up with more vregs than physical registers and fail to allocate.
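
For those unfamiliar with the syntax, a call carrying a deopt bundle looks
like this (sketch):

    declare void @g()

    define void @f(i32 %v) gc "statepoint-example" {
      ; %v is deopt state; with this patch it can be reported as a live-in
      ; register in the statepoint's StackMap entry instead of always
      ; being spilled.
      call void @g() [ "deopt"(i32 %v) ]
      ret void
    }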

Today, we *only* fold in the register allocator. This can create some weird effects when combined with arguments passed on the stack because we don't fold them appropriately. I have an idea how to fix that, but it needs this patch in place to work on that effectively. (There's some weird interaction with the scheduler as well, more investigation needed.)

My near term plan is to land this patch off-by-default, experiment in my local tree to identify any correctness issues and then start fixing codegen problems one by one as I find them. Once I have the live-in lowering fully working (both correctness and code quality), I'm hoping to move on to the live-on-return semantics. Note: I don't have any *known* miscompiles with this patch enabled, but I'm pretty sure I'll find at least a couple. Thus, the "experimental" tag and the fact it's off by default.

Differential Revision: https://reviews.llvm.org/D24000

llvm-svn: 280250
2016-08-31 15:12:17 +00:00
Simon Pilgrim 6199b4fd49 [X86][SSE] Improve awareness of (v)cvtpd2ps implicit zeroing of upper 64-bits of xmm result
Associate x86_sse2_cvtpd2ps with X86ISD::VFPROUND to avoid inserting unnecessary zeroing shuffles.

Differential Revision: https://reviews.llvm.org/D23797

llvm-svn: 280249
2016-08-31 15:09:34 +00:00
Sjoerd Meijer 46b5b88387 Clang patch r280064 introduced ways to set the FP exception and denormal
types. This is the LLVM counterpart: it adds options that map onto the FP
exception and denormal build attributes, allowing better FP math library
selection.

Differential Revision: https://reviews.llvm.org/D24070

llvm-svn: 280246
2016-08-31 14:17:38 +00:00
Krzysztof Parzyszek 64488118bf Fixed spill stack objects are mutable
Differential Revision: https://reviews.llvm.org/D24039

llvm-svn: 280244
2016-08-31 13:52:17 +00:00
James Molloy cacfc16109 Revert "[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd"
This reverts commit r280216 - it caused buildbot failures.

llvm-svn: 280234
2016-08-31 13:16:52 +00:00
James Molloy 76c9d423a7 Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches"
This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280233
2016-08-31 13:16:45 +00:00
James Molloy 06a45483a1 Revert "[SimplifyCFG] Add a workaround to fix PR30188"
This reverts commit r280219. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280232
2016-08-31 13:16:36 +00:00
James Molloy 8a66a39cbf Revert "[SimplifyCFG] Fix bootstrap failure after r280220"
This reverts commit r280228. r280216 caused buildbot failures - backing out the entire sequence.

llvm-svn: 280231
2016-08-31 13:16:30 +00:00
James Molloy b7efa6c227 [SimplifyCFG] Fix bootstrap failure after r280220
We check that a sinking candidate is used by only one PHI node during our legality checks. However, for instructions that are used by other sinking candidates, our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it, because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.

llvm-svn: 280228
2016-08-31 12:33:48 +00:00
Nikolay Haustov eba808957e AMDGPU/SI: Handle aliases in AMDGPUAlwaysInlinePass
Summary:
Simply replace usage of aliases to functions with the aliasee.
This came up when linking bitcode against the builtin library:
calls to aliases were not being resolved.
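
A minimal sketch of the situation:

    define void @impl() {
      ret void
    }

    @foo = alias void (), void ()* @impl

    define void @kernel() {
      ; The pass now rewrites this into a direct call to @impl, so the
      ; alias no longer prevents the callee from being resolved/inlined.
      call void @foo()
      ret void
    }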

Also made minor improvements to existing test.

Reviewers: tstellarAMD, alex-t, vpykhtin

Subscribers: arsenm, wdng, rampitec

Differential Revision: https://reviews.llvm.org/D24023

llvm-svn: 280221
2016-08-31 11:18:33 +00:00
James Molloy 171fdac7ce [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process we create selects for the store address operand, which SROA/Mem2Reg can't look through, and that caused serious regressions.
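
Concretely (sketch), sinking the two stores below:

    define void @f(i1 %c, i32* %a, i32* %b) {
    entry:
      br i1 %c, label %then, label %else
    then:
      store i32 1, i32* %a
      br label %end
    else:
      store i32 1, i32* %b
      br label %end
    end:
      ret void
    }

    ; yields "%p = select i1 %c, i32* %a, i32* %b" plus a single store
    ; through %p in %end, and SROA/Mem2Reg can't look through that select
    ; of addresses.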

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280219
2016-08-31 10:46:45 +00:00
James Molloy c53b40b509 [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now that it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume for the sake of argument that x() has side effects; if something is safe to speculate we could indeed sink it anyway, but that cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280217
2016-08-31 10:46:33 +00:00
James Molloy 55bd04cd20 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions with non-identical but trivially equivalent operands needed extra logic before we could sink them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However, it's just not clever enough.

Instead, we throw areValuesTriviallySame() away, use pointer equality to check operand equivalence, and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280216
2016-08-31 10:46:23 +00:00