llvm-project

Commit Graph

Author	SHA1	Message	Date
Adrian Prantl	8fe9fb0ae5	Revert "Invoke salvageDebugInfo from CodeGenPrepare's SinkCast()" This reverts commit 317342 while investigating bot breakage. llvm-svn: 317345	2017-11-03 18:26:36 +00:00
Adrian Prantl	58e9a0bb16	Invoke salvageDebugInfo from CodeGenPrepare's SinkCast() This preserves the debug info for the cast operation in the original location. rdar://problem/33460652 llvm-svn: 317340	2017-11-03 18:00:02 +00:00
Clement Courbet	063bed9baf	re-land [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass." Fix undefined references: ExpandMemCmp belongs to CodeGen/, not Scalar/. llvm-svn: 317318	2017-11-03 12:12:27 +00:00
Clement Courbet	82bade615b	Revert "[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass." undefined reference to `llvm::TargetPassConfig::ID' on clang-ppc64le-linux-multistage This reverts commit eea333c33fa73ad225ef28607795984829f65688. llvm-svn: 317213	2017-11-02 15:53:10 +00:00
Clement Courbet	1dc37b9c3b	[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass. Summary: This is mostly a noop (most of the test diffs are renamed blocks). There are a few temporary register renames (eax<->ecx) and a few blocks are shuffled around. See the discussion in PR33325 for more details. Reviewers: spatel Subscribers: mgorny Differential Revision: https://reviews.llvm.org/D39456 llvm-svn: 317211	2017-11-02 15:02:51 +00:00
Philip Reames	9c3cbeea39	[CGP] Fix crash on i96 bit multiply Issue found by llvm-isel-fuzzer on OSS fuzz, https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3725 If anyone actually cares about > 64 bit arithmetic, there's a lot more to do in this area. There's a bunch of obviously wrong code in the same function. I don't have the time to fix all of them and am just using this to understand what the workflow for fixing fuzzer cases might look like. llvm-svn: 316967	2017-10-30 23:59:51 +00:00
Clement Courbet	b2c3eb8cf1	[CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2). - Targets that want to support memcmp expansions now return the list of supported load sizes. - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For examples, this is not the case for x86(32bit)+sse2. Fixes PR34887. llvm-svn: 316905	2017-10-30 14:19:33 +00:00
Balaram Makam	32bcb5d7fb	Revert "[CGP] Merge empty case blocks if no extra moves are added." This reverts commit r316711. The domtree isn't getting updated correctly. llvm-svn: 316721	2017-10-27 00:35:18 +00:00
Balaram Makam	cddf3c5e1c	[CGP] Merge empty case blocks if no extra moves are added. Summary: Currently we skip merging when extra moves may be added in the header of switch instead of the case block, if the case block is used as an incoming block of a PHI. If all the incoming values of the PHIs are non-constants and the destination block is dominated by the switch block then extra moves are likely not added by ISel, so there is no need to skip merging in this case. Reviewers: efriedma, junbuml, davidxl, hfinkel, qcolombet Reviewed By: efriedma Subscribers: dberlin, kuhar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37343 llvm-svn: 316711	2017-10-26 22:34:01 +00:00
John Brawn	eb83c7554e	[CGP] In optimizeMemoryInst handle select similarly to phi This lets us optimize away selects that perform the same address computation in two different ways and is also the first step towards being able to handle selects between two different, but compatible, address computations. Differential Revision: https://reviews.llvm.org/D38242 llvm-svn: 314794	2017-10-03 13:04:15 +00:00
Sanjoy Das	0ac5ba5ade	Revert "[BypassSlowDivision] Improve our handling of divisions by constants" This reverts commit r314253. It causes a miscompile on P100 in an internal benchmark. Reverting while I investigate. llvm-svn: 314482	2017-09-29 00:54:16 +00:00
Sanjoy Das	eda7a86d42	[BypassSlowDivision] Improve our handling of divisions by constants Summary: Don't bail out on constant divisors for divisions that can be narrowed without introducing control flow . This gives us a 32 bit multiply instead of an emulated 64 bit multiply in the generated PTX assembly. Reviewers: jlebar Subscribers: jholewinski, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38265 llvm-svn: 314253	2017-09-26 21:54:27 +00:00
Hiroshi Yamauchi	9364432cec	Unmerge GEPs to reduce register pressure on IndirectBr edges. Summary: GEP merging can sometimes increase the number of live values and register pressure across control edges and cause performance problems particularly if the increased register pressure results in spills. This change implements GEP unmerging around an IndirectBr in certain cases to mitigate the issue. This is in the CodeGenPrepare pass (after all the GEP merging has happened.) With this patch, the Python interpreter loop runs faster by ~5%. Reviewers: sanjoy, hfinkel Reviewed By: hfinkel Subscribers: eastig, junbuml, llvm-commits Differential Revision: https://reviews.llvm.org/D36772 llvm-svn: 312930	2017-09-11 17:52:08 +00:00
Serguei Katkov	9e5604dbe1	[CGP] Fix the rematerialization of gc.relocates If we want to substitute the relocation of derived pointer with gep of base then we must ensure that relocation of base dominates the relocation of derived pointer. Currently only check for basic block is present. However it is possible that both relocation are in the same basic block but relocation of derived pointer is defined earlier. The patch moves the relocation of base pointer right before relocation of derived pointer in this case. Reviewers: sanjoy,artagnon,igor-laevsky,reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36462 llvm-svn: 311067	2017-08-17 05:48:30 +00:00
Sanjay Patel	4dbdd470a8	[CGP] use narrower types in memcmp expansion when possible This only affects very small memcmp on x86 for now, but it will become more important if we allow vector-sized load and compares. llvm-svn: 309711	2017-08-01 17:24:54 +00:00
Sanjay Patel	fea731a4aa	[CGP] use subtract or subtract-of-cmps for result of memcmp expansion As noted in the code comment, transforming this in the other direction might require a separate transform here in CGP given the block-at-a-time DAG constraint. Besides that theoretical motivation, there are 2 practical motivations for the subtract-of-cmps form: 1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still). There is discussion about canonicalizing IR to the select form ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ), so we probably need to add DAG transforms for those patterns anyway, but this improves the memcmp output without waiting for that step. 2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided if we choose to enable that. Differential Revision: https://reviews.llvm.org/D34904 llvm-svn: 309597	2017-07-31 18:08:24 +00:00
Simon Pilgrim	18b97f78fe	[X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914) D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914). Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os). This patch should be considered for the 5.0.0 release branch as well Differential Revision: https://reviews.llvm.org/D35830 llvm-svn: 308986	2017-07-25 17:04:37 +00:00
Serguei Katkov	4ea855ebe5	[CGP] Allow cycles during Phi traversal in OptimizaMemoryInst Allowing cycles in Phi traversal increases the scope of optimize memory instruction in case we are in loop. The added test shows an example of enabling optimization inside a loop. Reviewers: loladiro, spatel, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35294 llvm-svn: 308419	2017-07-19 04:49:17 +00:00
Simon Pilgrim	483927aefb	[x86, CGP] increase memcmp() expansion up to 4 load pairs It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads. Notes: Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz. There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare. We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329: https://bugs.llvm.org/show_bug.cgi?id=33329#c12 TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion. Committed on behalf of Sanjay Patel Differential Revision: https://reviews.llvm.org/D35067 llvm-svn: 308322	2017-07-18 15:55:30 +00:00
Eli Friedman	6f7c9ad7d4	[CodeGenPrepare] Don't create dead instructions in addrmode sinking When we fail to sink an instruction, we must make sure not to modify the function; otherwise, we end up in an infinite loop because CodeGenPrepare iterates until it doesn't make any changes. Fixes https://bugs.llvm.org/show_bug.cgi?id=33608 . llvm-svn: 307866	2017-07-12 23:30:02 +00:00
George Burgess IV	6f92d2dd24	Add a test for r307754 As promised in D35003. Uses -codegenprepare instead of -instcombine since we hit the same buggy path anyway, and CGP lets us keep this test really simple (instcombine likes turning the alloca T, N into alloca [N x T], which hides the bug this is testing for). llvm-svn: 307811	2017-07-12 16:30:37 +00:00
Sanjay Patel	0c37b19331	[CGP, x86] update test checks; NFC This was auto-generated using an older version of the script, and that version does not work with phis, so if we enable expansion it will go bad. llvm-svn: 307267	2017-07-06 15:31:38 +00:00
Keno Fischer	05e4ac26a2	[CodeGenPrepare] Don't create inttoptr for ni ptrs Summary: Arguably non-integral pointers probably shouldn't show up here at all, but since the backend doesn't complain and this takes valid (according to the Verifier) IR and makes it invalid, make sure not to introduce any inttoptr instructions if we're dealing with non-integral pointers. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D33110 llvm-svn: 306737	2017-06-29 20:28:59 +00:00
Sanjay Patel	4b23fa0abf	[CGP] add specialization for memcmp expansion with only one basic block llvm-svn: 306485	2017-06-27 23:15:01 +00:00
Sanjay Patel	70b36f193d	[CGP] eliminate a sub instruction in memcmp expansion As noted in D34071, there are some IR optimization opportunities that could be handled by normal IR passes if this expansion wasn't happening so late in CGP. Regardless of that, it seems wasteful to knowingly produce suboptimal IR here, so I'm proposing this change: %s = sub i32 %x, %y %r = icmp ne %s, 0 => %r = icmp ne %x, %y Changing the predicate to 'eq' mimics what InstCombine would do, so that's just an efficiency improvement if we decide this expansion should happen sooner. The fact that the PowerPC backend doesn't eliminate the 'subf.' might be something for PPC folks to investigate separately. Differential Revision: https://reviews.llvm.org/D34416 llvm-svn: 306471	2017-06-27 21:46:34 +00:00
Sanjay Patel	deed579140	[x86] set the datalayout to match the RUN line triple; NFC I don't think there's any visible difference from having the wrong layout for the 32-bit case at this point, but that could change in the future. llvm-svn: 305931	2017-06-21 17:06:24 +00:00
Sanjay Patel	0656629b87	[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes There are a couple of potential improvements as seen in the IR and asm: 1. We're unnecessarily extending to a larger type to compare values. 2. The codegen for (select cond, 1, -1) could avoid a cmov. (or we could change the order of the compares, so we have a select with 0 operand) llvm-svn: 305802	2017-06-20 15:58:30 +00:00
Sanjay Patel	bc6d8b0f51	[CGP, x86] add tests for potential memcmp expansion; NFC No IR tests were added with rL304313 ( https://reviews.llvm.org/D28637 ), so I want these for extra coverage if we enable memcmp expansion for x86. As shown, nothing is expanded for x86 in CGP yet. Also fundamentally, we're doing an IR transform, so we should have IR tests for just that part. If something goes wrong, we need to know if the bug is in CGP or later lowering. llvm-svn: 305011	2017-06-08 20:40:39 +00:00
Teresa Johnson	2a6b7991d4	Restrict call metadata based hotness detection to Sample PGO mode Summary: Don't use the metadata on call instructions for determining hotness unless we are in sample PGO mode, where it is needed because profile counts are not accurate. In instrumentation mode this is not necessary and does more harm than good when calls have VP metadata that hasn't been properly scaled after transformations or dropped after constant prop based devirtualization (both should be fixed, but we don't need to do this in the first place for instrumentation PGO). This required adjusting a number of tests to distinguish between sample and instrumentation PGO handling, and to add in profile summary metadata so that getProfileCount can get the summary. Reviewers: davidxl, danielcdh Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D32877 llvm-svn: 302844	2017-05-11 23:18:05 +00:00
Teresa Johnson	720d9b4111	Fix code section prefix for proper layout Summary: r284533 added hot and cold section prefixes based on profile information, to enable grouping of hot/cold functions at link time. However, it used "cold" as the prefix for cold sections, but gold only recognizes "unlikely" (which is used by gcc for cold sections). Therefore, cold sections were not properly being grouped. Switch to using "unlikely" Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32983 llvm-svn: 302502	2017-05-09 01:43:24 +00:00
Brendon Cahoon	7769a0854e	[CodeGenPrepare] Fix crash due to an invalid CFG The splitIndirectCriticalEdges function generates and invalid CFG when the 'Target' basic block is a loop to itself. When this occurs, the code that updates the predecessor terminator needs to update the terminator in the split basic block. This occurs when there is an edge from block D back to D. Since D is split in to D0 and D1, the code needs to update the terminator in D1. But D1 is not in the OtherPreds vector, so it was not getting updated. Differential Revision: https://reviews.llvm.org/D32126 llvm-svn: 300480	2017-04-17 19:11:04 +00:00
Matt Arsenault	f10061ec70	Add address space mangling to lifetime intrinsics In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876	2017-04-10 20:18:21 +00:00
Eli Friedman	5fba1e53f2	Turn on -addr-sink-using-gep by default. The new codepath has been in the tree for years, and there isn't any reason to use two codepaths here. Differential Revision: https://reviews.llvm.org/D30596 llvm-svn: 299723	2017-04-06 22:42:18 +00:00
Nikolai Bozhenov	fca527af5c	[BypassSlowDivision] Do not bypass division of hash-like values Disable bypassing if one of the operands looks like a hash value. Slow division often occurs in hashtable implementations and fast division is never taken there because a hash value is extremely unlikely to have enough upper bits set to zero. A value is considered to be hash-like if it is produced by 1) XOR operation 2) Multiplication by a constant wider than the shorter type 3) PHI node with all incoming values being hash-like Differential Revision: https://reviews.llvm.org/D28200 llvm-svn: 299329	2017-04-02 13:14:30 +00:00
Dehao Chen	775341a14c	Use isFunctionHotInCallGraph to set the function section prefix. Summary: The current prefix based function layout algorithm only looks at function's entry count, which is not sufficient. A function should be grouped together if its entry count or any call edge count is hot. Reviewers: davidxl, eraman Reviewed By: eraman Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31225 llvm-svn: 298656	2017-03-23 23:14:11 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
George Burgess IV	56c7e88c2c	Let llvm.objectsize be conservative with null pointers This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 llvm-svn: 298430	2017-03-21 20:08:59 +00:00
Sanjay Patel	caf369bd03	[SimplifyCFG] move tests for PR31028 from CGP Hopefully, this will make sense with a forthcoming patch. If not, we can move these back. llvm-svn: 297660	2017-03-13 19:59:14 +00:00
Sanjay Patel	6023a2501c	[CGP] add tests for PR31028; NFC llvm-svn: 297629	2017-03-13 15:45:37 +00:00
Nikolai Bozhenov	4a04fb9e90	[BypassSlowDivision] Use ValueTracking to simplify run-time checks ValueTracking is used for more thorough analysis of operands. Based on the analysis, either run-time checks can be simplified (e.g. check only one operand instead of two) or the transformation can be avoided. For example, it is quite often the case that a divisor is promoted from a shorter type and run-time checks for it are redundant. With additional compile-time analysis of values, two special cases naturally arise and are addressed by the patch: 1) Both operands are known to be short enough. Then, the long division can be simply replaced with a short one without CFG modification. 2) If a division is unsigned and the dividend is known to be short then the long division is not needed at all. Because if the divisor is too big for short division then the quotient is obviously zero (and the remainder is equal to the dividend). Actually, the division is not needed when (divisor > dividend). Differential Revision: https://reviews.llvm.org/D29897 llvm-svn: 296832	2017-03-02 22:12:15 +00:00
Michael Kuperstein	13bf8a2684	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296416	2017-02-28 00:11:34 +00:00
Daniel Jasper	3ca4525612	Revert "[CGP] Split some critical edges coming out of indirect branches" This reverts commit r296149 as it leads to crashes when compiling for PPC. llvm-svn: 296295	2017-02-26 11:09:12 +00:00
Eli Friedman	c12a5a7595	[CodeGenPrepare] Make -addr-sink-using-gep work with address spaces. When we construct addressing modes, we use isNoopAddrSpaceCast to ignore addrspacecast instructions. Make sure we insert the correct addrspacecast when we reconstruct the addressing mode. Differential Revision: https://reviews.llvm.org/D30114 llvm-svn: 296167	2017-02-24 20:51:36 +00:00
Michael Kuperstein	46b131e3f8	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296149	2017-02-24 18:41:32 +00:00
Michael Kuperstein	581c9f4b20	Revert r269060 to pacify bots. llvm-svn: 296064	2017-02-24 01:22:19 +00:00
Michael Kuperstein	12e79d5002	[CGP] Split some critical edges coming out of indirect branches Splitting critical edges when one of the source edges is an indirectbr is hard in general (because it requires changing the memory the indirectbr reads). But if a block only has a single indirectbr predecessor (which is the common case), we can simulate splitting that edge by splitting the destination block, and retargeting the direct branches. This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame() ends up using an indirect branch with ~100 successors, and passing a constant to each of those. Since MachineSink can't break indirect critical edges on demand (and doing this in MIR doesn't look feasible), this causes us to emit about ~100 defs of registers containing constants, which we in the predecessor block, where only one of those constants is used in each successor. So, at each computed goto, we needlessly spill about a 100 constants to stack. The end result is that a clang-compiled python interpreter can be about ~2.5x slower on a simple python reduction loop than a gcc-compiled interpreter. Differential Revision: https://reviews.llvm.org/D29916 llvm-svn: 296060	2017-02-24 00:56:21 +00:00
George Burgess IV	3f08914e7e	[Analysis] Centralize objectsize lowering logic. We're currently doing nearly the same thing for @llvm.objectsize in three different places: two of them are missing checks for overflow, and one of them could subtly break if InstCombine gets much smarter about removing alloc sites. Seems like a good idea to not do that. llvm-svn: 290214	2016-12-20 23:46:36 +00:00
Jun Bum Lim	90b6b5074a	[CodeGenPrep] Skip merging empty case blocks This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly : Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289988	2016-12-16 20:38:39 +00:00
Sanjoy Das	089c699743	Fix CodeGenPrepare::stripInvariantGroupMetadata `dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs. The only reason it worked at all is that `getMetadataID` returns something unrelated -- it returns the subclass ID of the receiver (which is used in `dyn_cast` etc.). That does not numerically match `LLVMContext::MD_invariant_group` and ends up dropping `invariant_group` along with every other metadata that does not numerically match `LLVMContext::MD_invariant_group`. llvm-svn: 289973	2016-12-16 18:52:33 +00:00
Jun Bum Lim	f9416af191	Revert "[CodeGenPrep] Skip merging empty case blocks" This reverts commit r289951. llvm-svn: 289960	2016-12-16 17:06:14 +00:00
Jun Bum Lim	85347dde27	[CodeGenPrep] Skip merging empty case blocks This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block: Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289951	2016-12-16 16:03:31 +00:00
Matt Arsenault	d4da0edd98	AMDGPU: Implement isCheapAddrSpaceCast llvm-svn: 288523	2016-12-02 18:12:53 +00:00
Joerg Sonnenberger	caaa82d90d	Revert r287553: [CodeGenPrep] Skip merging empty case blocks It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line 670 ("Expected irreducible CFG"). llvm-svn: 288052	2016-11-28 18:56:54 +00:00
Justin Lebar	3e50a5be8f	[CodeGenPrepare] Don't sink non-cheap addrspacecasts. Summary: Previously, CGP would unconditionally sink addrspacecast instructions, even going so far as to sink them into a loop. Now we check that the cast is "cheap", as defined by TLI. We introduce a new "is-cheap" function to TLI rather than using isNopAddrSpaceCast because some GPU platforms want the ability to ask for non-nop casts to be sunk. Reviewers: arsenm, tra Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26923 llvm-svn: 287591	2016-11-21 22:49:15 +00:00
Jun Bum Lim	82f55c5446	[CodeGenPrep] Skip merging empty case blocks Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 287553	2016-11-21 16:47:28 +00:00
Justin Lebar	2860573529	[BypassSlowDivision] Handle division by constant numerators better. Summary: We don't do BypassSlowDivision when the denominator is a constant, but we do do it when the numerator is a constant. This patch makes two related changes to BypassSlowDivision when the numerator is a constant: * If the numerator is too large to fit into the bypass width, don't bypass slow division (because we'll never run the smaller-width code). * If we bypass slow division where the numerator is a constant, don't OR together the numerator and denominator when determining whether both operands fit within the bypass width. We need to check only the denominator. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26699 llvm-svn: 287062	2016-11-16 00:44:47 +00:00
Justin Lebar	1535a5e9df	Add missing lit.local.cfg to llvm/test/Transforms/CodeGenPrepare/NVPTX. llvm-svn: 285464	2016-10-28 21:56:07 +00:00
Justin Lebar	0ede5fb1bb	Don't leave unused divs/rems sitting around in BypassSlowDivision. Summary: This "pass" eagerly creates div and rem instructions even when only one is needed -- it relies on a later pass (machine DCE?) to clean them up. This is problematic not just from a cleanliness perspective (this pass is running during CodeGenPrepare, so should leave the IR in a better state), but it also creates a problem for instruction selection. If we always have a div+rem, isel will always select a divrem instruction (if possible), even when a single div or rem would do. Specifically, in NVPTX, we want to compute rem from the output of div, if available. But if a div is not available, we want to leave the rem alone. This transformation is overeager if div is always available. Because this code runs as part of CodeGenPrepare, it's nontrivial to write a test for this change. But this will effectively be tested by a later patch which adds the aforementioned change to NVPTX isel. Reviewers: tra Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26088 llvm-svn: 285460	2016-10-28 21:43:54 +00:00
Justin Lebar	468bf73209	Don't claim the udiv created in BypassSlowDivision is exact. Summary: In BypassSlowDivision's short-dividend path, we would create e.g. udiv exact i32 %a, %b "exact" here means that we are asserting that %a is a multiple of %b. But we have no reason to believe this must be true -- this is just a bug, as far as I can tell. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D26097 llvm-svn: 285459	2016-10-28 21:43:51 +00:00
Dehao Chen	4b41571d24	Update the section.ll to fix non-x86 failure. llvm-svn: 284566	2016-10-19 03:53:41 +00:00
Dehao Chen	95fc43143d	Revert r284545 again as the regression in ppc still exists. There is bug in MBPI exposed by th patch. Also update the section.ll to fix non-x86 failure. llvm-svn: 284563	2016-10-19 01:18:25 +00:00
Dehao Chen	83033e0b9a	Add target for test to fix regression introduced by r284533. llvm-svn: 284538	2016-10-18 21:13:31 +00:00
Dehao Chen	302b69c940	Use profile info to set function section prefix to group hot/cold functions. Summary: The original implementation is in r261607, which was reverted in r269726 to accomendate the ProfileSummaryInfo analysis pass. The new implementation: 1. add a new metadata for function section prefix 2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function 3. output the section prefix set by CGP Reviewers: davidxl, eraman Subscribers: vsk, llvm-commits Differential Revision: https://reviews.llvm.org/D24989 llvm-svn: 284533	2016-10-18 20:42:47 +00:00
David Majnemer	0c80e2eac6	[CodeGenPrepare] Don't sink a cast past its user The sink cast machinery is supposed to sink casts as close to their user as possible. However, an EH pad is the first instruction in it's basic block. Don't sink if the user is an EH pad. This fixes PR27536. llvm-svn: 267767	2016-04-27 19:36:38 +00:00
Sanjay Patel	a31b0c0ece	[CodeGenPrepare] don't convert an unpredictable select into control flow Suggested in the review of D19488: http://reviews.llvm.org/D19488 llvm-svn: 267504	2016-04-26 00:47:39 +00:00
Adrian Prantl	75819aedf6	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram. Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446	2016-04-15 15:57:41 +00:00
Petar Jovanovic	644b8c1a5d	Calculate __builtin_object_size when pointer depends on a condition This patch fixes calculating of builtin_object_size if it depends on a condition. Before this patch compiler did not know how to calculate the object size when it finds a condition that cannot be eliminated. This patch enables calculating of builtin_object_size even in case when condition cannot be eliminated by choosing minimum or maximum value as a result from condition. Choosing minimum or maximum value from condition is based on the second argument of __builtin_object_size function. Patch by Strahinja Petrovic. Differential Revision: http://reviews.llvm.org/D18438 llvm-svn: 266193	2016-04-13 12:25:25 +00:00
Peter Zotov	0b6d7bc682	[CodeGenPrepare] Avoid sinking soft-FP comparisons Sinking comparisons in CGP can undo the job of hoisting them done earlier by LICM, and soft-FP makes this an expensive mistake. A common pattern that produces floating point comparisons uniform over a loop is an explicit check for division by zero. If the divisor is hoisted out of the loop, the comparison can also be, but hoisting the function that unwinds is never legal, since it may cause side effects in the loop body prior to the unwinding to not be executed. Differential Revision: http://reviews.llvm.org/D18744 llvm-svn: 265264	2016-04-03 16:36:17 +00:00
Adrian Prantl	b8089516a5	testcase gardening: update the emissionKind enum to the new syntax. (NFC) llvm-svn: 265081	2016-04-01 00:16:49 +00:00
George Burgess IV	d4febd1612	Keep CodeGenPrepare from preserving the domtree. CGP modifies the domtree in some cases, so saying that it preserves the domtree is a lie. We'll be able to selectively preserve it with the new pass manager. Differential Revision: http://reviews.llvm.org/D16893 llvm-svn: 264099	2016-03-22 21:25:08 +00:00
Philip Reames	ac115ed72f	[CGP] Duplicate addressing computation in cold paths if required to sink addressing mode This patch teaches CGP to duplicate addressing mode computations into cold paths (detected via explicit cold attribute on calls) if required to let addressing mode be safely sunk into the basic block containing each load and store. In general, duplicating code into cold blocks may result in code growth, but should not effect performance. In this case, it's better to duplicate some code than to put extra pressure on the register allocator by making it keep the address through the entirely of the fast path. This patch only handles addressing computations, but in principal, we could implement a more general cold cold scheduling heuristic which tries to reduce register pressure in the fast path by duplicating code into the cold path. Getting the profitability of the general case right seemed likely to be challenging, so I stuck to the existing case (addressing computation) we already had. Differential Revision: http://reviews.llvm.org/D17652 llvm-svn: 263074	2016-03-09 23:13:12 +00:00
Junmo Park	161dc1c605	[CodeGenPrepare] Remove load-based heuristic Summary: Both the hardware and LLVM have changed since 2012. Now, load-based heuristic don't show big differences any more on OoO cores. There is no notable regressons and improvements on spec2000/2006. (Cortex-A57, Core i5). Reviewers: spatel, zansari Differential Revision: http://reviews.llvm.org/D16836 llvm-svn: 261809	2016-02-25 00:23:27 +00:00
Matt Arsenault	9c47dd583a	AMDGPU: Remove some old intrinsic uses from tests llvm-svn: 260493	2016-02-11 06:02:01 +00:00
James Molloy	f01488e2bc	[InstCombine] Rewrite bswap/bitreverse handling completely. There are several requirements that ended up with this design; 1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early. 2. Bitreversals and byteswaps are very related in their matching logic. 3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses. 4. Bswaps are best matched early in InstCombine. The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals. We can then extend the matching logic in one place only. llvm-svn: 257875	2016-01-15 09:20:19 +00:00
Keno Fischer	81e2e9ef86	Reapply r257105 "[Verifier] Check that debug values have proper size" I originally reapplied this in 257550, but had to revert again due to bot breakage. The only change in this version is to allow either the TypeSize or the TypeAllocSize of the variable to be the one represented in debug info (hopefully in the future we can figure out how to encode the difference). Additionally, several bot failures following r257550, were due to optimizer bugs now fixed in r257787 and r257795. r257550 commit message was: ``` The follow extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap should no longer happen after r257534. The original commit message was: `` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declared size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref `` ``` llvm-svn: 257850	2016-01-15 00:46:17 +00:00
Keno Fischer	78e5c9e6e2	Re-Revert r257105 (Verifier debug info changes) While I investigate some new buildbot failures. This was originally reapplied as r257550 and r257558. llvm-svn: 257563	2016-01-13 02:31:14 +00:00
Keno Fischer	25916079ff	Reapply r257105 "[Verifier] Check that debug values have proper size" The follow extra changes were made to test cases: Manually making the variable be the actual type instead of a pointer to avoid pointer-size differences in generic code: LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll LLVM :: DebugInfo/Generic/varargs.ll Delete sizing information from debug info for the same reason (but the presence of the pointer was important to the test case): LLVM :: DebugInfo/Generic/restrict.ll LLVM :: DebugInfo/Generic/tu-composite.ll LLVM :: Linker/type-unique-type-array-a.ll LLVM :: Linker/type-unique-simple2.ll Fixing an incorrect DW_OP_deref LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll Fixing a missing DW_OP_deref LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll Additionally, clang should no longer complain during bootstrap should no longer happen after r257534. The original commit message was: ``` Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declared size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref ``` llvm-svn: 257550	2016-01-13 00:31:44 +00:00
Keno Fischer	ea33a25816	Temporarily revert r257105 "[Verifier] Check that debug values have proper size" Looks like there's a case where clang generates debug info that triggers the new verifier check. Reverting while investigating. llvm-svn: 257107	2016-01-07 22:39:11 +00:00
Keno Fischer	b3326be6ad	[Verifier] Check that debug values have proper size Summary: Teach the Verifier to make sure that the storage size given to llvm.dbg.declare or the value size given to llvm.dbg.value agree with what is declared in DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA). Additionally this catches a number of common mistakes, such as passing a pointer when a value was intended or vice versa. One complication comes from stack coloring which modifies the original IR when it merges allocas in order to make sure that if AA falls back to the IR it gets the correct result. However, given this new invariant, indiscriminately replacing one alloca by a different (differently sized one) is no longer valid. Fix this by just undefing out any use of the alloca in a dbg.declare in this case. Additionally, I had to fix a number of test cases. Of particular note: - I regenerated dbg-changes-codegen-branch-folding.ll from the given source as it was affected by the bug fixed in r256077 - two-cus-from-same-file.ll was changed to avoid having a variable-typed debug variable as that would depend on the target, even though this test is supposed to be generic - I had to manually declared size/align for reference type. See also the discussion for D14275/r253186. - fpstack-debuginstr-kill.ll required changing `double` to `long double` - most others were just a question of adding OP_deref Reviewers: aprantl Differential Revision: http://reviews.llvm.org/D14276 llvm-svn: 257105	2016-01-07 22:18:37 +00:00
Chen Li	d71999ef1b	[gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob Subscribers: reames, mjacob, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15662 llvm-svn: 256443	2015-12-26 07:54:32 +00:00
Manuel Jacob	f4b3577d66	Remove double blanks. NFC. llvm-svn: 256100	2015-12-19 18:26:53 +00:00
David Majnemer	550654aaf1	Move catchpad-phi-cast.ll to the X86 specific subdirectory It is X86 specific and will not be properly exercised unless LLVM is built with the X86 target. llvm-svn: 255426	2015-12-12 06:21:08 +00:00
David Majnemer	8a1c45d6e8	[IR] Reformulate LLVM's EH funclet IR While we have successfully implemented a funclet-oriented EH scheme on top of LLVM IR, our scheme has some notable deficiencies: - catchendpad and cleanupendpad are necessary in the current design but they are difficult to explain to others, even to seasoned LLVM experts. - catchendpad and cleanupendpad are optimization barriers. They cannot be split and force all potentially throwing call-sites to be invokes. This has a noticable effect on the quality of our code generation. - catchpad, while similar in some aspects to invoke, is fairly awkward. It is unsplittable, starts a funclet, and has control flow to other funclets. - The nesting relationship between funclets is currently a property of control flow edges. Because of this, we are forced to carefully analyze the flow graph to see if there might potentially exist illegal nesting among funclets. While we have logic to clone funclets when they are illegally nested, it would be nicer if we had a representation which forbade them upfront. Let's clean this up a bit by doing the following: - Instead, make catchpad more like cleanuppad and landingpad: no control flow, just a bunch of simple operands; catchpad would be splittable. - Introduce catchswitch, a control flow instruction designed to model the constraints of funclet oriented EH. - Make funclet scoping explicit by having funclet instructions consume the token produced by the funclet which contains them. - Remove catchendpad and cleanupendpad. Their presence can be inferred implicitly using coloring information. N.B. The state numbering code for the CLR has been updated but the veracity of it's output cannot be spoken for. An expert should take a look to make sure the results are reasonable. Reviewers: rnk, JosephTremoulet, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D15139 llvm-svn: 255422	2015-12-12 05:38:55 +00:00
Reid Kleckner	8de1fe23ed	[CGP] Reimplement r255055 a different way llvm-svn: 255070	2015-12-08 23:00:03 +00:00
Reid Kleckner	e18f92bfe9	Revert "[CGP] Check that we have an insert point before moving llvm.dbg.value around" This reverts commit r255055. Breakage has been reported. llvm-svn: 255063	2015-12-08 22:33:23 +00:00
Reid Kleckner	7c005324d5	[CGP] Check that we have an insert point before moving llvm.dbg.value around llvm-svn: 255055	2015-12-08 21:50:52 +00:00
Andrew Kaylor	d0430e8580	[WinEH] Fix problem where CodeGenPrepare incorrectly sinks a bitcast into an EH pad. Differential Revision: http://reviews.llvm.org/D14842 llvm-svn: 253902	2015-11-23 19:16:15 +00:00
NAKAMURA Takumi	0b498c7af8	Move free-zext.ll to llvm/test/Transforms/CodeGenPrepare/AArch64/ llvm-svn: 253730	2015-11-20 22:55:34 +00:00
Geoff Berry	5256fcada0	[CodeGenPrepare] Create more extloads and fewer ands Summary: Add and instructions immediately after loads that only have their low bits used, assuming that the (and (load x) c) will be matched as a extload and the ands/truncs fed by the extload will be removed by isel. Reviewers: mcrosier, qcolombet, ab Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14584 llvm-svn: 253722	2015-11-20 22:34:39 +00:00
Sanjay Patel	ae3680cbcd	this new test file was accidentally left out of r253573 llvm-svn: 253574	2015-11-19 16:39:00 +00:00
Pete Cooper	67cf9a723b	Revert "Change memcpy/memset/memmove to have dest and source alignments." This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543	2015-11-19 05:56:52 +00:00
Pete Cooper	72bc23ef02	Change memcpy/memset/memmove to have dest and source alignments. Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.llvm\.memset.)i32\ [0-9]\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, / isVolatile / false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, / isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511	2015-11-18 22:17:24 +00:00
Igor Laevsky	f637b4a52e	[CodegenPrepare] Do not rematerialize gc.relocates across different basic blocks Differential Revision: http://reviews.llvm.org/D14258 llvm-svn: 251957	2015-11-03 18:37:40 +00:00
Sanjay Patel	0ed9aeaa5f	[CGP] widen switch condition and case constants to target's register width (2nd try) This is a redo of r251849 except the tests have been split into arch-specific folders to hopefully make the bots happy. This is a follow-up from the discussion in D12965. The block-at-a-time limitation of SelectionDAG also came up in D13297. Without the InstCombine change from D12965, I don't expect this patch to make any difference in the real world because InstCombine does not shrink cases like this in visitSwitchInst(). But we need to have this CGP safety harness in place before proceeding with any shrinkage in D12965, so we won't generate extra extends for compares. I've opted for IR regression tests in the patch because that seems like a clearer way to test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86 will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473 Before: BB#0: mr 4, 3 extsh. 3, 4 ble 0, .LBB0_5 BB#1: cmpwi 3, 99 bgt 0, .LBB0_9 BB#2: rlwinm 4, 4, 0, 16, 31 <--- 32-bit mask/extend li 3, 0 cmplwi 4, 1 beqlr 0 BB#3: cmplwi 4, 10 bne 0, .LBB0_12 BB#4: li 3, 1 blr .LBB0_5: rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend cmplwi 3, 65436 beq 0, .LBB0_13 BB#6: cmplwi 3, 65526 beq 0, .LBB0_15 BB#7: cmplwi 3, 65535 bne 0, .LBB0_12 BB#8: li 3, 4 blr .LBB0_9: rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend cmplwi 3, 100 beq 0, .LBB0_14 ... After: BB#0: rlwinm 4, 3, 0, 16, 31 <--- mask/extend to 32-bit and then use that for comparisons cmpwi 4, 999 ble 0, .LBB0_5 BB#1: lis 3, 0 ori 3, 3, 65525 cmpw 4, 3 bgt 0, .LBB0_9 BB#2: cmplwi 4, 1000 beq 0, .LBB0_14 BB#3: cmplwi 4, 65436 bne 0, .LBB0_13 BB#4: li 3, 6 blr .LBB0_5: li 3, 0 cmplwi 4, 1 beqlr 0 BB#6: cmplwi 4, 10 beq 0, .LBB0_12 BB#7: cmplwi 4, 100 bne 0, .LBB0_13 BB#8: li 3, 2 blr .LBB0_9: cmplwi 4, 65526 beq 0, .LBB0_15 BB#10: cmplwi 4, 65535 bne 0, .LBB0_13 ... Differential Revision: http://reviews.llvm.org/D13532 llvm-svn: 251857	2015-11-02 23:22:49 +00:00
Sanjay Patel	dfc825eb36	revert r251849; need to move tests to arch-specific folders llvm-svn: 251851	2015-11-02 23:05:20 +00:00
Sanjay Patel	b90a078de9	[CGP] widen switch condition and case constants to target's register width This is a follow-up from the discussion in D12965. The block-at-a-time limitation of SelectionDAG also came up in D13297. Without the InstCombine change from D12965, I don't expect this patch to make any difference in the real world because InstCombine does not shrink cases like this in visitSwitchInst(). But we need to have this CGP safety harness in place before proceeding with any shrinkage in D12965, so we won't generate extra extends for compares. I've opted for IR regression tests in the patch because that seems like a clearer way to test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86 will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473 Before: BB#0: mr 4, 3 extsh. 3, 4 ble 0, .LBB0_5 BB#1: cmpwi 3, 99 bgt 0, .LBB0_9 BB#2: rlwinm 4, 4, 0, 16, 31 <--- 32-bit mask/extend li 3, 0 cmplwi 4, 1 beqlr 0 BB#3: cmplwi 4, 10 bne 0, .LBB0_12 BB#4: li 3, 1 blr .LBB0_5: rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend cmplwi 3, 65436 beq 0, .LBB0_13 BB#6: cmplwi 3, 65526 beq 0, .LBB0_15 BB#7: cmplwi 3, 65535 bne 0, .LBB0_12 BB#8: li 3, 4 blr .LBB0_9: rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend cmplwi 3, 100 beq 0, .LBB0_14 ... After: BB#0: rlwinm 4, 3, 0, 16, 31 <--- mask/extend to 32-bit and then use that for comparisons cmpwi 4, 999 ble 0, .LBB0_5 BB#1: lis 3, 0 ori 3, 3, 65525 cmpw 4, 3 bgt 0, .LBB0_9 BB#2: cmplwi 4, 1000 beq 0, .LBB0_14 BB#3: cmplwi 4, 65436 bne 0, .LBB0_13 BB#4: li 3, 6 blr .LBB0_5: li 3, 0 cmplwi 4, 1 beqlr 0 BB#6: cmplwi 4, 10 beq 0, .LBB0_12 BB#7: cmplwi 4, 100 bne 0, .LBB0_13 BB#8: li 3, 2 blr .LBB0_9: cmplwi 4, 65526 beq 0, .LBB0_15 BB#10: cmplwi 4, 65535 bne 0, .LBB0_13 ... Differential Revision: http://reviews.llvm.org/D13532 llvm-svn: 251849	2015-11-02 22:46:24 +00:00
Sanjay Patel	69a50a1e17	[CGP] transform select instructions into branches and sink expensive operands This was originally checked in at r250527, but reverted at r250570 because of PR25222. There were at least 2 problems: 1. The cost check was checking for an instruction with an exact cost of TCC_Expensive; that should have been >=. 2. The cause of the clang stage 1 failures was illegally sinking 'call' instructions; we can't sink instructions that may have side effects / are not safe to execute speculatively. Fixed those conditions in sinkSelectOperand() and added test cases. Original commit message: This is a follow-up to the discussion in D12882. Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands are expensive (as defined by the TTI cost model) because that may expose further optimizations. However, we would then like a later pass like CodeGenPrepare to undo that transformation if the target would likely benefit from not speculatively executing an expensive op (this patch). Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its select-formation behavior that changed with r248439. Differential Revision: http://reviews.llvm.org/D13297 llvm-svn: 250743	2015-10-19 21:59:12 +00:00
Benjamin Kramer	b43d33bf0f	Revert "This is a follow-up to the discussion in D12882." Breaks clang selfhost, see PR25222. This reverts commits r250527 and r250528. llvm-svn: 250570	2015-10-16 23:00:29 +00:00
Sanjay Patel	bb5d550c3d	move test case to x86 directory because it specifies an x86 target llvm-svn: 250528	2015-10-16 17:18:07 +00:00
Sanjay Patel	374dd8d88e	This is a follow-up to the discussion in D12882. Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands are expensive (as defined by the TTI cost model) because that may expose further optimizations. However, we would then like a later pass like CodeGenPrepare to undo that transformation if the target would likely benefit from not speculatively executing an expensive op (this patch). Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its select-formation behavior that changed with r248439. Differential Revision: http://reviews.llvm.org/D13297 llvm-svn: 250527	2015-10-16 16:54:30 +00:00

1 2 3 4

195 Commits