llvm-project

Commit Graph

Author	SHA1	Message	Date
Eric Christopher	cee313d288	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552	2019-04-17 04:52:47 +00:00
Eric Christopher	a863435128	Temporarily Revert "Add basic loop fusion pass." As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546	2019-04-17 02:12:23 +00:00
Johannes Doerfert	00102c7d95	[ValueTracking] Look through casts when determining non-nullness Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the value of their operand and can therefore be looked through when we determine non-nullness. Differential Revision: https://reviews.llvm.org/D54956 llvm-svn: 352293	2019-01-26 23:40:35 +00:00
Reid Kleckner	84a2d29681	[BasicAA] Fix AA bug on dynamic allocas and stackrestore Summary: BasicAA has special logic for unescaped allocas, which normally applies equally well to dynamic and static allocas. However, llvm.stackrestore has the power to end the lifetime of dynamic allocas, without referring to them directly. stackrestore is already marked with the most conservative memory modification attributes, but because the alloca is not escaped, the normal logic produces incorrect results. I think BasicAA needs a special case here to teach it about the relationship between dynamic allocas and stackrestore. Fixes PR40118 Reviewers: gbiv, efriedma, george.burgess.iv Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55969 llvm-svn: 349945	2018-12-21 19:59:03 +00:00
Nikita Popov	dc73a6edde	Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail" Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. The previous version of this patch did not perform dependency checking properly: While the dependency is checked at the position of the memset, the used size must be that of the memcpy. Previously the size of the memset was used, which missed modification in the region MemSetSize..CopySize, resulting in miscompiles. The added tests cover variations of this issue. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 349078	2018-12-13 20:04:27 +00:00
David L. Jones	54c01ad6a9	Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail" This revision caused trucated memsets for structs with padding. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html llvm-svn: 349002	2018-12-13 03:15:11 +00:00
Nikita Popov	94b8e2ea4e	[MemCpyOpt] memset->memcpy forwarding with undef tail Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. For the implementation, I'm reusing a bit of code for a similar existing optimization (direct memcpy of undef). I've also added memset support to MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be used, but it seems to make more sense to add this to GetLocation and thus make the computation cachable. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 348645	2018-12-07 21:16:58 +00:00
Nikita Popov	4ca00df571	[MemCpyOpt] Add tests for memset->memcpy forwaring with undef tail; NFC These are baseline tests for D55120. llvm-svn: 348644	2018-12-07 21:16:52 +00:00
JF Bastien	73d8e4e531	Merge clang's isRepeatedBytePattern with LLVM's isBytewiseValue Summary: his code was in CGDecl.cpp and really belongs in LLVM's isBytewiseValue. Teach isBytewiseValue the tricks clang's isRepeatedBytePattern had, including merging undef properly, and recursing on more types. clang part of this patch: D51752 Subscribers: dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51751 llvm-svn: 342709	2018-09-21 05:17:42 +00:00
Bjorn Pettersson	8e484dc531	[MemCpyOpt] Skip optimizing basic blocks not reachable from entry Summary: Skip basic blocks not reachable from the entry node in MemCpyOptPass::iterateOnFunction. Code that is unreachable may have properties that do not exist for reachable code (an instruction in a basic block can for example be dominated by a later instruction in the same basic block, for example if there is a single block loop). MemCpyOptPass::processStore is only safe to use for reachable basic blocks, since it may iterate past the basic block beginning when used for unreachable blocks. By simply skipping to optimize unreachable basic blocks we can avoid asserts such as "Assertion `!NodePtr->isKnownSentinel()' failed." in MemCpyOptPass::processStore. The problem was detected by fuzz tests. Reviewers: eli.friedman, dneilson, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D45889 llvm-svn: 330635	2018-04-23 19:55:04 +00:00
Daniel Neilson	6f1eb58e92	[MemCpyOpt] Update to new API for memory intrinsic alignment Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the MemCpyOpt pass to cease using: 1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. 2) The old IRBuilder CreateMemCpy/CreateMemMove single-alignment APIs in favour of the new API that allows setting source and destination alignments independently. We also add a few tests to fill gaps in the testing of this pass. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278, rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774, rL324781, rL324784, rL324955, rL324960, rL325816, rL327398, rL327421 ) Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 328097	2018-03-21 14:14:55 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])$(.), i32, i1$~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8$i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16$i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32$i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64$i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128$i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8$i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16$i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32$i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64$i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128$i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])$~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Reid Kleckner	6d31001cd6	Revert "[memcpyopt] Teach memcpyopt to optimize across basic blocks" This reverts r321138. It seems there are still underlying issues with memdep. PR35519 seems to still be present if debug info is enabled. We end up losing a memcpy. Somehow during store to memset merging, we insert the memset after the memcpy or fail to update the memdep analysis to account for the newly inserted memset of a pair. Reduced test case: #include <assert.h> #include <stdio.h> #include <string> #include <utility> #include <vector> void do_push_back( std::vector<std::pair<std::string, std::vector<std::string>>>* crls) { crls->push_back(std::make_pair(std::string(), std::vector<std::string>())); } int __attribute__((optnone)) main() { // Put some data in the vector and then remove it so we take the push_back // fast path. std::vector<std::pair<std::string, std::vector<std::string>>> crl_set; crl_set.push_back({"asdf", {}}); crl_set.pop_back(); printf("first word in vector storage: %p\n", (void)crl_set.data()); // Do the push_back which may fail to initialize the data. do_push_back(&crl_set); auto first = &crl_set.back().first; printf("first word in vector storage (should be zero): %p\n", (void*)crl_set.data()); assert(first->empty()); puts("ok"); } Compile with libc++, enable optimizations, and enable debug info: $ clang++ -stdlib=libc++ -g -O2 t.cpp -o t.exe -Wl,-rpath=llvm/build/lib This program will assert with this change. llvm-svn: 321510	2017-12-28 05:10:33 +00:00
Dan Gohman	aa3922819e	[memcpyopt] Teach memcpyopt to optimize across basic blocks This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. This is r319482 and r319483, along with fixes for PR35519: fix the optimization that merges stores into memsets to preserve cached memdep info, and fix memdep's non-local caching strategy to not assume that larger queries are always more conservative than smaller ones. Fixes PR28958 and PR35519. Differential Revision: https://reviews.llvm.org/D40802 llvm-svn: 321138	2017-12-20 01:36:25 +00:00
Hans Wennborg	146a9c3e51	Revert r319482 and r319483 "[memcpyopt] Teach memcpyopt to optimize across basic blocks" This caused PR35519. > [memcpyopt] Teach memcpyopt to optimize across basic blocks > > This teaches memcpyopt to make a non-local memdep query when a local query > indicates that the dependency is non-local. This notably allows it to > eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. > > Fixes PR28958. > > Differential Revision: https://reviews.llvm.org/D38374 > > [memcpyopt] Commit file missed in r319482. > > This change was meant to be included with r319482 but was accidentally > omitted. llvm-svn: 319873	2017-12-06 01:47:55 +00:00
Dan Gohman	59e4c0b938	[memcpyopt] Teach memcpyopt to optimize across basic blocks This teaches memcpyopt to make a non-local memdep query when a local query indicates that the dependency is non-local. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often by 20-30%. Fixes PR28958. Differential Revision: https://reviews.llvm.org/D38374 llvm-svn: 319482	2017-11-30 22:10:53 +00:00
Matt Arsenault	90e4f719e1	Fix some misc. -enable-var-scope violations llvm-svn: 318006	2017-11-13 01:47:52 +00:00
Matt Arsenault	f10061ec70	Add address space mangling to lifetime intrinsics In preparation for allowing allocas to have non-0 addrspace. llvm-svn: 299876	2017-04-10 20:18:21 +00:00
Matt Arsenault	daa08875b3	[MemCpyOpt] Only replace memcpy with bitcast if address spaces match Patch by James Price llvm-svn: 299866	2017-04-10 19:00:25 +00:00
Fiona Glaser	a9bd572b6f	MemCpyOptimizer: don't create new addrspace casts This isn't safe on all targets, and since we don't have a way to know it's safe, avoid doing it for now. llvm-svn: 297788	2017-03-14 22:37:38 +00:00
Bryant Wong	7cb744621b	[MemCpyOpt] Don't sink LoadInst below possible clobber. Differential Revision: https://reviews.llvm.org/D26811 llvm-svn: 290611	2016-12-27 17:58:12 +00:00
Benjamin Kramer	1697d39eef	[MemCpyOpt] Don't emit IR in an unspecified order Argument evaluation order is one of the edge cases where Clang differs from GCC, yielding different IR depending on which compiler LLVM was built with. Make the order deterministic and tune the test to actually verify the order instead of trying to hide it. llvm-svn: 286126	2016-11-07 17:47:28 +00:00
Tim Shen	3ad8b43cc2	[MemCpy] Add comments for r279769 Differential Revision: https://reviews.llvm.org/D23846 llvm-svn: 279778	2016-08-25 21:03:46 +00:00
Tim Shen	a3dbead2d6	[MemCpy] Check for alias in performMemCpyToMemSetOptzn, instead of the identity of two operands Summary: This fixes pr29105. The reason is that lifetime marks creates new aliasing pointers the original ones, but before this patch aliases were not checked in performMemCpyToMemSetOptzn. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23846 llvm-svn: 279769	2016-08-25 19:27:26 +00:00
Anna Thomas	037e540f08	[AliasAnalysis] Treat invariant.start as read-memory Summary: We teach alias analysis that invariant.start is readonly. This helps with GVN and memcopy optimizations that currently treat. invariant.start as a clobber. We need to treat this as readonly, so that DSE does not incorrectly remove stores prior to the invariant.start Reviewers: sanjoy, reames, majnemer, dberlin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23214 llvm-svn: 278138	2016-08-09 17:18:05 +00:00
Sean Silva	6347df0f81	[PM] Port MemCpyOpt to the new PM. The need for all these Lookup* functions is just because of calls to getAnalysis inside methods (i.e. not at the top level) of the runOnFunction method. They should be straightforward to clean up when the old PM is gone. llvm-svn: 272615	2016-06-14 02:44:55 +00:00
Tim Shen	7aa0ad65ce	[MemCpyOpt] Do not exchange llvm.lifetime.start and llvm.memcpy Reviewers: iteratee Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D21087 llvm-svn: 272192	2016-06-08 19:42:32 +00:00
David Majnemer	d99068d26d	[MemCpyOpt] Don't perform callslot optimization across may-throw calls An exception could prevent a store from occurring but MemCpyOpt's callslot optimization would fire anyway, causing the store to occur. This fixes PR27849. llvm-svn: 270892	2016-05-26 19:24:24 +00:00
Jun Bum Lim	f28beac419	[MemCpyOpt] Use MaxIntSize in byte instead of bit Summary: This change fix the bug in isProfitableToUseMemset() where MaxIntSize shoule be in byte, not bit. Reviewers: arsenm, joker.eph, mcrosier Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D20176 llvm-svn: 269433	2016-05-13 16:52:24 +00:00
Tim Northover	3961735f03	Revert "MemCpyOpt: combine local load/store sequences into memcpy." This reverts commit r269125. It was in my tree when I ran "git svn dcommit". It's really still under review. llvm-svn: 269127	2016-05-10 21:49:40 +00:00
Tim Northover	6c65c71639	MemCpyOpt: combine local load/store sequences into memcpy. Sort of the BB-local equivalent to idiom-recognizer: if we have a basic-block that really implements a memcpy operation, backends can benefit from seeing this. llvm-svn: 269125	2016-05-10 21:48:11 +00:00
Amaury Sechet	bdb261b4c0	Imporove load to store => memcpy Summary: This now try to reorder instructions in order to help create the optimizable pattern. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer Differential Revision: http://reviews.llvm.org/D16523 llvm-svn: 263503	2016-03-14 22:52:27 +00:00
Mehdi Amini	0535003bef	Fix PR26051: Memcpy optimization should introduce a call to memcpy before the store destination position This is a conservative fix, I expect Amaury to relax this. Follow-up for r256923 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 256999	2016-01-06 23:50:22 +00:00
Amaury Sechet	3235c08253	Promote aggregate store to memset when possible Summary: As per title. This will allow the optimizer to pick up on it. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15923 llvm-svn: 256969	2016-01-06 19:47:24 +00:00
Amaury Sechet	d3b2c0fd94	Improve load/store to memcpy for aggregate Summary: It turns out that if we don't try to do it at the store location, we can do it before any operation that alias the load, as long as no operation alias the store. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15903 llvm-svn: 256923	2016-01-06 09:30:39 +00:00
Amaury Sechet	a0c242cdfd	Implement load to store => memcpy in MemCpyOpt for aggregates Summary: Most of the tool chain is able to optimize scalar and memcpy like operation effisciently while it isn't that good with aggregates. In order to improve the support of aggregate, we try to change aggregate manipulation into either scalar or memcpy like ones whenever possible without loosing informations. This is one such opportunity. Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15894 llvm-svn: 256868	2016-01-05 20:17:48 +00:00
Pete Cooper	67cf9a723b	Revert "Change memcpy/memset/memmove to have dest and source alignments." This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543	2015-11-19 05:56:52 +00:00
Pete Cooper	72bc23ef02	Change memcpy/memset/memmove to have dest and source alignments. Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.llvm\.memset.)i32\ [0-9]\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, / isVolatile / false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, / isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511	2015-11-18 22:17:24 +00:00
Akira Hatanaka	5dda592643	Sort the enums in Attributes.h in case insensitive alphabetical order. Sort the enums in preparation for moving the attributes to a table-gen file. rdar://problem/19836465 llvm-svn: 252692	2015-11-11 02:11:46 +00:00
Andrea Di Biagio	99493df257	[MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls. Pass MemCpyOpt doesn't check if a store instruction is nontemporal. As a consequence, adjacent nontemporal stores are always merged into a memset call. Example: ;;; define void @foo(<4 x float>* nocapture %p) { entry: store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0 %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1 store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0 ret void } !0 = !{i32 1} ;;; In this example, the two nontemporal stores are combined to a memset of zero which does not preserve the nontemporal hint. Later on the backend (tested on a x86-64 corei7) expands that memset call into a sequence of two normal 16-byte aligned vector stores. opt -memcpyopt example.ll -S -o - \| llc -mcpu=corei7 -o - Before: xorps %xmm0, %xmm0 movaps %xmm0, 16(%rdi) movaps %xmm0, (%rdi) With this patch, we no longer merge nontemporal stores into calls to memset. In this example, llc correctly expands the two stores into two movntps: xorps %xmm0, %xmm0 movntps %xmm0, 16(%rdi) movntps %xmm0, (%rdi) In theory, we could extend the usage of !nontemporal metadata to memcpy/memset calls. However a change like that would only have the effect of forcing the backend to expand !nontemporal memsets back to sequences of store instructions. A memset library call would not have exactly the same semantic of a builtin !nontemporal memset call. So, SelectionDAG will have to conservatively expand it back to a sequence of !nontemporal stores (effectively undoing the merging). Differential Revision: http://reviews.llvm.org/D13519 llvm-svn: 249820	2015-10-09 10:53:41 +00:00
Igor Laevsky	30143aee11	Emit argmemonly attribute for intrinsics. Differential Revision: http://reviews.llvm.org/D11352 llvm-svn: 244920	2015-08-13 17:40:04 +00:00
Ahmed Bougacha	97876fa894	[MemCpyOpt] Do move the memset, but look at its dest's dependencies. In effect a partial revert of r237858, which was a dumb shortcut. Looking at the dependencies of the destination should be the proper fix: if the new memset would depend on anything other than itself, the transformation isn't correct. llvm-svn: 237874	2015-05-21 01:43:39 +00:00
Ahmed Bougacha	5e0f425c27	[MemCpyOpt] Don't move the memset when optimizing memset+memcpy. Fixes PR23599, another miscompile introduced by r235232: when there is another dependency on the destination of the created memset (i.e., the part of the original destination that the memcpy doesn't depend on) between the memcpy and the original memset, we would insert the created memset after the memcpy, and thus after the other dependency. Instead, insert the created memset right after the old one. llvm-svn: 237858	2015-05-20 23:55:16 +00:00
Ahmed Bougacha	f8fa3b8d4b	[MemCpyOpt] Turn memcpy from just-memset'd source into memset. There's no point in copying around constants, so, when all else fails, we can still transform memcpy of memset into two independent memsets. To quote the example, we can turn: memset(dst1, c, dst1_size); memcpy(dst2, dst1, dst2_size); into: memset(dst1, c, dst1_size); memset(dst2, c, dst2_size); When dst2_size <= dst1_size. Like r235232 for copy constructors, this can occur in move constructors. Differential Revision: http://reviews.llvm.org/D9682 llvm-svn: 237506	2015-05-16 01:32:26 +00:00
Ahmed Bougacha	d2b8fc1f3a	Remove dead code in testcase. NFC. llvm-svn: 237501	2015-05-16 01:10:40 +00:00
Ahmed Bougacha	b61696656e	[MemCpyOpt] Look at any dependency -not just source- for memset+memcpy. This fixes another miscompile introduced by r235232: when there was a dependency on the memcpy destination other than the memset, we would ignore it, because we only looked at the source dependency. It was a mistake to use SrcDepInfo. Instead, just use DepInfo. llvm-svn: 237066	2015-05-11 23:09:46 +00:00
Ahmed Bougacha	9692e30e8b	[MemCpyOpt] Use the raw i8* dest when optimizing memset+memcpy. MemIntrinsic::getDest() looks through pointer casts, and using it directly when building the new GEP+memset results in stuff like: %0 = getelementptr i64* %p, i32 16 %1 = bitcast i64* %0 to i8* call ..memset(i8* %1, ...) instead of the correct: %0 = bitcast i64* %p to i8* %1 = getelementptr i8* %0, i32 16 call ..memset(i8* %1, ...) Instead, use getRawDest, which just gives you the i8* value. While there, use the memcpy's dest, as it's live anyway. In most cases, when the optimization triggers, the memset and memcpy sizes are the same, so the built memset is 0-sized and eliminated. The problem occurs when they're different. Fixes a regression caused by r235232: PR23300. llvm-svn: 235419	2015-04-21 21:28:33 +00:00
Ahmed Bougacha	05b72c1fd8	[MemCpyOpt] Don't force i64 when promoting memset/memcpy sizes. Harden r235258 to support any integer bitwidth. The quick glance at the reference made me think only i32 and i64 were valid types, but they're not special, so any overload is legal. Thanks to David Majnemer for noticing! llvm-svn: 235261	2015-04-18 23:06:04 +00:00
Ahmed Bougacha	7216ccc3f3	[MemCpyOpt] Promote both memset/memcpy sizes if differently typed. Followup to r235232, which caused PR23278. We can't assume the memset and memcpy sizes have the same type, as nothing in the language reference prevents that. Instead, zext both to i64 if they disagree. While there, robustify tests by using i8 %c rather than i8 0 for the memset character. llvm-svn: 235258	2015-04-18 17:57:41 +00:00
Ahmed Bougacha	83f78a459a	[MemCpyOpt] Optimize double-storing by memset+memcpy. A common idiom in some code is to do the following: memset(dst, 0, dst_size); memcpy(dst, src, src_size); Some of the memset is redundant; instead, we can do: memcpy(dst, src, src_size); memset(dst + src_size, 0, dst_size <= src_size ? 0 : dst_size - src_size); Original patch by: Joel Jones Differential Revision: http://reviews.llvm.org/D498 llvm-svn: 235232	2015-04-17 22:20:57 +00:00

1 2 3

133 Commits