llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	8699492304	[DA] Delinearise AddRecs if we can prove they don't wrap We can prove that some delinearized subscripts do not wrap around to become negative by the fact that they are from inbound geps of load/store locations. This helps improve the delinearisation in cases where we can't prove that they are non-negative from SCEV alone. Differential Revision: https://reviews.llvm.org/D48481 llvm-svn: 335481	2018-06-25 15:13:26 +00:00
Simon Pilgrim	9c8f9374b5	[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329	2018-06-22 09:45:31 +00:00
Tim Shen	63f244c4f4	[SCEV] Re-apply r335197 (with Polly fixes). Summary: This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338). I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not educated, but just patterns that match the output. All LLVM files are already reviewed in D48338. Reviewers: jdoerfert, bollu, efriedma Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia Differential Revision: https://reviews.llvm.org/D48453 llvm-svn: 335292	2018-06-21 21:29:54 +00:00
Nicolai Haehnle	1045928aab	AMDGPU: Convert test cases to the dimension-aware intrinsics Summary: Also explicitly port over some tests in llvm.amdgcn.image.* that were missing. Some tests are removed because they no longer apply (i.e. explicitly testing building an address vector via insertelement). This is in preparation for the eventual removal of the old-style intrinsics. Some additional notes: - constant-address-space-32bit.ll: change some GCN-NEXT to GCN because the instruction schedule was subtly altered - insert_vector_elt.ll: the old test didn't actually test anything, because %tmp1 was not used; remove the load, because it doesn't work (Because of the amdgpu_ps calling convention? In any case, it's orthogonal to what the test claims to be testing.) Change-Id: Idfa99b6512ad139e755e82b8b89548ab08f0afcf Reviewers: arsenm, rampitec Subscribers: MatzeB, qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D48018 llvm-svn: 335229	2018-06-21 13:37:19 +00:00
David Green	d143c65de3	[DA] Enable -da-delinearize by default This enables da-delinearize in Dependence Analysis for delinearizing array accesses into multiple dimensions. This can help to increase the power of Dependence analysis on multi-dimensional arrays and prevent having to fall back to the slower and less accurate MIV tests. It adds static checks on the bounds of the arrays to ensure that one dimension doesn't overflow into another, and brings our code in line with our tests. Differential Revision: https://reviews.llvm.org/D45872 llvm-svn: 335217	2018-06-21 11:53:16 +00:00
Simon Pilgrim	2a9cde026c	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Tim Shen	433b9761ce	Revert "[SCEV] Improve zext(A /u B) and zext(A % B)" This reverts commit r335197, as some bots are not happy. llvm-svn: 335198	2018-06-21 02:15:32 +00:00
Tim Shen	5af61e0a28	[SCEV] Improve zext(A /u B) and zext(A % B) Summary: Try to match udiv and urem patterns, and sink zext down to the leaves. I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right. Reviewers: sanjoy Subscribers: jlebar, hiraditya, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D48338 llvm-svn: 335197	2018-06-21 01:49:07 +00:00
Roman Lebedev	42a1ff11fb	[NFC][SCEV] Add tests related to bit masking (PR37793) Summary: Related to https://bugs.llvm.org/show_bug.cgi?id=37793, https://reviews.llvm.org/D46760#1127287 We'd like to do this canonicalization https://rise4fun.com/Alive/Gmc But it is currently restricted by rL155136 / rL155362, which says: ``` // This is a constant shift of a constant shift. Be careful about hiding // shl instructions behind bit masks. They are used to represent multiplies // by a constant, and it is important that simple arithmetic expressions // are still recognizable by scalar evolution. // // The transforms applied to shl are very similar to the transforms applied // to mul by constant. We can be more aggressive about optimizing right // shifts. // // Combinations of right and left shifts will still be optimized in // DAGCombine where scalar evolution no longer applies. ``` I think these tests show that for constants, SCEV has no issues with that canonicalization. Reviewers: mkazantsev, spatel, efriedma, sanjoy Reviewed By: mkazantsev Subscribers: sanjoy, javed.absar, llvm-commits, stoklund, bixia Differential Revision: https://reviews.llvm.org/D48229 llvm-svn: 335101	2018-06-20 07:54:11 +00:00
Sanjoy Das	6e9b355cc9	Revert "[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags" This reverts r334428. It incorrectly marks some multiplications as nuw. Tim Shen is working on a proper fix. Original commit message: [SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe. Summary: Previously we would add them for adds, but not multiplies. llvm-svn: 335016	2018-06-19 04:09:44 +00:00
George Burgess IV	aa283d80fe	[MSSA] Print more optimization information In particular, when asked to print a MemoryAccess, we'll now print where defs are optimized to, and we'll print optimized access types. This patch also introduces an operator<< to make printing AliasResults easier. Patch by Juneyoung Lee! Differential Revision: https://reviews.llvm.org/D47860 llvm-svn: 334760	2018-06-14 19:55:53 +00:00
Justin Lebar	fe455464eb	[SCEV] Simplify zext/trunc idiom that appears when handling bitmasks. Summary: Specifically, we transform zext(2^K * (trunc X to iN)) to iM -> 2^K * (zext(trunc X to i{N-K}) to iM)<nuw> This is helpful because pulling the 2^K out of the zext allows further optimizations. Reviewers: sanjoy Subscribers: hiraditya, llvm-commits, timshen Differential Revision: https://reviews.llvm.org/D48158 llvm-svn: 334737	2018-06-14 17:13:48 +00:00
Justin Lebar	b326904dba	[SCEV] Simplify trunc-of-add/mul to add/mul-of-trunc under more circumstances. Summary: Previously we would do this simplification only if it did not introduce any new truncs (excepting new truncs which replace other cast ops). This change weakens this condition: If the number of truncs stays the same, but we're able to transform trunc(X + Y) to X + trunc(Y), that's still simpler, and it may open up additional transformations. While we're here, also clean up some duplicated code. Reviewers: sanjoy Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48160 llvm-svn: 334736	2018-06-14 17:13:35 +00:00
Simon Pilgrim	607a1e2196	[CostModel][AArch64] Add cost tests for ALTERNATE/SELECT style shuffle masks Precursor to fixing a regression with SLP vectorizer for supporting SELECT shuffles (vs the current ALTERNATE) llvm-svn: 334714	2018-06-14 14:20:20 +00:00
Simon Pilgrim	32702cc86a	[CostModel] Recognise REVERSE shuffle mask if the elements come from the second src llvm-svn: 334698	2018-06-14 09:35:00 +00:00
Simon Pilgrim	9fd634db22	[CostModel][X86] Test showing failure to recognise REVERSE shuffle mask if the elements come from the second src llvm-svn: 334623	2018-06-13 17:12:11 +00:00
Simon Pilgrim	54a138a0c5	[CostModel] Recognise BROADCAST shuffle mask if the elements come from the second src llvm-svn: 334620	2018-06-13 16:52:02 +00:00
Simon Pilgrim	5af0b99ea4	[CostModel][X86] Test showing failure to recognise BROADCAST shuffle mask if the elements come from the second src llvm-svn: 334616	2018-06-13 16:33:42 +00:00
Max Kazantsev	0ed79620c6	[SimplifyIndVars] Ignore dead users IndVarSimplify sometimes makes transforms basing on users that are trivially dead. In particular, if DCE wasn't run before it, there may be a dead `sext/zext` in loop that will trigger widening transforms, however it makes no sense to do it. This patch teaches IndVarsSimplify ignore the mist trivial cases of that. Differential Revision: https://reviews.llvm.org/D47974 Reviewed By: sanjoy llvm-svn: 334567	2018-06-13 02:25:32 +00:00
Simon Pilgrim	e39fa6cbbb	[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513	2018-06-12 16:12:29 +00:00
Simon Pilgrim	0783921987	[CostModel] Treat Identity shuffle masks as zero cost As discussed on D47985, identity shuffle masks should probably be free. I've limited this to the case where the input and output types all match - but we could probably accept all cases. Differential Revision: https://reviews.llvm.org/D47986 llvm-svn: 334506	2018-06-12 14:47:13 +00:00
Simon Pilgrim	cfd96329f0	[CostModel][X86] Add extra Identity shuffle mask cost tests (D47986) llvm-svn: 334486	2018-06-12 09:18:13 +00:00
Tim Shen	df2d6652c1	Fix incorrect CHECK-LABEL llvm-svn: 334434	2018-06-11 19:56:12 +00:00
Justin Lebar	4da41c13a5	[SCEV] Add transform zext((A * B * ...)<nuw>) --> (zext(A) * zext(B) * ...)<nuw>. Reviewers: sanjoy Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48041 llvm-svn: 334429	2018-06-11 18:57:58 +00:00
Justin Lebar	aa4fec94d8	[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe. Summary: Previously we would add them for adds, but not multiplies. Reviewers: sanjoy Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D48038 llvm-svn: 334428	2018-06-11 18:57:42 +00:00
Tim Shen	cc63761720	[SCEV] Canonicalize "A /u C1 /u C2" to "A /u (C1C2)". Summary: FWIW InstCombine already folds this. Also avoid the case where C1C2 overflows. Reviewers: sunfish, sanjoy Subscribers: hiraditya, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D47965 llvm-svn: 334425	2018-06-11 18:44:58 +00:00
Simon Pilgrim	5297506625	[CostModel][X86] Add 'select' style shuffle costs tests (PR33744) llvm-svn: 334351	2018-06-09 16:08:25 +00:00
Krzysztof Parzyszek	b10ea39270	[SCEV] Look through zero-extends in howFarToZero An expression like (zext i2 {(trunc i32 (1 + %B) to i2),+,1}<%while.body> to i32) will become zero exactly when the nested value becomes zero in its type. Strip injective operations from the input value in howFarToZero to make the value simpler. Differential Revision: https://reviews.llvm.org/D47951 llvm-svn: 334318	2018-06-08 20:43:07 +00:00
Artur Pilipenko	4d063e7bb1	[BPI] Apply invoke heuristic before loop branch heuristic Currently the loop branch heuristic is applied before the invoke heuristic which makes us overestimate the probability of the unwind destination of invokes inside loops. This in turn makes us grossly underestimate the frequencies of loops with invokes. Reviewed By: skatkov, vsk Differential Revision: https://reviews.llvm.org/D47371 llvm-svn: 334285	2018-06-08 13:03:21 +00:00
Simon Pilgrim	f2f043acbb	[X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets. Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support. Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts. This is a step towards adding support for vXi16 vector rotates. Differential Revision: https://reviews.llvm.org/D47546 llvm-svn: 334023	2018-06-05 15:17:39 +00:00
Karl-Johan Karlsson	6d52e5c3e4	[ConstantFold] Disallow folding vector geps into bitcasts Summary: Getelementptr returns a vector of pointers, instead of a single address, when one or more of its arguments is a vector. In such case it is not possible to simplify the expression by inserting a bitcast of operand(0) into the destination type, as it will create a bitcast between different sizes. Reviewers: majnemer, mkuper, mssimpso, spatel Reviewed By: spatel Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D46379 llvm-svn: 333783	2018-06-01 19:34:35 +00:00
Karl-Johan Karlsson	b60b920a8c	[ConstantFold] Add lit testcase for bitcast problem. NFC llvm-svn: 333767	2018-06-01 15:08:14 +00:00
David Green	2911b3a07a	[DA] Fix direction vectors for weakZeroSrcSIV Both weakZeroSrcSIV and weakZeroDstSIV are currently giving the same direction vectors. Fix weakZeroSrcSIVtest by flipping the directions it gives. Differential Revision: https://reviews.llvm.org/D46678 llvm-svn: 333658	2018-05-31 14:55:29 +00:00
Daniel Neilson	6b23fb764e	[AliasSet] Teach the alias set how to handle atomic memcpy/memmove/memset Summary: The atomic variants of the memcpy/memmove/memset intrinsics can be treated the same was as the regular forms, with respect to aliasing. Update the AliasSetTracker to treat the atomic forms the same was as the regular forms. llvm-svn: 333551	2018-05-30 14:43:39 +00:00
Daniel Neilson	3a6c50f4e0	[BasicAA] Teach the analysis about atomic memcpy Summary: A simple change to derive mod/ref info from the atomic memcpy intrinsic in the same way as from the regular memcpy intrinsic. llvm-svn: 333454	2018-05-29 19:23:50 +00:00
Piotr Padlewski	d6f7346a4b	Fix aliasing of launder.invariant.group Summary: Patch for capture tracking broke bootstrap of clang with -fstict-vtable-pointers which resulted in debbugging nightmare. It was fixed https://reviews.llvm.org/D46900 but as it turned out, there were other parts like inliner (computing of noalias metadata) that I found after bootstraping with enabled assertions. Reviewers: hfinkel, rsmith, chandlerc, amharc, kuhar Subscribers: JDevlieghere, eraman, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D47088 llvm-svn: 333070	2018-05-23 09:16:44 +00:00
Simon Pilgrim	4162d77744	[TTI] Add uniform/non-uniform constant Pow2 detection to TargetTransformInfo::getInstructionThroughput This enables us to detect more fast path sdiv cases under cost analysis. This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs. Found while working on D46276 Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases. Differential Revision: https://reviews.llvm.org/D46637 llvm-svn: 332969	2018-05-22 10:40:09 +00:00
Piotr Padlewski	5642a42442	Propagate nonnull and dereferenceable throught launder Summary: invariant.group.launder should not stop propagation of nonnull and dereferenceable, because e.g. we would not be able to hoist loads speculatively. Reviewers: rsmith, amharc, kuhar, xbolva00, hfinkel Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D46972 llvm-svn: 332788	2018-05-18 23:54:33 +00:00
Piotr Padlewski	153fe60079	[MemDep] Fixed handling of invariant.group Summary: Memdep had funny bug related to invariant.groups - because it did not invalidated cache, in some very rare cases it was possible to show memory dependence of the instruction that was deleted, but because other instruction took it's place it resulted in call to vtable! Thanks @amharc for repro!. Reviewers: dberlin, kuhar, amharc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45320 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 332781	2018-05-18 22:40:34 +00:00
Serguei Katkov	5095883fe9	[LICM] Extend the MustExecute scope CanProveNotTakenFirstIteration utility does not handle the case when condition of the branch is a constant. Add its handling. Reviewers: reames, anna, mkazantsev Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D46996 llvm-svn: 332695	2018-05-18 04:56:28 +00:00
George Burgess IV	c6526176cf	Revert r332657: "[AA] cfl-anders-aa with field sensitivity" I don't believe the person who LGTMed this review has appropriate context on this code. I apologize if I'm wrong. llvm-svn: 332674	2018-05-17 21:56:39 +00:00
David Bolvansky	b1c59e3f30	[AA] cfl-anders-aa with field sensitivity Summary: There was some unfinished work started for offset tracking in CFLGraph by the author of implementation of Andersen algorithm. This work was completed and support for field sensitivity was added to the core of Andersen algorithm. The performance results seem promising. SPEC2006 int_base score was increased by 1.1 % (I compared clang 6.0 with clang 6.0 with this patch). The avergae compile time was increased by +- 1 % according my measures with small and medium C/C++ projects (I did not tested it on the large projects with milions of lines of code) Reviewers: chandlerc, george.burgess.iv, rja Reviewed By: rja Subscribers: rja, llvm-commits Differential Revision: https://reviews.llvm.org/D46282 llvm-svn: 332657	2018-05-17 20:23:33 +00:00
Krzysztof Pszeniczny	2ba8fd4914	[BasicAA] Fix handling of invariant group launders Summary: A recent patch ([[ https://reviews.llvm.org/rL331587 \| rL331587 ]]) to Capture Tracking taught it that the `launder_invariant_group` intrinsic captures its argument only by returning it. Unfortunately, BasicAA still considered every call instruction as a possible escape source and hence concluded that the result of a `launder_invariant_group` call cannot alias any local non-escaping value. This led to [[ https://bugs.llvm.org/show_bug.cgi?id=37458 \| bug 37458 ]]. This patch updates the relevant check for escape sources in BasicAA. Reviewers: Prazek, kuhar, rsmith, hfinkel, sanjoy, xbolva00 Reviewed By: hfinkel, xbolva00 Subscribers: JDevlieghere, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D46900 llvm-svn: 332466	2018-05-16 13:16:54 +00:00
Michael Zolotukhin	67cfbaac89	[MemorySSA] Don't sort IDF blocks. Summary: After r332167 we started to sort the IDF blocks inside IDF calculation, so there is no need to re-sort them on the user site. The test changes are due to a slightly different order we're using now (originally we used DFSInNumber and now the blocks are sorted by a pair (LevelFromRoot, DFSInNumber)). Reviewers: dberlin, mgrang Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits Differential Revision: https://reviews.llvm.org/D46899 llvm-svn: 332385	2018-05-15 18:40:29 +00:00
George Burgess IV	175400a801	Add regression test for r331976 In general, it's difficult to poke the ConstantExpr code in CFLAA, since LLVM is so great at eagerly reducing ConstantExprs. :) Sadly, this only shows a functional difference from before the patch because CFLAA has some special logic around taking loads of non-pointers into account. Namely, with the broken select behavior, CFLAA will completely fail to take note of @g3. Since CFLAA doesn't have any record about @g3 when we do an alias query for @g3 and %a, it conservatively answers MayAlias. When we properly take @g3 into account with the new select logic, we get NoAlias for this query. I suspect that the aforementioned "special logic" isn't completely correct, but this test-case should prevent future wonky aliasing results from appearing for these flavors of ConstantExprs, so I think it's still worth having. llvm-svn: 332017	2018-05-10 18:37:54 +00:00
Serguei Katkov	32754aa37f	[SCEV] Add missed Test for rL331949. llvm-svn: 331950	2018-05-10 01:42:59 +00:00
Adhemerval Zanella	f384bc7166	[AArch64] Improve cost of vector division by constant With custom lowering for vector MULLH{S,U}, it is now profitable to vectorize a divide by constant loop for the custom types (v16i8, v8i16, and v4i32). The cost if based on TargetLowering::Build{S,U}DIV which uses a multiply by constant plus adjustment to express a divide by constant. Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by Instruction::AShr. llvm-svn: 331873	2018-05-09 12:48:22 +00:00
Simon Pilgrim	fe5c5277ed	[CostModel][X86] Split off SLM checks A future patch will require this and the diff is much better if we perform the split separately. llvm-svn: 331867	2018-05-09 11:42:34 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Piotr Padlewski	5dde809404	Rename invariant.group.barrier to launder.invariant.group Summary: This is one of the initial commit of "RFC: Devirtualization v2" proposal: https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing Reviewers: rsmith, amharc, kuhar, sanjoy Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D45111 llvm-svn: 331448	2018-05-03 11:03:01 +00:00
Max Kazantsev	58fce7e54b	Re-enable "[SCEV] Make computeExitLimit more simple and more powerful" This patch was temporarily reverted because it has exposed bug 37229 on PowerPC platform. The bug is unrelated to the patch and was just a general bug in the optimization done for PowerPC platform only. The bug was fixed by the patch rL331410. This patch returns the disabled commit since the bug was fixed. llvm-svn: 331427	2018-05-03 02:37:55 +00:00
Piotr Padlewski	f801205e48	Mark invariant.group.barrier as inaccessiblememonly It turned out that readonly argmemonly is not enough. store 42, %p %b = barrier(%p) store 43, %b the first store is dead, but because barrier was marked as reading argument memory, it was considered alive. With inaccessiblememonly it doesn't read the argument, but it also can't be CSEd. based on: https://reviews.llvm.org/D32006 llvm-svn: 331338	2018-05-02 08:22:07 +00:00
Mikhail Maltsev	ffaa8a8781	[IR] Do not assume that function pointers are aligned Summary: The value tracking analysis uses function alignment to infer that the least significant bits of function pointers are known to be zero. Unfortunately, this is not correct for ARM targets: the least significant bit of a function pointer stores the ARM/Thumb state information (i.e., the LSB is set for Thumb functions and cleared for ARM functions). The original approach (https://reviews.llvm.org/D44781) introduced a new field for function pointer alignment in the DataLayout structure to address this. But it seems unlikely that optimizations based on function pointer alignment would bring much benefit in practice to justify the additional maintenance burden, so this patch simply assumes that function pointer alignment is always unknown. Reviewers: javed.absar, efriedma Reviewed By: efriedma Subscribers: kristof.beyls, llvm-commits, hfinkel, rogfer01 Differential Revision: https://reviews.llvm.org/D46110 llvm-svn: 331025	2018-04-27 09:12:12 +00:00
Matthew Simpson	b4096ebe26	[TTI, AArch64] Add transpose shuffle kind This patch adds a new shuffle kind useful for transposing a 2xn matrix. These transpose shuffle masks read corresponding even- or odd-numbered vector elements from two n-dimensional source vectors and write each result into consecutive elements of an n-dimensional destination vector. The transpose shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such, this patch also considers transpose shuffles in the AArch64 implementation of getShuffleCost. Differential Revision: https://reviews.llvm.org/D45982 llvm-svn: 330941	2018-04-26 13:48:33 +00:00
Max Kazantsev	2c287ec9c5	Revert "[SCEV] Make computeExitLimit more simple and more powerful" This reverts commit 023c8be90980e0180766196cba86f81608b35d38. This patch triggers miscompile of zlib on PowerPC platform. Most likely it is caused by some pre-backend PPC-specific pass, but we don't clearly know the reason yet. So we temporally revert this patch with intention to return it once the problem is resolved. See bug 37229 for details. llvm-svn: 330893	2018-04-26 02:07:40 +00:00
Simon Pilgrim	0ae4bba911	[CostModel][X86] Add div/rem tests for non-uniform constant divisors llvm-svn: 330852	2018-04-25 18:03:31 +00:00
Matthew Simpson	3fd67df3f8	[AArch64] Add cost model test case for transpose This patch adds a cost model test case for vector shuffles having transpose masks. The given costs are inaccurate and will be updated in a follow-on patch. llvm-svn: 330625	2018-04-23 18:21:29 +00:00
Simon Pilgrim	ab9798765c	[CostModel][X86] Add vector element insert/extract cost tests llvm-svn: 330439	2018-04-20 15:26:59 +00:00
Simon Pilgrim	863ffeb750	[CostModel][X86] Add srem/urem constant cost tests llvm-svn: 330436	2018-04-20 15:01:03 +00:00
Simon Pilgrim	8a15d72550	[CostModel][X86] Add SLM/GLM/BtVer2 compare + division/remainder cost tests llvm-svn: 330435	2018-04-20 14:50:34 +00:00
Simon Pilgrim	cd9ccf8824	[CostModel][X86] Split off BtVer2 cost checks llvm-svn: 330433	2018-04-20 13:50:33 +00:00
Simon Pilgrim	25b7782975	[CostModel][X86] Add GoldmontPlus cost tests Just reuses goldmont costs atm llvm-svn: 330432	2018-04-20 13:42:53 +00:00
Shiva Chen	c84e77aeae	[BasicAA] Return MayAlias for the pointer plus variable offset to structure object member Differential Revision: https://reviews.llvm.org/D45510 llvm-svn: 330106	2018-04-16 01:58:39 +00:00
Simon Pilgrim	34b397a318	[CostModel][X86] Add some specific cpu targets to the cost models We're mostly testing with generic isa attributes, but PR36550 will require testing of specific target's scheduler models as well. llvm-svn: 330056	2018-04-13 19:30:15 +00:00
Simon Pilgrim	3ede11b58c	[CostModel][X86] Split fma arith costs tests from other fp tests Was proving cumbersome to test with/without fma llvm-svn: 330054	2018-04-13 19:12:32 +00:00
Simon Pilgrim	237730a196	[CostModel][X86] Regenerate latency/codesize cost tests llvm-svn: 330052	2018-04-13 18:56:58 +00:00
Simon Pilgrim	0c07ccc4e3	[CostModel][X86] Regenerate cast conversion cost tests llvm-svn: 330051	2018-04-13 18:56:05 +00:00
Simon Pilgrim	e30db80d30	[CostModel][X86] Regenerate masked intrinsic cost tests llvm-svn: 330050	2018-04-13 18:54:16 +00:00
David Green	5ef933b02c	[DA] Improve alias checking in dependence analysis Improve the alias analysis to account for cases where we know that src/dst pairs cannot alias due to things like TBAA. As we know they are noalias, we know no dependency can occur. Also fixes issues around the size parameter to AA being incorrect. Differential Revision: https://reviews.llvm.org/D42381 llvm-svn: 329692	2018-04-10 11:37:21 +00:00
Simon Pilgrim	8a8ff4f6d4	[CostModel][X86] Regenerate vector reduction cost tests with update_analyze_test_checks.py NOTE: We're only really interested in the extractelement cost (which represents the entire reduction). llvm-svn: 329504	2018-04-07 14:20:10 +00:00
Simon Pilgrim	7bd5ff8b4a	[CostModel][X86] Regenerate vector select cost tests with update_analyze_test_checks.py llvm-svn: 329502	2018-04-07 14:09:54 +00:00
Simon Pilgrim	495b660269	[CostModel][X86] Regenerate vector integer truncation cost tests with update_analyze_test_checks.py llvm-svn: 329500	2018-04-07 14:05:35 +00:00
Simon Pilgrim	84d8498fc5	[CostModel][X86] Regenerate silvermont (and added goldmont) cost tests with update_analyze_test_checks.py llvm-svn: 329499	2018-04-07 14:02:14 +00:00
Simon Pilgrim	80ce1dde44	[CostModel][X86] Fix v32i16/v64i8 SETCC costs on AVX512BW targets llvm-svn: 329498	2018-04-07 13:24:33 +00:00
Simon Pilgrim	a49a1b9ccc	[CostModel][X86] Regenerate vector comparison cost tests with update_analyze_test_checks.py llvm-svn: 329497	2018-04-07 12:47:35 +00:00
Simon Pilgrim	61f704e4bd	[CostModel][X86] Regenerate bit count cost tests with update_analyze_test_checks.py llvm-svn: 329413	2018-04-06 16:14:27 +00:00
Simon Pilgrim	63ae5579e7	[CostModel][X86] Regenerate vector shuffle cost tests with update_analyze_test_checks.py llvm-svn: 329410	2018-04-06 16:00:28 +00:00
Simon Pilgrim	d55ad63bfe	[CostModel][X86] Regenerate bswap/bitreverse cost tests with update_analyze_test_checks.py llvm-svn: 329407	2018-04-06 15:46:26 +00:00
Simon Pilgrim	74402acb00	[CostModel][X86] Regenerate integer extension/truncation cost tests with update_analyze_test_checks.py llvm-svn: 329402	2018-04-06 15:28:26 +00:00
Simon Pilgrim	06fba8b204	[CostModel][X86] Regenerate integer division/remainder tests with update_analyze_test_checks.py llvm-svn: 329401	2018-04-06 15:23:26 +00:00
Simon Pilgrim	60fc843fc6	[CostModel][X86] Regenerate vector shift cost tests with update_analyze_test_checks.py llvm-svn: 329400	2018-04-06 15:14:34 +00:00
Simon Pilgrim	ad768585ff	[CostModel][X86] Regenerate int<->fp cost tests with update_analyze_test_checks.py llvm-svn: 329398	2018-04-06 15:12:36 +00:00
Simon Pilgrim	5334a2c571	[UpdateTestChecks] Add update_analyze_test_checks.py for cost model analysis generation The script allows the auto-generation of checks for cost model tests to speed up their creation and help improve coverage, which will help a lot with PR36550. If the need arises we can add support for other analyze passes as well, but the cost models was the one I needed to get done - at the moment it just warns that any other analysis mode is unsupported. I've regenerated a couple of x86 test files to show the effect. Differential Revision: https://reviews.llvm.org/D45272 llvm-svn: 329390	2018-04-06 12:36:27 +00:00
Simon Pilgrim	d152d55ab2	[X86][CostModel] Use generic SSE levels instead of particular CPUs for shuffle costs llvm-svn: 329168	2018-04-04 11:14:12 +00:00
Nicolai Haehnle	2f5a73820c	AMDGPU: Dimension-aware image intrinsics Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters. This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics. Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility. v2: - gather4 supports 2darray images - fix a bug with 1D images on SI Change-Id: I099f309e0a394082a5901ea196c3967afb867f04 Reviewers: arsenm, rampitec, b-sumner Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D44939 llvm-svn: 329166	2018-04-04 10:58:54 +00:00
Max Kazantsev	c01e47b43f	[SCEV] Make computeExitLimit more simple and more powerful Current implementation of `computeExitLimit` has a big piece of code the only purpose of which is to prove that after the execution of this block the latch will be executed. What it currently checks is actually a subset of situations where the exiting block dominates latch. This patch replaces all these checks for simple particular cases with domination check over loop's latch which is the only necessary condition of taking the exiting block into consideration. This change allows to calculate exact loop taken count for simple loops like for (int i = 0; i < 100; i++) { if (cond) {...} else {...} if (i > 50) break; . . . } Differential Revision: https://reviews.llvm.org/D44677 Reviewed By: efriedma llvm-svn: 329047	2018-04-03 05:57:19 +00:00
George Burgess IV	3588fd4865	[MemorySSA] Consider callsite args for hashing and equality. We use a `DenseMap<MemoryLocOrCall, MemlocStackInfo>` to keep track of prior work when optimizing uses in MemorySSA. Because we weren't accounting for callsite arguments in either the hash code or equality tests for `MemoryLocOrCall`s, we optimized uses too aggressively in some rare cases. Fix by Daniel Berlin. Should fix PR36883. llvm-svn: 328748	2018-03-29 00:54:39 +00:00
Max Kazantsev	7094c8deb2	[SCEV] Make exact taken count calculation more optimistic Currently, `getExact` fails if it sees two exit counts in different blocks. There is no solid reason to do so, given that we only calculate exact non-taken count for exiting blocks that dominate latch. Using this fact, we can simply take min out of all exits of all blocks to get the exact taken count. This patch makes the calculation more optimistic with enforcing our assumption with asserts. It allows us to calculate exact backedge taken count in trivial loops like for (int i = 0; i < 100; i++) { if (i > 50) break; . . . } Differential Revision: https://reviews.llvm.org/D44676 Reviewed By: fhahn llvm-svn: 328611	2018-03-27 07:30:38 +00:00
Craig Topper	a985919d3e	[X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44644 llvm-svn: 328451	2018-03-25 15:58:12 +00:00
Philip Reames	37a1a29fcb	[MustExecute] Shwo the effect of using full loop info variant Most basic possible test for the logic used by LICM. Also contains a speculative build fix for compiles which complain about a definition of a stuct K; followed by a declaration as class K; llvm-svn: 328058	2018-03-20 23:00:54 +00:00
Philip Reames	164b1b4e21	[MustExecute] Add simplest possible test for LoopSafetyOnfo (Currently showing without, will enable and check in diff to show impact) llvm-svn: 328056	2018-03-20 22:55:20 +00:00
Philip Reames	ce998adf0a	[MustExecute] Use the annotation style printer As suggested in the original review (https://reviews.llvm.org/D44524), use an annotation style printer instead. Note: The switch from -analyze to -disable-output in tests was driven by the fact that seems to be the idiomatic style used in annoation passes. I tried to keep both working, but the old style pass API for printers really doesn't make this easy. It invokes (runOnFunction, print(Module)) repeatedly. I decided the extra state wasn't worth it given the old pass manager is going away soonish anyway. llvm-svn: 328015	2018-03-20 18:43:44 +00:00
Philip Reames	89f2241770	Add an analysis printer for must execute reasoning Many of our loop passes make use of so called "must execute" or "guaranteed to execute" facts to prove the legality of code motion. The basic notion is that we know (by assumption) an instruction didn't fault at it's original location, so if the location we move it to is strictly post dominated by the original, then we can't have introduced a new fault. At the moment, the testing for this logic is somewhat adhoc and done mostly through LICM. Since I'm working on that code, I want to improve the testing. This patch is the first step in that direction. It doesn't actually test the variant used by the loop passes - I need to move that to the Analysis library first - but instead exercises an alternate implementation used by SCEV. (I plan on merging both implementations.) Note: I'll be replacing the printing logic within this with an annotation based version in the near future. Anna suggested this in review, and it seems like a strictly better format. Differential Revision: https://reviews.llvm.org/D44524 llvm-svn: 328004	2018-03-20 17:09:21 +00:00
Serguei Katkov	529f42331e	[SCEV] Re-land: Fix isKnownPredicate This is re-land of https://reviews.llvm.org/rL327362 with a fix and regression test. The crash was due to it is possible that for found MDL loop, LHS or RHS may contain an invariant unknown SCEV which does not dominate the MDL. Please see regression test for an example. Reviewers: sanjoy, mkazantsev, reames Reviewed By: mkazantsev Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44553 llvm-svn: 327822	2018-03-19 06:35:30 +00:00
Matthew Simpson	eacfefd056	[AArch64] Implement getArithmeticReductionCost This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702	2018-03-16 11:34:15 +00:00
Matthew Simpson	883e96c9d4	[TTI, AArch64] Allow the cost model analysis to test vector reduce intrinsics This patch considers the experimental vector reduce intrinsics in the default implementation of getIntrinsicInstrCost. The cost of these intrinsics is computed with getArithmeticReductionCost and getMinMaxReductionCost. This patch also adds a test case for AArch64 that indicates the costs we currently compute for vector reduce intrinsics. These costs are inaccurate and will be updated in a follow-on patch. Differential Revision: https://reviews.llvm.org/D44489 llvm-svn: 327698	2018-03-16 10:00:30 +00:00
Matthew Simpson	c1c4ad6e64	[ConstantFolding, InstSimplify] Handle more vector GEPs This patch addresses some additional cases where the compiler crashes upon encountering vector GEPs. This should fix PR36116. Differential Revision: https://reviews.llvm.org/D44219 Reference: https://bugs.llvm.org/show_bug.cgi?id=36116 llvm-svn: 327638	2018-03-15 16:00:29 +00:00
Evandro Menezes	f9bd871d32	[AArch64] Adjust the cost of integer vector division Since there is no instruction for integer vector division, factor in the cost of singling out each element to be used with the scalar division instruction. Differential revision: https://reviews.llvm.org/D43974 llvm-svn: 326955	2018-03-07 22:35:32 +00:00
Sebastian Pop	bf6e1c26cf	DA: remove uses of GEP, only ask SCEV It's been quite some time the Dependence Analysis (DA) is broken, as it uses the GEP representation to "identify" multi-dimensional arrays. It even wrongly detects multi-dimensional arrays in single nested loops: from test/Analysis/DependenceAnalysis/Coupled.ll, example @couple6 ;; for (long int i = 0; i < 50; i++) { ;; A[i][3i - 6] = i; ;; B++ = A[i][i]; DA used to detect two subscripts, which makes no sense in the LLVM IR or in C/C++ semantics, as there are no guarantees as in Fortran of subscripts not overlapping into a next array dimension: maximum nesting levels = 1 SrcPtrSCEV = %A DstPtrSCEV = %A using GEPs subscript 0 src = {0,+,1}<nuw><nsw><%for.body> dst = {0,+,1}<nuw><nsw><%for.body> class = 1 loops = {1} subscript 1 src = {-6,+,3}<nsw><%for.body> dst = {0,+,1}<nuw><nsw><%for.body> class = 1 loops = {1} Separable = {} Coupled = {1} With the current patch, DA will correctly work on only one dimension: maximum nesting levels = 1 SrcSCEV = {(-2424 + %A)<nsw>,+,1212}<%for.body> DstSCEV = {%A,+,404}<%for.body> subscript 0 src = {(-2424 + %A)<nsw>,+,1212}<%for.body> dst = {%A,+,404}<%for.body> class = 1 loops = {1} Separable = {0} Coupled = {} This change removes all uses of GEP from DA, and we now only rely on the SCEV representation. The patch does not turn on -da-delinearize by default, and so the DA analysis will be more conservative in the case of multi-dimensional memory accesses in nested loops. I disabled some interchange tests, as the DA is not able to disambiguate the dependence anymore. To make DA stronger, we may need to compute a bound on the number of iterations based on the access functions and array dimensions. The patch cleans up all the CHECKs in test/Transforms/LoopInterchange/*.ll to avoid checking for snippets of LLVM IR: this form of checking is very hard to maintain. Instead, we now check for output of the pass that are more meaningful than dozens of lines of LLVM IR. Some tests now require -debug messages and thus only enabled with asserts. Patch written by Sebastian Pop and Aditya Kumar. Differential Revision: https://reviews.llvm.org/D35430 llvm-svn: 326837	2018-03-06 21:55:59 +00:00
Simon Pilgrim	74ff6ff437	[X86] Add silvermont fp arithmetic cost model tests Add silvermont to existing high coverage tests instead of repeating in slm-arith-costs.ll llvm-svn: 326747	2018-03-05 22:13:22 +00:00
Max Kazantsev	f8d2969abb	[SCEV] Smart range calculation for SCEVUnknown Phis The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN` can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D43810 llvm-svn: 326418	2018-03-01 06:56:48 +00:00
Simon Pilgrim	9929f90740	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280) Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133	2018-02-26 22:10:17 +00:00
John Brawn	29bbed3613	[BPI] Detect branches in loops that make themselves not taken If we have a loop like this: int n = 0; while (...) { if (++n >= MAX) { n = 0; } } then the body of the 'if' statement will only be executed once every MAX iterations. Detect this by looking for branches in loops where taking the branch makes the branch condition evaluate to 'not taken' in the next iteration of the loop, and reduce the probability of such branches. This slightly improves EEMBC benchmarks on cortex-m4/cortex-m33 due to making better choices in if-conversion, but has no effect on any other cpu/benchmark that I could detect. Differential Revision: https://reviews.llvm.org/D35804 llvm-svn: 325925	2018-02-23 17:17:31 +00:00
Sanjay Patel	e6143904b9	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280) There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658	2018-02-21 01:42:52 +00:00
Sanjay Patel	3e8a76abfd	[TTI CostModel] change default cost of FP ops to 1 (PR36280) This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515	2018-02-19 16:11:44 +00:00
Max Kazantsev	db3a9e0cfe	[SCEV] Make getPostIncExpr guaranteed to return AddRec The current implementation of `getPostIncExpr` invokes `getAddExpr` for two recurrencies and expects that it always returns it a recurrency. But this is not guaranteed to happen if we have reached max recursion depth or refused to make SCEV simplification for other reasons. This patch changes its implementation so that now it always returns SCEVAddRec without relying on `getAddExpr`. Differential Revision: https://reviews.llvm.org/D42953 llvm-svn: 324866	2018-02-12 05:09:38 +00:00
Simon Pilgrim	cb9a02f60e	[X86][SSE] Increase PMULLD costs to better match hardware Until Skylake, most hardware could only issue a PMULLD op every other cycle llvm-svn: 324823	2018-02-10 19:27:10 +00:00
Yaxun Liu	2a22c5deff	[AMDGPU] Switch to the new addr space mapping by default This requires corresponding clang change. Differential Revision: https://reviews.llvm.org/D40955 llvm-svn: 324101	2018-02-02 16:07:16 +00:00
Ivan A. Kosarev	ab68bbe515	[Analysis] Support aggregate access types in TBAA This patch implements analysis for new-format TBAA access tags with aggregate types as their final access types. Differential Revision: https://reviews.llvm.org/D41501 llvm-svn: 324092	2018-02-02 14:09:22 +00:00
Daniel Neilson	147810d28a	[Lint] Upgrade uses of MemoryIntrinic::getAlignment() to new API. (NFCI) Summary: This change is part of step five in the series of changes to remove alignment argument from memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the Lint analysis to cease using the old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific alignments through the new API. Steps: Step 1) Remove alignment parameter and create alignment parameter attributes for memcpy/memmove/memset. ( rL322965, rC322964, rL322963 ) Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. ( rL323597 ) Step 3) Update Clang to use the new IRBuilder API. ( rC323617 ) Step 4) Update Polly to use the new IRBuilder API. ( rL323618 ) Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use [get\|set]DestAlignment() and [get\|set]SourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reference http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html llvm-svn: 323886	2018-01-31 16:42:15 +00:00
Sanjay Patel	1d91ec34b2	[ValueTracking] add recursion depth param to matchSelectPattern We're getting bug reports: https://bugs.llvm.org/show_bug.cgi?id=35807 https://bugs.llvm.org/show_bug.cgi?id=35840 https://bugs.llvm.org/show_bug.cgi?id=36045 ...where we blow up the stack in value tracking because other passes are sending in selects that have an operand that is itself the select. We don't currently have a reliable way to avoid analyzing dead code that may take non-standard forms, so bail out when things go too far. This mimics the recursion depth limitations in other parts of value tracking. Unfortunately, this pushes the underlying problems for other passes (jump-threading, simplifycfg, correlated-propagation) into hiding. If someone wants to uncover those again, the first draft of this patch on Phab would do that (it would assert rather than bail out). Differential Revision: https://reviews.llvm.org/D42442 llvm-svn: 323331	2018-01-24 15:20:37 +00:00
Daniel Neilson	1e68724d24	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1) Summary: This is a resurrection of work first proposed and discussed in Aug 2015: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html and initially landed (but then backed out) in Nov 2015: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two. This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments. In this change we: 1) Remove the alignment argument. 2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false) will now read call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false) Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required. s~declare void @llvm\.mem(set\|cpy\|move)\.p([^(])\((.), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g s~call void @llvm\.memset\.p([^(])i8\(i8([^])\ (.), i8 (.), i8 (.), i32 [01], i1 ([^)])\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i16\(i8([^])\ (.), i8 (.), i16 (.), i32 [01], i1 ([^)])\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i32\(i8([^])\ (.), i8 (.), i32 (.), i32 [01], i1 ([^)])\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i64\(i8([^])\ (.), i8 (.), i64 (.), i32 [01], i1 ([^)])\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i128\(i8([^])\ (.), i8 (.), i128 (.), i32 [01], i1 ([^)])\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g s~call void @llvm\.memset\.p([^(])i8\(i8([^])\ (.), i8 (.), i8 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.memset.p\1i8(i8\2 align \6 \3, i8 \4, i8 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i16\(i8([^])\ (.), i8 (.), i16 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.memset.p\1i16(i8\2 align \6 \3, i8 \4, i16 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i32\(i8([^])\ (.), i8 (.), i32 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.memset.p\1i32(i8\2 align \6 \3, i8 \4, i32 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i64\(i8([^])\ (.), i8 (.), i64 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.memset.p\1i64(i8\2 align \6 \3, i8 \4, i64 \5, i1 \7)~g s~call void @llvm\.memset\.p([^(])i128\(i8([^])\ (.), i8 (.), i128 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.memset.p\1i128(i8\2 align \6 \3, i8 \4, i128 \5, i1 \7)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8\(i8([^])\ (.), i8([^])\ (.), i8 (.), i32 [01], i1 ([^)])\)~call void @llvm.mem\1.p\2i8(i8\3 \4, i8\5* \6, i8 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16\(i8([^])\ (.), i8([^])\ (.), i16 (.), i32 [01], i1 ([^)])\)~call void @llvm.mem\1.p\2i16(i8\3 \4, i8\5* \6, i16 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32\(i8([^])\ (.), i8([^])\ (.), i32 (.), i32 [01], i1 ([^)])\)~call void @llvm.mem\1.p\2i32(i8\3 \4, i8\5* \6, i32 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64\(i8([^])\ (.), i8([^])\ (.), i64 (.), i32 [01], i1 ([^)])\)~call void @llvm.mem\1.p\2i64(i8\3 \4, i8\5* \6, i64 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128\(i8([^])\ (.), i8([^])\ (.), i128 (.), i32 [01], i1 ([^)])\)~call void @llvm.mem\1.p\2i128(i8\3 \4, i8\5* \6, i128 \7, i1 \8)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i8\(i8([^])\ (.), i8([^])\ (.), i8 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i16\(i8([^])\ (.), i8([^])\ (.), i16 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i32\(i8([^])\ (.), i8([^])\ (.), i32 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i64\(i8([^])\ (.), i8([^])\ (.), i64 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g s~call void @llvm\.mem(cpy\|move)\.p([^(])i128\(i8([^])\ (.), i8([^])\ (.), i128 (.), i32 ([0-9]), i1 ([^)])\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g The remaining changes in the series will: Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments. Step 3) Update Clang to use the new IRBuilder API. Step 4) Update Polly to use the new IRBuilder API. Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use use MemIntrinsicInst::[get\|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead. Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get\|set]Alignment() methods. Reviewers: pete, hfinkel, lhames, reames, bollu Reviewed By: reames Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits Differential Revision: https://reviews.llvm.org/D41675 llvm-svn: 322965	2018-01-19 17:13:12 +00:00
Mikael Holmen	4653b1a4f1	[GlobalsAA] Don't let dbg intrinsics affect analysis result Summary: This fixes PR35899. Debug info intrinsics shouldn't affect code generation so ignore them in GlobalsAA. Reviewers: hfinkel, aprantl Reviewed By: aprantl Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D41984 llvm-svn: 322470	2018-01-15 07:05:51 +00:00
Davide Italiano	7ccd4619e4	[BasicAA] Stop crashing when dealing with pointers > 64 bits. An alternative (and probably better) fix would be that of making `Scale` an APInt, and there's a patch floating around to do this. As we're still discussing it, at least stop crashing in the meanwhile (added bonus, we now have a regression test for this situation). Fixes PR35843. Thanks to Eli for suggesting the fix and Simon for reporting and reducing the bug. llvm-svn: 322467	2018-01-15 01:40:18 +00:00
Brian M. Rzycki	9b7ae23256	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perform the preversation was minimally altered and simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements such as threading across loop headers. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: mgorny, dmgreen, kuba, rnk, rsmith, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 322401	2018-01-12 21:06:48 +00:00
Craig Topper	d58c165545	[X86] Make v2i1 and v4i1 legal types without VLX Summary: There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type. It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway. This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added. I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all. There's definitely room for improvement with some follow up patches. Reviewers: RKSimon, zvi, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41560 llvm-svn: 321967	2018-01-07 18:20:37 +00:00
Davide Italiano	554f68be44	[BasicAA] Fix linearization of shifts beyond the bitwidth. Thanks to Simon Pilgrim for the reduced testcase. Fixes PR35821. llvm-svn: 321873	2018-01-05 16:18:47 +00:00
Reid Kleckner	cd78ddc119	Revert "[JumpThreading] Preservation of DT and LVI across the pass" This reverts r321825, it causes crashes in Chromium. Reproducer forthcoming. llvm-svn: 321832	2018-01-04 23:23:46 +00:00
Brian M. Rzycki	cdad6c0b60	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perfom the preversation was minimally altered and was simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements. One example is loop boundary threading. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 321825	2018-01-04 21:57:32 +00:00
Nikolai Bozhenov	4d5e34b221	[ValueTracking] Adding missed lit-test for commit r316208 Reviewers: reames, hfinkel Differential Revision: https://reviews.llvm.org/D34101 Patch by: Olga Chupina <olga.chupina@intel.com> llvm-svn: 321792	2018-01-04 10:02:50 +00:00
Benjamin Kramer	0af3be4560	Fix tests after move to utohexstr. llvm-svn: 321527	2017-12-28 17:00:37 +00:00
Mikael Holmen	fb2fd20f50	[Lint] Don't warn about noalias argument aliasing if other argument is byval Summary: When using byval, the data is effectively copied as part of the call anyway, so we aren't actually passing the pointer and thus there is no reason to issue a warning. Reviewers: rnk Reviewed By: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40118 llvm-svn: 321478	2017-12-27 08:48:33 +00:00
Serguei Katkov	edf3c8292b	[SCEV] Do not insert if it is already in cache This is fix for the crash caused by ScalarEvolution::getTruncateExpr. It expects that if it checked the condition that SCEV is not in UniqueSCEVs cache in the beginning that it will not be there inside this method. However during recursion and transformation/simplification for sub expression, it is possible that these modifications will end up with the same SCEV as we started from. So we must always check whether SCEV is in cache and do not insert item if it is already there. Reviewers: sanjoy, mkazantsev, craig.topper Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41380 llvm-svn: 321472	2017-12-27 07:15:23 +00:00
Alina Sbirlea	ca741a87f2	[MemorySSA] Allow reordering of loads that alias in the presence of volatile loads. Summary: Make MemorySSA allow reordering of two loads that may alias, when one is volatile. This makes MemorySSA less conservative and behaving the same as the AliasSetTracker. For more context, see D16875. LLVM language reference: "The optimizers must not change the number of volatile operations or change their order of execution relative to other volatile operations. The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior." Reviewers: george.burgess.iv, dberlin Subscribers: sanjoy, reames, hfinkel, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D41525 llvm-svn: 321382	2017-12-22 19:54:03 +00:00
Alina Sbirlea	50db8a2086	[ModRefInfo] Add must alias info to ModRefInfo. Summary: Add an additional bit to ModRefInfo, ModRefInfo::Must, to be cleared for known must aliases. Shift existing Mod/Ref/ModRef values to include an additional most significant bit. Update wrappers that modify ModRefInfo values to reflect the change. Notes: * ModRefInfo::Must is almost entirely cleared in the AAResults methods, the remaining changes are trying to preserve it. * Only some small changes to make custom AA passes set ModRefInfo::Must (BasicAA). * GlobalsModRef already declares a bit, who's meaning overlaps with the most significant bit in ModRefInfo (MayReadAnyGlobal). No changes to shift the value of MayReadAnyGlobal (see AlignedMap). FunctionInfo.getModRef() ajusts most significant bit so correctness is preserved, but the Must info is lost. * There are cases where the ModRefInfo::Must is not set, e.g. 2 calls that only read will return ModRefInfo::NoModRef, though they may read from exactly the same location. Reviewers: dberlin, hfinkel, george.burgess.iv Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D38862 llvm-svn: 321309	2017-12-21 21:41:53 +00:00
Bjorn Steinbrink	030123e8e8	Give up on array allocas in getPointerDereferenceableBytes Summary: As suggested by Eli Friedman, don't try to handle array allocas here, because of possible overflows, instead rely on instcombine converting them to allocations of array types. Reviewers: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41398 llvm-svn: 321159	2017-12-20 10:01:30 +00:00
Bjorn Steinbrink	2da4d9d86d	Treat sret arguments as being dereferenceable in getPointerDereferenceableBytes() Reviewers: rnk, hfinkel, efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41355 llvm-svn: 321061	2017-12-19 08:46:46 +00:00
Max Kazantsev	1acab00229	[LVI] Support for ashr in LVI Enhance LVI to analyze the ‘ashr’ binary operation. This leverages the infrastructure in ConstantRange for the ashr operation. Patch by Surya Kumari Jangala! Differential Revision: https://reviews.llvm.org/D40886 llvm-svn: 320983	2017-12-18 14:23:30 +00:00
Craig Topper	7034d401f8	[X86] Use mattr instead of mcpu in some of the cost model tests. Based on the names of the check lines, features seems more appropriate that cpu. Spotted while prototyping my patch to make 512-bit vectors illegal on SKX sometimes. llvm-svn: 320959	2017-12-18 07:21:58 +00:00
Bjorn Steinbrink	3603de2fa2	Re-commit "Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()"" llvm-clang-x86_64-expensive-checks-win is still broken, so the failure seems unrelated. llvm-svn: 320953	2017-12-17 21:20:16 +00:00
Bjorn Steinbrink	6f7bbf349f	Revert "Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes()" This reverts commit 217067d5179882de9deb60d2e866befea4c126e7. Fails on llvm-clang-x86_64-expensive-checks-win llvm-svn: 320945	2017-12-17 15:16:58 +00:00
Bjorn Steinbrink	c27f81b92b	Properly handle byval arguments in getPointerDereferenceableBytes() Summary: For byval arguments, the number of dereferenceable bytes is equal to the size of the pointee, not the pointer. Reviewers: hfinkel, rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41305 llvm-svn: 320939	2017-12-17 02:37:42 +00:00
Bjorn Steinbrink	5d86532467	Properly handle multi-element and dynamically sized allocas in getPointerDereferenceableBytes() Reviewers: hfinkel, rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41288 llvm-svn: 320938	2017-12-17 01:54:25 +00:00
Brian M. Rzycki	580bc3c8fa	Reverting [JumpThreading] Preservation of DT and LVI across the pass Stage 2 bootstrap failed: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/14434 llvm-svn: 320641	2017-12-13 22:01:17 +00:00
Brian M. Rzycki	d989af98b3	[JumpThreading] Preservation of DT and LVI across the pass Summary: See D37528 for a previous (non-deferred) version of this patch and its description. Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass. LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perfom the preversation was minimally altered and was simply marked as preserved for the PassManager to be informed. This extends the analysis available to JumpThreading for future enhancements. One example is loop boundary threading. Reviewers: dberlin, kuhar, sebpop Reviewed By: kuhar, sebpop Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D40146 llvm-svn: 320612	2017-12-13 20:52:26 +00:00
Duncan P. N. Exon Smith	9b8caf5bd7	Revert part of "Cleanup some GraphTraits iteration code" This reverts part of r300656, which caused a regression in propagateMassToSuccessors by counting edges n^2 times, where n is the number of edges from the source basic block to the same successor basic block. The result was both incorrect and very slow to compute for large values of n (e.g. switches with multiple cases that go to the same basic block). Patch by Andrew Scheidecker! llvm-svn: 320208	2017-12-08 22:42:43 +00:00
Max Kazantsev	9c08b7a053	[SCEV] Fix predicate usage in computeExitLimitFromICmp In this method, we invoke `SimplifyICmpOperands` which takes the `Cond` predicate by reference and may change it along with `LHS` and `RHS` SCEVs. But then we invoke `computeShiftCompareExitLimit` with Values from which the SCEVs have been derived, these Values have not been modified while `Cond` could be. One of possible outcomes of this is that we may falsely prove that an infinite loop ends within some finite number of iterations. In this patch, we save the original `Cond` and pass it along with original operands. This logic may be removed in future once `computeShiftCompareExitLimit` works with SCEVs instead of value operands. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D40953 llvm-svn: 320142	2017-12-08 12:19:45 +00:00
Craig Topper	fbf7b3bf3e	[X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization. llvm-svn: 319266	2017-11-29 00:32:09 +00:00
Max Kazantsev	23044fa639	[SCEV] Strengthen variance condition in calculateLoopDisposition Given loops `L1` and `L2` with AddRecs `AR1` and `AR2` varying in them respectively. When identifying loop disposition of `AR2` w.r.t. `L1`, we only say that it is varying if `L1` contains `L2`. But there is also a possible situation where `L1` and `L2` are consecutive sibling loops within the parent loop. In this case, `AR2` is also varying w.r.t. `L1`, but we don't correctly identify it. It can lead, for exaple, to attempt of incorrect folding. Consider: AR1 = {a,+,b}<L1> AR2 = {c,+,d}<L2> EXAR2 = sext(AR1) MUL = mul AR1, EXAR2 If we incorrectly assume that `EXAR2` is invariant w.r.t. `L1`, we can end up trying to construct something like: `{a * {c,+,d}<L2>,+,b * {c,+,d}<L2>}<L1>`, which is incorrect because `AR2` is not available on entrance of `L1`. Both situations "`L1` contains `L2`" and "`L1` preceeds sibling loop `L2`" can be handled with one check: "header of `L1` dominates header of `L2`". This patch replaces the old insufficient check with this one. Differential Revision: https://reviews.llvm.org/D39453 llvm-svn: 318819	2017-11-22 06:21:39 +00:00
Hiroshi Yamauchi	c94d4d70d8	Add heuristics for irreducible loop metadata under PGO Summary: Add the following heuristics for irreducible loop metadata: - When an irreducible loop header is missing the loop header weight metadata, give it the minimum weight seen among other headers. - Annotate indirectbr targets with the loop header weight metadata (as they are likely to become irreducible loop headers after indirectbr tail duplication.) These greatly improve the accuracy of the block frequency info of the Python interpreter loop (eg. from ~3-16x off down to ~40-55% off) and the Python performance (eg. unpack_sequence from ~50% slower to ~8% faster than GCC) due to better register allocation under PGO. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39980 llvm-svn: 318693	2017-11-20 21:03:38 +00:00
Mohammed Agabaria	115f68ea3e	[LV][X86] Support of AVX2 Gathers code generation and update the LV with this This patch depends on: https://reviews.llvm.org/D35348 Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen) Update LoopVectorize to generate gathers for AVX2 processors. Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb Reviewed By: delena, RKSimon Differential Revision: https://reviews.llvm.org/D35772 llvm-svn: 318641	2017-11-20 08:18:12 +00:00
Yaxun Liu	407ca36b27	Let llvm.invariant.group.barrier accepts pointer to any address space llvm.invariant.group.barrier may accept pointers to arbitrary address space. This patch let it accept pointers to i8 in any address space and returns pointer to i8 in the same address space. Differential Revision: https://reviews.llvm.org/D39973 llvm-svn: 318413	2017-11-16 16:32:16 +00:00
Mohammed Agabaria	6e6d5326a1	[TTI][X86] update costs of interleaved load\store of i64\double This patch contains more accurate cost of interelaved load\store of stride 2 for the types int64\double on AVX2. Reviewers: delena, RKSimon, craig.topper, dorit Reviewed By: dorit Differential Revision: https://reviews.llvm.org/D40008 llvm-svn: 318385	2017-11-16 09:38:32 +00:00
Mikael Holmen	6e60297ee6	[Lint] Don't warn about passing alloca'd value to tail call if using byval Summary: This fixes PR35241. When using byval, the data is effectively copied as part of the call anyway, so the pointer returned by the alloca will not be leaked to the callee and thus there is no reason to issue a warning. Reviewers: rnk Reviewed By: rnk Subscribers: Ka-Ka, llvm-commits Differential Revision: https://reviews.llvm.org/D40009 llvm-svn: 318279	2017-11-15 07:46:48 +00:00
Hiroshi Yamauchi	69c233ac6c	Simplify irreducible loop metadata test code. Summary: Shorten the irreducible loop metadata test code by removing insignificant instructions. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40043 llvm-svn: 318182	2017-11-14 19:48:59 +00:00
Jatin Bhateja	c61ade1ca0	[SCEV] Handling for ICmp occuring in the evolution chain. Summary: If a compare instruction is same or inverse of the compare in the branch of the loop latch, then return a constant evolution node. This shall facilitate computations of loop exit counts in cases where compare appears in the evolution chain of induction variables. Will fix PR 34538 Reviewers: sanjoy, hfinkel, junryoungju Reviewed By: sanjoy, junryoungju Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38494 llvm-svn: 318050	2017-11-13 16:43:24 +00:00
Mohammed Agabaria	6691758364	[LV][X86] update the cost of interleaving mem. access of floats Recommit: This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. fixed the location of the lit test it works with make check-all. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317471	2017-11-06 10:56:20 +00:00
Mohammed Agabaria	acd69dbc7c	[REVERT][LV][X86] update the cost of interleaving mem. access of floats reverted my changes will be committed later after fixing the failure This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317433	2017-11-05 09:36:54 +00:00
Mohammed Agabaria	f74c767de6	[LV][X86] update the cost of interleaving mem. access of floats This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317432	2017-11-05 09:06:23 +00:00
Hiroshi Yamauchi	dce9def3dd	Irreducible loop metadata for more accurate block frequency under PGO. Summary: Currently the block frequency analysis is an approximation for irreducible loops. The new irreducible loop metadata is used to annotate the irreducible loop headers with their header weights based on the PGO profile (currently this is approximated to be evenly weighted) and to help improve the accuracy of the block frequency analysis for irreducible loops. This patch is a basic support for this. Reviewers: davidxl Reviewed By: davidxl Subscribers: mehdi_amini, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D39028 llvm-svn: 317278	2017-11-02 22:26:51 +00:00

1 2 3 4 5 ...

1575 Commits