llvm-project

Commit Graph

Author	SHA1	Message	Date
Jonas Paulsson	8ae0f88b13	[SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test. A loaded value with multiple users compared with 0 will become a load and test single instruction. The load is not folded in this case (multiple users), but the compare instruction is eliminated. This patch returns 0 cost for the icmp in these cases. Review: Ulrich Weigand https://reviews.llvm.org/D55111 llvm-svn: 348141	2018-12-03 14:30:18 +00:00
Michal Gorny	014a6f930a	[test] Fix ScalarEvolution test to allow __func__ with prototype Fix ScalarEvolution/solve-quadratic.ll test to account for __func__ output listing the complete function prototype rather than just its name, as it does on NetBSD. Example Linux output: GetQuadraticEquation: addrec coeff bw: 4 GetQuadraticEquation: equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Example NetBSD output: llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr): addrec coeff bw: 4 llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr): equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Differential Revision: https://reviews.llvm.org/D55162 llvm-svn: 348096	2018-12-02 16:49:28 +00:00
Simon Pilgrim	102854f4d4	[TTI] Reduction costs only need to include a single extract element cost (REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 348076	2018-12-01 14:18:31 +00:00
Craig Topper	8e10e9423d	[X86] Replace '-mcpu=skx' with -mattr=avx512f or -mattr=avx512bw in interleave/strided load/store cost model tests. llvm-svn: 348056	2018-12-01 00:21:49 +00:00
Nicolai Haehnle	56d0ed2a50	[DA] GPUDivergenceAnalysis for unstructured GPU kernels Summary: This is patch #3 of the new DivergenceAnalysis <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html> The GPUDivergenceAnalysis is intended to eventually supersede the existing LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces incorrect results on unstructured Control-Flow Graphs: <https://bugs.llvm.org/show_bug.cgi?id=37185> This patch adds the option -use-gpu-divergence-analysis to the LegacyDivergenceAnalysis to turn it into a transparent wrapper for the GPUDivergenceAnalysis. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D53493 llvm-svn: 348048	2018-11-30 22:55:20 +00:00
Jonas Paulsson	b1d014883c	[SystemZ::TTI] i8/i16 operands extension costs revisited Three minor changes to these extra costs: * For ICmp instructions, instead of adding 2 all the time for extending each operand, this is only done if that operand is neither a load or an immediate. * The operands extension costs for divides removed, because we now use a high cost already for the divide (20). * The costs for lhsr/ashr extra costs removed as this did not seem useful. Review: Ulrich Weigand https://reviews.llvm.org/D55053 llvm-svn: 347961	2018-11-30 07:09:34 +00:00
Craig Topper	81f1b4a361	[X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal. Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature. Differential Revision: https://reviews.llvm.org/D54984 llvm-svn: 347786	2018-11-28 18:11:42 +00:00
Craig Topper	d3bb036bc9	[X86] Add some cost model entries for sext/zext for avx512bw This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst. I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types. Differential Revision: https://reviews.llvm.org/D54979 llvm-svn: 347785	2018-11-28 18:11:39 +00:00
Craig Topper	f3b6f583e2	[X86] Add a combine for back to back VSRAI instructions Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI Differential Revision: https://reviews.llvm.org/D54959 llvm-svn: 347784	2018-11-28 18:03:38 +00:00
Jonas Paulsson	06acb3a236	[SystemZ::TTI] Improve cost for compare of i64 with extended i32 load CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value in memory. This patch makes such a load considered foldable and so gets a 0 cost. Review: Ulrich Weigand https://reviews.llvm.org/D54944 llvm-svn: 347735	2018-11-28 08:58:27 +00:00
Jonas Paulsson	d6b7aca911	[SystemZ::TTI] Improve costs for i16 add, sub and mul against memory. AH, SH and MH costs are already covered in the cases where LHS is 32 bits and RHS is 16 bits of memory sign-extended to i32. As these instructions are also used when LHS is i16, this patch recognizes that the loads will get folded then as well. Review: Ulrich Weigand https://reviews.llvm.org/D54940 llvm-svn: 347734	2018-11-28 08:31:50 +00:00
Jonas Paulsson	011a503f25	[SystemZ::TTI] Improved cost values for comparison against memory. Single instructions exist for i8 and i16 comparisons of memory against a small immediate. This patch makes sure that if the load in these cases has a single user (the ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1. Review: Ulrich Weigand https://reviews.llvm.org/D54897 llvm-svn: 347733	2018-11-28 08:08:05 +00:00
Jonas Paulsson	5da8e432b9	[SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap. Since byte-swapping loads and stores are supported, a 'load -> bswap' or 'bswap -> store' sequence should have the cost of one. Review: Ulrich Weigand https://reviews.llvm.org/D54870 llvm-svn: 347732	2018-11-28 07:52:34 +00:00
Craig Topper	078e58da15	[X86] Add test cases to show that we don't properly take -mprefer-vector-width=256 and -min-legal-vector-width=256 into account when costing sext/zext. The check lines marked AVX256 in the zext256/sext256 functions should be closer to the AVX values which would take into account a splitting cost. llvm-svn: 347722	2018-11-28 00:33:34 +00:00
Craig Topper	5e7dcc65bd	[X86] Add exhaustive cost model testing for sext/zext for all vector types we reasonably support. Add cost model tests for truncating to vXi1. Our sext/zext cost modeling was somewhat incomplete. And had no coverage for the fact that avx512bw v32i16/v64i8 types return a scalarization cost. Truncates are a whole different mess because isTruncateFree is returning true for vectors when it shouldn't and that's the fall back for anything not in the tables. llvm-svn: 347719	2018-11-27 22:46:05 +00:00
Craig Topper	e535babe4c	[X86] Add cost model tests for experimental.vector.reduce.* with -x86-experimental-vector-widening-legalization llvm-svn: 347697	2018-11-27 19:44:40 +00:00
Craig Topper	2f9e5c40a6	[X86] Add cost model test for masked load an store with -x86-experimental-vector-widening-legalization llvm-svn: 347696	2018-11-27 19:44:36 +00:00
Craig Topper	69cf86e327	[X86] Add cost model tests for fp_to_int/int_to_fp with -x86-experimental-vector-widening-legalization llvm-svn: 347695	2018-11-27 19:44:34 +00:00
Craig Topper	1a2f9f765e	[X86] Add cost model tests for shifts with -x86-experimental-vector-widening-legalization. llvm-svn: 347694	2018-11-27 19:44:30 +00:00
Vitaly Buka	42b050673e	[stack-safety] Inter-Procedural Analysis implementation Summary: IPA is implemented as module pass which produce map from Function or Alias to StackSafetyInfo for a single function. From prototype by Evgenii Stepanov and Vlad Tsyrklevich. Reviewers: eugenis, vlad.tsyrklevich, pcc, glider Subscribers: hiraditya, mgrang, llvm-commits Differential Revision: https://reviews.llvm.org/D54543 llvm-svn: 347611	2018-11-26 23:05:58 +00:00
Vitaly Buka	b8e6fa6638	[stack-safety] Empty local passes for Stack Safety Global Analysis Reviewers: eugenis, vlad.tsyrklevich Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54541 llvm-svn: 347610	2018-11-26 23:05:48 +00:00
Vitaly Buka	fa98c074b7	[stack-safety] Local analysis implementation Summary: Analysis produces StackSafetyInfo which contains information with how allocas and parameters were used in functions. From prototype by Evgenii Stepanov and Vlad Tsyrklevich. Reviewers: eugenis, vlad.tsyrklevich, pcc, glider Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54504 llvm-svn: 347603	2018-11-26 21:57:59 +00:00
Vitaly Buka	4493fe1c1b	[stack-safety] Empty local passes for Stack Safety Local Analysis Reviewers: eugenis, vlad.tsyrklevich Subscribers: mgorny, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D54502 llvm-svn: 347602	2018-11-26 21:57:47 +00:00
Nikita Popov	f94c8f0d1b	[DemandedBits] Add support for funnel shifts Add support for funnel shifts to the DemandedBits analysis. The demanded bits of the first two operands can be determined if the shift amount is constant. The demanded bits of the third operand (shift amount) can be determined if the bitwidth is a power of two. This is basically the same functionality as implemented in D54869 and D54478, but for DemandedBits rather than InstCombine. Differential Revision: https://reviews.llvm.org/D54876 llvm-svn: 347561	2018-11-26 15:36:57 +00:00
Fedor Sergeev	8cd9d1b5ce	Revert "[TTI] Reduction costs only need to include a single extract element cost" This reverts commit r346970. It was causing PR39774, a crash in slp-vectorizer on a rather simple loop with just a bunch of 'and's in the body. llvm-svn: 347541	2018-11-26 10:17:27 +00:00
Jonas Paulsson	96782c2c0b	[SystemZTTIImpl] Give correct cost values for vector bswap intrinsics. Implement getIntrinsicInstrCost() and return costs reflecting that bswap can be done with a vperm per vector register. Review: Ulrich Weigand https://reviews.llvm.org/D54789 llvm-svn: 347445	2018-11-22 07:17:29 +00:00
John Regehr	3a1c9d55cc	[LVI] run transfer function for binary operator even when the RHS isn't a constant LVI was symbolically executing binary operators only when the RHS was constant, missing the case where we have a ConstantRange for the RHS, but not an actual constant. Tested using check-all and by bootstrapping. Compile time is not impacted measurably. Differential Revision: https://reviews.llvm.org/D19859 llvm-svn: 347379	2018-11-21 05:24:12 +00:00
Sanjay Patel	efc3d1dfaa	[ConstantFolding] Add support for saturating add/sub Support saturating add/sub in constant folding, based on the APInt methods introduced in D54332. Patch by: @nikic (Nikita Popov) Differential Revision: https://reviews.llvm.org/D54531 llvm-svn: 347328	2018-11-20 17:05:55 +00:00
Anna Thomas	5e9215f02b	[LV] Avoid vectorizing unsafe dependencies in uniform address Summary: Currently, when vectorizing stores to uniform addresses, the only instance we prevent vectorization is if there are multiple stores to the same uniform address causing an unsafe dependency. This patch teaches LAA to avoid vectorizing loops that have an unsafe cross-iteration dependency between a load and a store to the same uniform address. Fixes PR39653. Reviewers: Ayal, efriedma Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D54538 llvm-svn: 347220	2018-11-19 15:39:59 +00:00
Simon Pilgrim	924f193419	[TTI] Reduction costs only need to include a single extract element cost We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 346970	2018-11-15 17:42:53 +00:00
Simon Pilgrim	2b166c5044	[TTI] getOperandInfo - a broadcast shuffle means the result is OK_UniformValue llvm-svn: 346868	2018-11-14 15:04:08 +00:00
Simon Pilgrim	cdb170794b	[CostModel] Add generic expansion funnel shift cost support Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost. This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs. llvm-svn: 346854	2018-11-14 12:24:50 +00:00
Simon Pilgrim	e827fe09b3	[CostModel][X86] Fix constant vector XOP rights shifts We'll constant fold these cases so they are as cheap as vector left shift cases. Noticed while improving funnel shift costs. llvm-svn: 346760	2018-11-13 16:40:10 +00:00
Simon Pilgrim	2fe1076a08	[CostModel][X86] Add more cost tests for funnel shifts Added full uniform/constant coverage for funnel shifts + rotates llvm-svn: 346754	2018-11-13 12:11:15 +00:00
Simon Pilgrim	93c64e5c76	[CostModel][X86] Add funnel shift rotation special case costs When we repeat the 2 shifting operands then this is a bit rotation - annoyingly this has to be done in the other getIntrinsicInstrCost than most intrinsics as we need to check the operands are the same. llvm-svn: 346688	2018-11-12 18:27:54 +00:00
Simon Pilgrim	49e93d2f0e	[CostModel][X86] Add SHLD/SHRD scalar funnel shift costs The costs match the typical reg-reg cases - the RMW case can be a lot slower but we don't model that at this level llvm-svn: 346683	2018-11-12 17:56:59 +00:00
Simon Pilgrim	095ea939c3	[CostModel][X86] Add some initial cost tests for funnel shifts Still need to add full uniform/constant coverage but this is enough to check basic fshl/fshr cost handling llvm-svn: 346670	2018-11-12 16:39:41 +00:00
Simon Pilgrim	f4cd292ba2	[CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector llvm-svn: 346664	2018-11-12 15:48:06 +00:00
Jonas Paulsson	5cea85dd59	[SystemZ::TTI] Improve accuracy of costs for vector fp <-> int conversions Improve getCastInstrCost() by respecting the different types of Src and Dst for vector integer <-> fp conversions. This means that extracting from integer becomes more expensive (by the extraction penalty), and the extraction from fp becomes cheaper (no longer has a false extraction penalty). Review: Ulrich Weigand https://reviews.llvm.org/D54423 llvm-svn: 346663	2018-11-12 15:32:27 +00:00
Simon Pilgrim	47d38198eb	[CostModel] Add more realistic SK_InsertSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. llvm-svn: 346662	2018-11-12 15:20:24 +00:00
Simon Pilgrim	631f2bf51e	[CostModel] Add more realistic SK_ExtractSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type. llvm-svn: 346656	2018-11-12 14:25:23 +00:00
Simon Pilgrim	fc8f1d7da7	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector llvm-svn: 346538	2018-11-09 19:04:27 +00:00
Simon Pilgrim	d0c71609c5	[CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks. llvm-svn: 346510	2018-11-09 16:28:19 +00:00
Max Kazantsev	266c087b9d	Return "[IndVars] Smart hard uses detection" The patch has been reverted because it ended up prohibiting propagation of a constant to exit value. For such values, we should skip all checks related to hard uses because propagating a constant is always profitable. Differential Revision: https://reviews.llvm.org/D53691 llvm-svn: 346397	2018-11-08 11:54:35 +00:00
Max Kazantsev	e059f4452b	Revert "[IndVars] Smart hard uses detection" This reverts commit 2f425e9c7946b9d74e64ebbfa33c1caa36914402. It seems that the check that we still should do the transform if we know the result is constant is missing in this code. So the logic that has been deleted by this change is still sometimes accidentally useful. I revert the change to see what can be done about it. The motivating case is the following: @Y = global [400 x i16] zeroinitializer, align 1 define i16 @foo() { entry: br label %for.body for.body: ; preds = %entry, %for.body %i = phi i16 [ 0, %entry ], [ %inc, %for.body ] %arrayidx = getelementptr inbounds [400 x i16], [400 x i16]* @Y, i16 0, i16 %i store i16 0, i16* %arrayidx, align 1 %inc = add nuw nsw i16 %i, 1 %cmp = icmp ult i16 %inc, 400 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body %inc.lcssa = phi i16 [ %inc, %for.body ] ret i16 %inc.lcssa } We should be able to figure out that the result is constant, but the patch breaks it. Differential Revision: https://reviews.llvm.org/D51584 llvm-svn: 346198	2018-11-06 02:02:05 +00:00
Jonas Paulsson	cced2a2775	[SystemZ::TTI] Improve cost handling of uint/sint to fp conversions. Let i8/i16 uint/sint to fp conversions cost 1 if operand is a load. Since the load already does the extension, there is no extra cost (previously returned 2). Review: Ulrich Weigand https://reviews.llvm.org/D54028 llvm-svn: 346009	2018-11-02 17:53:31 +00:00
Easwaran Raman	c5e1506ec8	[ProfileSummary] Add options to override hot and cold count thresholds. Summary: The hot and cold count thresholds are derived from the summary, but for debugging purposes it is convenient to provide the actual thresholds. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54040 llvm-svn: 346005	2018-11-02 17:39:31 +00:00
Jonas Paulsson	6749c24f40	[SystemZ::TTI] Recognize the higher cost of scalar i1 -> fp conversion Scalar i1 to fp conversions are done with a branch sequence, so it should have a higher cost. Review: Ulrich Weigand https://reviews.llvm.org/D53924 llvm-svn: 345818	2018-11-01 09:05:32 +00:00
Jonas Paulsson	f15a53bc81	[SystemZ::TTI] Accurate costs for i1->double vector conversions This factors out a new method getBoolVecToIntConversionCost() containing the code for vector sext/zext of i1, in order to reuse it for i1 to double vector conversions. Review: Ulrich Weigand https://reviews.llvm.org/D53923 llvm-svn: 345817	2018-11-01 09:01:51 +00:00
Max Kazantsev	3d347bf545	[IndVars] Smart hard uses detection When rewriting loop exit values, IndVars considers this transform not profitable if the loop instruction has a loop user which it believes cannot be optimized away. In current implementation only calls that immediately use the instruction are considered as such. This patch extends the definition of "hard" users to any side-effecting instructions (which usually cannot be optimized away from the loop) and also allows handling of not just immediate users, but use chains. Differentlai Revision: https://reviews.llvm.org/D51584 Reviewed By: etherzhhb llvm-svn: 345814	2018-11-01 06:47:01 +00:00
Max Kazantsev	e0a2613aea	[SCEV] Avoid redundant computations when doing AddRec merge When we calculate a product of 2 AddRecs, we end up making quite massive computations to deduce the operands of resulting AddRec. This process can be optimized by computing all args of intermediate sum and then calling `getAddExpr` once rather than calling `getAddExpr` with intermediate result every time a new argument is computed. Differential Revision: https://reviews.llvm.org/D53189 Reviewed By: rtereshin llvm-svn: 345813	2018-11-01 06:18:27 +00:00
Simon Pilgrim	44a9a71d2a	[TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368) Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector. Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination! I've done my best to fix a number of vectorizer uses: SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment. LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta. Differential Revision: https://reviews.llvm.org/D53573 llvm-svn: 345617	2018-10-30 18:10:02 +00:00
Jonas Paulsson	af8e036c29	[SystemZ] Improve isFoldableLoad() for Sub, SDiv and UDiv. Sub, SDiv and UDiv are not commutative, so only the RHS operand can fold a load. This patch adds a check for this. Review: Ulrich Weigand https://reviews.llvm.org/D53791 llvm-svn: 345596	2018-10-30 13:41:03 +00:00
Craig Topper	faf423ca74	[X86] Add -LABEL to some FileCheck checks. NFC llvm-svn: 345407	2018-10-26 17:21:19 +00:00
Jonas Paulsson	b7caa809e1	[SystemZ] Improve getMemoryOpCost() to find foldable loads that are converted. The SystemZ backend can do arithmetic of memory by loading and then extending one of the operands. Similarly, a load + truncate can be folded into an operand. This patch improves the SystemZ TTI cost function to recognize this. Review: Ulrich Weigand https://reviews.llvm.org/D52692 llvm-svn: 345327	2018-10-25 22:28:25 +00:00
Jonas Paulsson	4645711a8d	[SystemZ] Improve handling and cost estimates of vector integer div/rem Enable the DAG optimization that converts vector div/rem with constants into multiply+shifts sequences by expanding them early. This is needed since ISD::SMUL_LOHI is 'Custom' lowered on SystemZ, and will therefore not be available to BuildSDIV after legalization. Better cost values for these instructions based on how they will be implemented (a constant divisor is cheaper). Review: Ulrich Weigand https://reviews.llvm.org/D53196 llvm-svn: 345321	2018-10-25 21:47:22 +00:00
Simon Pilgrim	53e8e145e9	[CostModel][X86] Add realistic vXi64 uitofp vXf64 costs Match codegen improvements from D53649/rL345256 llvm-svn: 345263	2018-10-25 13:06:20 +00:00
Simon Pilgrim	0573b8d8b6	[CostModel][X86] Add realistic i64 uitofp f64 scalar costs llvm-svn: 345261	2018-10-25 12:42:10 +00:00
Simon Pilgrim	ac84005841	[CostModel][X86] Add vXi8 vector division by constants costs. ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV. llvm-svn: 345175	2018-10-24 18:44:12 +00:00
Simon Pilgrim	2cce074e8c	[CostModel][X86] Enable non-uniform vector division by constants costs. Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases. llvm-svn: 345164	2018-10-24 17:30:29 +00:00
Simon Pilgrim	f04a04c2b6	[TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering. llvm-svn: 345048	2018-10-23 16:45:26 +00:00
Simon Pilgrim	f1d8b7c49e	[CostModel][X86] Add transpose shuffle cost tests llvm-svn: 345045	2018-10-23 16:27:14 +00:00
Simon Pilgrim	9a2ee1ea2d	Add BROADCAST shuffle cost tests. Part of a lot of cleanup necessary before PR39368. llvm-svn: 345025	2018-10-23 13:14:54 +00:00
Simon Pilgrim	051feee5c2	Add BROADCAST shuffle cost tests. Part of a lot of cleanup necessary before PR39368. llvm-svn: 345023	2018-10-23 13:00:22 +00:00
Simon Pilgrim	8c3d87b8cf	[ARM] Regenerate reverse shuffle costs Came about while cleaning up general shuffle costs for PR39368 llvm-svn: 344966	2018-10-22 22:26:00 +00:00
Simon Pilgrim	acd10b9f6c	[CostModel][X86] Add some initial extract/insert subvector shuffle cost tests Just f64/i64 tests initially to demonstrate PR39368 llvm-svn: 344857	2018-10-20 17:38:33 +00:00
Simon Pilgrim	e612ab0c80	[CostModel][X86] Add integer vector reduction cost tests llvm-svn: 344846	2018-10-20 14:29:59 +00:00
Thomas Lively	fa54e56d84	[ConstantFolding] Constant fold minimum and maximum intrinsics Summary: Depends on D52764 Reviewers: aheejin, dschuff Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52765 llvm-svn: 344796	2018-10-19 18:15:32 +00:00
Anna Thomas	6f732bfb79	[LV] Teach vectorizer about variant value store into uniform address Summary: Teach vectorizer about vectorizing variant value stores to uniform address. Similar to rL343028, we do not allow vectorization if we have multiple stores to the same uniform address. Cost model already has the change for considering the extract instruction cost for a variant value store. See added test cases for how vectorization is done. The patch also contains changes to the ORE messages. Reviewers: Ayal, mkuper, anemet, hsaito Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D52656 llvm-svn: 344613	2018-10-16 15:46:26 +00:00
Max Kazantsev	fdfd98ceec	[SCEV] Limit AddRec "simplifications" to avoid combinatorial explosions SCEV's transform that turns `{A1,+,A2,+,...,+,An}<L> * {B1,+,B2,+,...,+,Bn}<L>` into a single AddRec of size `2n+1` with complex combinatorial coefficients can easily trigger exponential growth of the SCEV (in case if nothing gets folded and simplified). We tried to restrain this transform using the option `scalar-evolution-max-add-rec-size`, but its default value seems to be insufficiently small: the test attached to this patch with default value of this option `16` has a SCEV of >3M symbols (when printed out). This patch reduces the simplification limit. It is not a cure to combinatorial explosions, but at least it reduces this corner case to something more or less reasonable. Differential Revision: https://reviews.llvm.org/D53282 Reviewed By: sanjoy llvm-svn: 344584	2018-10-16 05:26:21 +00:00
Jonas Paulsson	bf66f38705	[SystemZ] Temporarily disable high VFs with integer div/rem. Until mischeduler is clever enough to avoid spilling in a vectorized loop with many (scalar) DLRs it is better to avoid high vectorization factors (8 and above). llvm-svn: 344129	2018-10-10 09:30:29 +00:00
Jonas Paulsson	2c8b33770c	[SystemZ] Take better care when computing needed vector registers in TTI. A new function getNumVectorRegs() is better to use for the number of needed vector registers instead of getNumberOfParts(). This is to make sure that the number of vector registers (and typically operations) required for a vector type is accurate. getNumberOfParts() which was previously used works by splitting the vector type until it is legal gives incorrect results for types with a non power of two number of elements (rare). A new static function getScalarSizeInBits() that also checks for a pointer type and returns 64U for it since otherwise it gets a value of 0). Used in a few places where Ty may be pointer. Review: Ulrich Weigand llvm-svn: 344115	2018-10-10 07:36:27 +00:00
George Burgess IV	d98d505c0d	[Analysis] Make LocationSize pretty-printing more descriptive This is the third patch in a series intended to make https://reviews.llvm.org/D44748 more easily reviewable. Please see that patch for more context. The second being r344013. The intent is to make the output of printing a LocationSize more precise. The main motivation for this is that we plan to add a bit to distinguish whether a given LocationSize is an upper-bound or is precise; making that information available in pretty-printing is nice. llvm-svn: 344108	2018-10-10 01:35:22 +00:00
Anna Thomas	b1e3d45318	[LV][LAA] Vectorize loop invariant values stored into loop invariant address Summary: We are overly conservative in loop vectorizer with respect to stores to loop invariant addresses. More details in https://bugs.llvm.org/show_bug.cgi?id=38546 This is the first part of the fix where we start with vectorizing loop invariant values to loop invariant addresses. This also includes changes to ORE for stores to invariant address. Reviewers: anemet, Ayal, mkuper, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50665 llvm-svn: 343028	2018-09-25 20:57:20 +00:00
Jonas Paulsson	77df2f2f38	[SystemZ] Adjust cost functions for subtargets that use LI + LOC instead of IPM After recent improvements which makes better use of LOC instead of IPM, the TTI cost functions also needs to be updated to reflect this. This involves sext, zext and xor of i1. The tests were updated so that for z13 the new costs are expected, while the old costs are still checked for on zEC12. Review: Ulrich Weigand https://reviews.llvm.org/D51339 llvm-svn: 342207	2018-09-14 06:46:55 +00:00
Peter Collingbourne	c7d281905b	Prevent Constant Folding From Optimizing inrange GEP This patch does the following things: 1. update SymbolicallyEvaluateGEP so that it bails out if it cannot preserve inrange arribute; 2. update llvm/test/Analysis/ConstantFolding/gep.ll to remove UB in it; 3. remove inaccurate comment above ConstantFoldInstOperandsImpl in llvm/lib/Analysis/ConstantFolding.cpp; 4. add a new regression test that makes sure that no optimizations change an inrange GEP in an unexpected way. Patch by Zhaomo Yang! Differential Revision: https://reviews.llvm.org/D51698 llvm-svn: 341888	2018-09-11 01:53:36 +00:00
Philip Reames	9f09161290	[AST] Add test coverage of memsets Immediately after posting https://reviews.llvm.org/D51895, I noticed a small bug. These tests would have caught that. llvm-svn: 341880	2018-09-10 23:14:30 +00:00
Philip Reames	5660bd460b	[AST] Visit memtransfer arguments in order The only point to this change is the test diffs. When I remove this code entirely (in favor of the recently added generic handling), I don't want there to be any confusion due to spurious test diffs. As an aside, the fact out tests are AST construction order dependent is not great. I thought about fixing that, but the reasonable schemes I might want (e.g. sort by name) need the test diffs anyways. Philip llvm-svn: 341841	2018-09-10 16:00:27 +00:00
Tim Northover	12c1f7675f	InstCombine: move hasOneUse check to the top of foldICmpAddConstant There were two combines not covered by the check before now, neither of which actually differed from normal in the benefit analysis. The most recent seems to be because it was just added at the top of the function (naturally). The older is from way back in 2008 (r46687) when we just didn't put those checks in so routinely, and has been diligently maintained since. llvm-svn: 341831	2018-09-10 14:26:44 +00:00
Matt Arsenault	72d27f5525	AMDGPU: Fix tests using old number for constant address space llvm-svn: 341770	2018-09-10 02:54:25 +00:00
Philip Reames	cb8b3278e5	[AST] Generalize argument specific aliasing AliasSetTracker has special case handling for memset, memcpy and memmove which pre-existed argmemonly on functions and readonly and writeonly on arguments. This patch generalizes it using the AA infrastructure to any call correctly annotated. The motivation here is to cut down on confusion, not performance per se. For most instructions, there is a direct mapping to alias set. However, this is not guaranteed by the interface and was not in fact true for these three intrinsics and only these three intrinsics. I kept getting myself confused about this invariant, so I figured it would be good to clearly distinguish between a instructions and alias sets. Calls happened to be an easy target. The nice side effect is that custom implementations of memset/memcpy/memmove - including wrappers discovered by IPO - can now be optimized the same as builts by LICM. Note: The actual removal of the memset/memtransfer specific handling will happen in a follow on NFC patch. It was originally part of this one, but separate for ease of review and rebase. Differential Revision: https://reviews.llvm.org/D50730 llvm-svn: 341713	2018-09-07 21:36:11 +00:00
Nicola Zaghen	9588ad9611	[InstCombine] Fold icmp ugt/ult (add nuw X, C2), C --> icmp ugt/ult X, (C - C2) Support for sgt/slt was added in rL294898, this adds the same cases also for unsigned compares. This is the Alive proof: https://rise4fun.com/Alive/nyY Differential Revision: https://reviews.llvm.org/D50972 llvm-svn: 341353	2018-09-04 10:29:48 +00:00
Nicolai Haehnle	65c026259e	Move test/Analysis/DivergenceAnalysis/AMDGPU/loads.ll Should fix failures of buildbots that don't build the AMDGPU backend. Change-Id: I01cb84b4b47803b10c5b21ea0353546239860a51 llvm-svn: 341079	2018-08-30 15:24:00 +00:00
Nicolai Haehnle	35617ed4cb	[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis Summary: This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433). The purpose of this patch is to free up the name DivergenceAnalysis for the new generic implementation. The generic implementation class will be shared by specialized divergence analysis classes. Patch by: Simon Moll Reviewed By: nhaehnle Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50434 Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb llvm-svn: 341071	2018-08-30 14:21:36 +00:00
Sanjay Patel	7a05641fa8	[InstCombine] remove unnecessary shuffle undef folding Add a test for constant folding to show that (shuffle undef, undef, mask) should already be handled via instsimplify. llvm-svn: 340926	2018-08-29 13:24:34 +00:00
Kit Barton	7c80f98b69	[PPC] Remove Darwin support from POWER backend. This patch issues an error message if Darwin ABI is attempted with the PPC backend. It also cleans up existing test cases, either converting the test to use an alternative triple or removing the test if the coverage is no longer needed. Updated Tests ------------- The majority of test cases were updated to use a different triple that does not include the Darwin ABI. Many tests were also updated to use FileCheck, in place of grep. Deleted Tests ------------- llvm/test/tools/dsymutil/PowerPC/sibling.test was originally added to test specific functionality of dsymutil using an object file created with an old version of llvm-gcc for a Powerbook G4. After a discussion with @JDevlieghere he suggested removing the test. llvm/test/CodeGen/PowerPC/combine_loads_from_build_pair.ll was converted from a PPC test to a SystemZ test, as the behavior is also reproducible there. All other tests that were deleted were specific to the darwin/ppc ABI and no longer necessary. Phabricator Review: https://reviews.llvm.org/D50988 llvm-svn: 340795	2018-08-28 01:18:29 +00:00
Craig Topper	a72012c206	[X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F. Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51267 llvm-svn: 340708	2018-08-26 18:47:44 +00:00
John Brawn	980da83f84	[PhiValues] Use callback value handles to invalidate deleted values The way that PhiValues is integrated with BasicAA it is possible for a pass which uses BasicAA to pick up an instance of BasicAA that uses PhiValues without intending to, and then delete values from a function in a way that causes PhiValues to return dangling pointers to these deleted values. Fix this by having a set of callback value handles to invalidate values when they're deleted. llvm-svn: 340613	2018-08-24 15:48:30 +00:00
Brian Homerding	3ecabd709f	[FunctionAttrs] Infer WriteOnly Function Attribute These changes expand the FunctionAttr logic in order to mark functions as WriteOnly when appropriate. This is done through an additional bool variable and extended logic. Reviewers: hfinkel, jdoerfert Differential Revision: https://reviews.llvm.org/D48387 llvm-svn: 340537	2018-08-23 15:05:22 +00:00
Philip Reames	6b6d2e0105	[AST] Add a test for attribute intersection Already works, but I initially convinced myself it doesn't, so add a test which shows it does. :) llvm-svn: 340453	2018-08-22 21:10:56 +00:00
Scott Linder	72855e36c5	[AMDGPU] Consider loads from flat addrspace to be potentially divergent In general we can't assume flat loads are uniform, and cases where we can prove they are should be handled through infer-address-spaces. Differential Revision: https://reviews.llvm.org/D50991 llvm-svn: 340343	2018-08-21 21:24:31 +00:00
Philip Reames	c3c23e8cf2	[AST] Remove notion of volatile from alias sets [NFCI] Volatility is not an aliasing property. We used to model volatile as if it had extremely conservative aliasing implications, but that hasn't been true for several years now. So, it doesn't make sense to be in AliasSet. It also turns out the code is entirely a noop. Outside of the AST code to update it, there was only one user: load store promotion in LICM. L/S promotion doesn't need the check since it walks all the users of the address anyway. It already checks each load or store via !isUnordered which causes us to bail for volatile accesses. (Look at the lines immediately following the two remove asserts.) There is the possibility of some small compile time impact here, but the only case which will get noticeably slower is a loop with a large number of loads and stores to the same address where only the last one we inspect is volatile. This is sufficiently rare it's not worth optimizing for.. llvm-svn: 340312	2018-08-21 17:59:11 +00:00
Sanjay Patel	3ce999fa41	[ConstantFolding] improve folding of binops with vector undef operand A non-undef operand may still have undef constant elements, so we should always propagate the vector results per-lane. llvm-svn: 340194	2018-08-20 18:19:02 +00:00
Sanjay Patel	7ff7bd9b3c	[ConstantFolding] add tests for binops on vectors with undef elements; NFC llvm-svn: 340190	2018-08-20 17:31:34 +00:00
Philip Reames	96bc076c3a	[AST] Clarify printing of unknown size locations [NFC] Printing "unknown" is much more clear than an arbitrary large integer llvm-svn: 340108	2018-08-17 23:17:31 +00:00
Philip Reames	26f6176f38	[AST][Tests] Clarify what each test is doing llvm-svn: 340100	2018-08-17 21:58:26 +00:00
Philip Reames	9e313167cf	[AST[Tests] Shorten tests using noalias params llvm-svn: 340099	2018-08-17 21:45:57 +00:00
Philip Reames	079c92e201	[AST] Add tests for argmemonly calls [NFC] First step towards building a test set to rebase D50730 on top of. Starting with clone of memtransfer tests, more to come. llvm-svn: 340095	2018-08-17 21:42:18 +00:00
Sanjay Patel	411b86081e	[ConstantFolding] add simplifications for funnel shift intrinsics This is another step towards being able to canonicalize to the funnel shift intrinsics in IR (see D49242 for the initial patch). We should not have any loss of simplification power in IR between these and the equivalent IR constructs. Differential Revision: https://reviews.llvm.org/D50848 llvm-svn: 340022	2018-08-17 13:23:44 +00:00
Max Kazantsev	7b78d3920c	[MustExecute] Fix algorithmic bug in isGuaranteedToExecute. PR38514 The description of `isGuaranteedToExecute` does not correspond to its implementation. According to description, it should return `true` if an instruction is executed under the assumption that its loop is entered. However there is a sophisticated alrogithm inside that tries to prove that the instruction is executed if the loop is exited, which is not the same thing for infinite loops. There is an attempt to protect from dealing with infinite loops by prohibiting loops without exit blocks, however an infinite loop can have exit blocks. As result of that, MustExecute can falsely consider some blocks that are never entered as mustexec, and LICM can hoist dangerous instructions out of them basing on this fact. This may introduce UB to programs which did not contain it initially. This patch removes the problematic algorithm and replaced it with a one which tries to prove what is required in description. Differential Revision: https://reviews.llvm.org/D50558 Reviewed By: reames llvm-svn: 339984	2018-08-17 06:19:17 +00:00
Sanjay Patel	0ea8d8b951	[ConstantFolding] add tests for funnel shift intrinsics; NFC No functionality for this yet. llvm-svn: 339889	2018-08-16 16:10:42 +00:00
Easwaran Raman	aca738b742	[BFI] Use rounding while computing profile counts. Summary: Profile count of a block is computed by multiplying its block frequency by entry count and dividing the result by entry block frequency. Do rounded division in the last step and update test cases appropriately. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50822 llvm-svn: 339835	2018-08-16 00:26:59 +00:00
Max Kazantsev	5a10d127b9	[AliasSetTracker] Do not treat experimental_guard intrinsic as memory writing instruction The `experimental_guard` intrinsic has memory write semantics to model the thread-exiting logic, but does not do any actual writes to memory. Currently, `AliasSetTracker` treats it as a normal memory write. As result, a loop-invariant load cannot be hoisted out of loop because the guard may possibly alias with it. This patch makes `AliasSetTracker` so that it doesn't treat guards as memory writes. Differential Revision: https://reviews.llvm.org/D50497 Reviewed By: reames llvm-svn: 339753	2018-08-15 06:21:02 +00:00
Max Kazantsev	837418f3f9	[NFC] Add comprehensive test of AliasSetTracker with guards llvm-svn: 339643	2018-08-14 06:37:39 +00:00
Reid Kleckner	40e7663b1f	[BasicAA] Don't assume tail calls with byval don't alias allocas Summary: Calls marked 'tail' cannot read or write allocas from the current frame because the current frame might be destroyed by the time they run. However, a tail call may use an alloca with byval. Calling with byval copies the contents of the alloca into argument registers or stack slots, so there is no lifetime issue. Tail calls never modify allocas, so we can return just ModRefInfo::Ref. Fixes PR38466, a longstanding bug. Reviewers: hfinkel, nlewycky, gbiv, george.burgess.iv Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D50679 llvm-svn: 339636	2018-08-14 01:24:35 +00:00
Max Kazantsev	4e9def57c7	[NFC] Add tests that demonstrate that MustExecute is fundamentally broken llvm-svn: 339417	2018-08-10 09:20:46 +00:00
Krzysztof Parzyszek	90f3249ce2	[SCEV] Properly solve quadratic equations Differential Revision: https://reviews.llvm.org/D48283 llvm-svn: 338758	2018-08-02 19:13:35 +00:00
John Brawn	cd73fe8989	[BasicAA] Use PhiValuesAnalysis if available when handling phi alias By using PhiValuesAnalysis we can get all the values reachable from a phi, so we can be more precise instead of giving up when a phi has phi operands. We can't make BaseicAA directly use PhiValuesAnalysis though, as the user of BasicAA may modify the function in ways that PhiValuesAnalysis can't cope with. For this optional usage to work correctly BasicAAWrapperPass now needs to be not marked as CFG-only (i.e. it is now invalidated even when CFG is preserved) due to how the legacy pass manager handles dependent passes being invalidated, namely the depending pass still has a pointer to the now-dead dependent pass. Differential Revision: https://reviews.llvm.org/D44564 llvm-svn: 338242	2018-07-30 11:52:08 +00:00
Keno Fischer	864fbd8e9a	[SCEV] Don't expand Wrap predicate using inttoptr in ni addrspaces Summary: In non-integral address spaces, we're not allowed to introduce inttoptr/ptrtoint intrinsics. Instead, we need to expand any pointer arithmetic as geps on the base pointer. Luckily this is a common task for SCEV, so all we have to do here is hook up the corresponding helper function and add test case. Fixes PR38290 Reviewers: sanjoy Differential Revision: https://reviews.llvm.org/D49832 llvm-svn: 338073	2018-07-26 21:55:06 +00:00
Stanislav Mekhanoshin	b8269a9589	Fix llvm::ComputeNumSignBits with some operations and llvm.assume Currently ComputeNumSignBits does early exit while processing some of the operations (add, sub, mul, and select). This prevents the function from using AssumptionCacheTracker if passed. Differential Revision: https://reviews.llvm.org/D49759 llvm-svn: 337936	2018-07-25 16:39:24 +00:00
Roman Tereshin	1ba1f9310c	[SCEV] Add zext(C + x + ...) -> D + zext(C-D + x + ...)<nuw><nsw> transform if the top level addition in (D + (C-D + x + ...)) could be proven to not wrap, where the choice of D also maximizes the number of trailing zeroes of (C-D + x + ...), ensuring homogeneous behaviour of the transformation and better canonicalization of such expressions. This enables better canonicalization of expressions like 1 + zext(5 + 20 * %x + 24 * %y) and zext(6 + 20 * %x + 24 * %y) which get both transformed to 2 + zext(4 + 20 * %x + 24 * %y) This pattern is common in address arithmetics and the transformation makes it easier for passes like LoadStoreVectorizer to prove that 2 or more memory accesses are consecutive and optimize (vectorize) them. Reviewed By: mzolotukhin Differential Revision: https://reviews.llvm.org/D48853 llvm-svn: 337859	2018-07-24 21:48:56 +00:00
Max Kazantsev	d41faecc49	[SCEV] Fix buggy behavior in getAddExpr with truncs SCEV tries to constant-fold arguments of trunc operands in SCEVAddExpr, and when it does that, it passes wrong flags into the recursion. It is only valid to pass flags that are proved for narrow type into a computation in wider type if we can prove that trunc instruction doesn't actually change the value. If it did lose some meaningful bits, we may end up proving wrong no-wrap flags for sum of arguments of trunc. In the provided test we end up with `nuw` where it shouldn't be because of this bug. The solution is to conservatively pass `SCEV::FlagAnyWrap` which is always a valid thing to do. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D49471 llvm-svn: 337435	2018-07-19 01:46:21 +00:00
Max Kazantsev	6b12506200	[NFC] Make a test more neat llvm-svn: 337379	2018-07-18 11:03:40 +00:00
Tim Shen	a064622bd3	Re-apply "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)." llvm-svn: 337075	2018-07-13 23:58:46 +00:00
Tim Renouf	f3d8295105	DivergenceAnalysis: added debug output Summary: This commit does two things: 1. modified the existing DivergenceAnalysis::dump() so it dumps the whole function with added DIVERGENT: annotations; 2. added code to do that dump if the appropriate -debug-only option is on. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47700 Change-Id: Id97b605aab1fc6f5a11a20c58a99bbe8c565bf83 llvm-svn: 336998	2018-07-13 13:13:30 +00:00
Piotr Padlewski	c63b492bcd	Simplify recursive launder.invariant.group and strip Summary: This patch is crucial for proving equality laundered/stripped pointers. eg: bool foo(A a) { return a == std::launder(a); } Clang with -fstrict-vtable-pointers will emit something like: define dso_local zeroext i1 @_Z3fooP1A(%struct.A %a) { entry: %c = bitcast %struct.A* %a to i8* %call = tail call i8* @llvm.launder.invariant.group.p0i8(i8* %c) %0 = bitcast %struct.A* %a to i8* %1 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %0) %2 = tail call i8* @llvm.strip.invariant.group.p0i8(i8* %call) %cmp = icmp eq i8* %1, %2 ret i1 %cmp } and because %2 can be replaced with @llvm.strip.invariant.group(%0) and that %2 and %1 will produce the same value (because strip is readnone) we can replace compare with true. Reviewers: rsmith, hfinkel, majnemer, amharc, kuhar Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D47423 llvm-svn: 336963	2018-07-12 23:55:20 +00:00
Simon Pilgrim	667a5b541f	[TargetTransformInfo] Add pow2 analysis for scalar constants Add ConstantInt analysis to getOperandInfo so we get more realistic div/rem expansion costs comparable to the vector costs. llvm-svn: 336827	2018-07-11 17:51:27 +00:00
Manoj Gupta	77eeac3d9e	llvm: Add support for "-fno-delete-null-pointer-checks" Summary: Support for this option is needed for building Linux kernel. This is a very frequently requested feature by kernel developers. More details : https://lkml.org/lkml/2018/4/4/601 GCC option description for -fdelete-null-pointer-checks: This Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. -fno-delete-null-pointer-checks is the inverse of this implying that null pointer dereferencing is not undefined. This feature is implemented in LLVM IR in this CL as the function attribute "null-pointer-is-valid"="true" in IR (Under review at D47894). The CL updates several passes that assumed null pointer dereferencing is undefined to not optimize when the "null-pointer-is-valid"="true" attribute is present. Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv Reviewed By: efriedma, george.burgess.iv Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47895 llvm-svn: 336613	2018-07-09 22:27:23 +00:00
Simon Pilgrim	dc113dc7ed	[CostModel][X86] Add SREM/UREM general and constant costs (PR38056) We penalize general SDIV/UDIV costs but don't do the same for SREM/UREM. This patch makes general vector SREM/UREM x20 as costly as scalar, the same approach as we do for SDIV/UDIV. The patch also extends the existing SDIV/UDIV constant costs for SREM/UREM - at the moment this means the additional cost of a MUL+SUB (see D48975). Differential Revision: https://reviews.llvm.org/D48980 llvm-svn: 336486	2018-07-07 16:53:30 +00:00
Tim Shen	2ed501d656	Revert "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)." This reverts commit r336140. Our tests shows that LSR assert fails with it. llvm-svn: 336473	2018-07-06 23:20:35 +00:00
Max Kazantsev	20da7e467a	Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done" llvm-svn: 336410	2018-07-06 04:04:13 +00:00
Simon Pilgrim	8c3765dc6b	[CostModel][X86] Add UDIV/UREM by pow2 costs Normally InstCombine would have simplified these to SRL/AND instructions but we may still see these during SLP vectorization etc. llvm-svn: 336371	2018-07-05 16:56:28 +00:00
Max Kazantsev	3097b76e8c	[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done This patch changes order of transform in InstCombineCompares to avoid performing transforms based on ranges which produce complex bit arithmetics before more simple things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri llvm-svn: 336172	2018-07-03 06:23:57 +00:00
Tim Shen	c7cef4bcc4	[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428). Summary: Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change SCEV is able to prove that the loop doesn't wrap-self (due to zext i16 to i64), disabling the entire loop versioning pass. Removed the zext and just use i64. Reviewers: sanjoy Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D48409 llvm-svn: 336140	2018-07-02 20:01:54 +00:00
Simon Pilgrim	ac193d4b5c	[CostModel][X86] Add cost tests for fp rounding intrinsics Add cost tests for fp ceil, floor, nearbyint, rint and trunc. llvm-svn: 336122	2018-07-02 17:07:01 +00:00
Piotr Padlewski	5b3db45e8f	Implement strip.invariant.group Summary: This patch introduce new intrinsic - strip.invariant.group that was described in the RFC: Devirtualization v2 Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073	2018-07-02 04:49:30 +00:00
Roman Shirokiy	272eac85c7	Fix overconfident assert in ScalarEvolution::isImpliedViaMerge We can have AddRec with loops having many predecessors. This changes an assert to an early return. Differential Revision: https://reviews.llvm.org/D48766 llvm-svn: 335965	2018-06-29 11:46:30 +00:00
John Brawn	bdbbd8381f	Add a PhiValuesAnalysis pass to calculate the underlying values of phis This pass is being added in order to make the information available to BasicAA, which can't do caching of this information itself, but possibly this information may be useful for other passes. Incorporates code based on Daniel Berlin's implementation of Tarjan's algorithm. Differential Revision: https://reviews.llvm.org/D47893 llvm-svn: 335857	2018-06-28 14:13:06 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int src, int width, unsigned char dst) { for (int i = 0; i < width; i++) dst++ = src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
David Green	8699492304	[DA] Delinearise AddRecs if we can prove they don't wrap We can prove that some delinearized subscripts do not wrap around to become negative by the fact that they are from inbound geps of load/store locations. This helps improve the delinearisation in cases where we can't prove that they are non-negative from SCEV alone. Differential Revision: https://reviews.llvm.org/D48481 llvm-svn: 335481	2018-06-25 15:13:26 +00:00
Simon Pilgrim	9c8f9374b5	[CostModel][AArch64] Add some initial costs for SK_Select and SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) was being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329	2018-06-22 09:45:31 +00:00
Tim Shen	63f244c4f4	[SCEV] Re-apply r335197 (with Polly fixes). Summary: This initiates a discussion on changing Polly accordingly while re-applying r335197 (D48338). I have never worked on Polly. The proposed change to param_div_div_div_2.ll is not educated, but just patterns that match the output. All LLVM files are already reviewed in D48338. Reviewers: jdoerfert, bollu, efriedma Subscribers: jlebar, sanjoy, hiraditya, llvm-commits, bixia Differential Revision: https://reviews.llvm.org/D48453 llvm-svn: 335292	2018-06-21 21:29:54 +00:00
Nicolai Haehnle	1045928aab	AMDGPU: Convert test cases to the dimension-aware intrinsics Summary: Also explicitly port over some tests in llvm.amdgcn.image.* that were missing. Some tests are removed because they no longer apply (i.e. explicitly testing building an address vector via insertelement). This is in preparation for the eventual removal of the old-style intrinsics. Some additional notes: - constant-address-space-32bit.ll: change some GCN-NEXT to GCN because the instruction schedule was subtly altered - insert_vector_elt.ll: the old test didn't actually test anything, because %tmp1 was not used; remove the load, because it doesn't work (Because of the amdgpu_ps calling convention? In any case, it's orthogonal to what the test claims to be testing.) Change-Id: Idfa99b6512ad139e755e82b8b89548ab08f0afcf Reviewers: arsenm, rampitec Subscribers: MatzeB, qcolombet, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D48018 llvm-svn: 335229	2018-06-21 13:37:19 +00:00
David Green	d143c65de3	[DA] Enable -da-delinearize by default This enables da-delinearize in Dependence Analysis for delinearizing array accesses into multiple dimensions. This can help to increase the power of Dependence analysis on multi-dimensional arrays and prevent having to fall back to the slower and less accurate MIV tests. It adds static checks on the bounds of the arrays to ensure that one dimension doesn't overflow into another, and brings our code in line with our tests. Differential Revision: https://reviews.llvm.org/D45872 llvm-svn: 335217	2018-06-21 11:53:16 +00:00
Simon Pilgrim	2a9cde026c	[X86][AVX] Reduce v4f64/v4i64 shuffle costs (PR37882) These were being over cautious for costs for one/two op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does. llvm-svn: 335216	2018-06-21 11:37:13 +00:00
Tim Shen	433b9761ce	Revert "[SCEV] Improve zext(A /u B) and zext(A % B)" This reverts commit r335197, as some bots are not happy. llvm-svn: 335198	2018-06-21 02:15:32 +00:00
Tim Shen	5af61e0a28	[SCEV] Improve zext(A /u B) and zext(A % B) Summary: Try to match udiv and urem patterns, and sink zext down to the leaves. I'm not entirely sure why some unrelated tests change, but the added <nsw>s seem right. Reviewers: sanjoy Subscribers: jlebar, hiraditya, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D48338 llvm-svn: 335197	2018-06-21 01:49:07 +00:00
Roman Lebedev	42a1ff11fb	[NFC][SCEV] Add tests related to bit masking (PR37793) Summary: Related to https://bugs.llvm.org/show_bug.cgi?id=37793, https://reviews.llvm.org/D46760#1127287 We'd like to do this canonicalization https://rise4fun.com/Alive/Gmc But it is currently restricted by rL155136 / rL155362, which says: ``` // This is a constant shift of a constant shift. Be careful about hiding // shl instructions behind bit masks. They are used to represent multiplies // by a constant, and it is important that simple arithmetic expressions // are still recognizable by scalar evolution. // // The transforms applied to shl are very similar to the transforms applied // to mul by constant. We can be more aggressive about optimizing right // shifts. // // Combinations of right and left shifts will still be optimized in // DAGCombine where scalar evolution no longer applies. ``` I think these tests show that for constants, SCEV has no issues with that canonicalization. Reviewers: mkazantsev, spatel, efriedma, sanjoy Reviewed By: mkazantsev Subscribers: sanjoy, javed.absar, llvm-commits, stoklund, bixia Differential Revision: https://reviews.llvm.org/D48229 llvm-svn: 335101	2018-06-20 07:54:11 +00:00
Sanjoy Das	6e9b355cc9	Revert "[SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags" This reverts r334428. It incorrectly marks some multiplications as nuw. Tim Shen is working on a proper fix. Original commit message: [SCEV] Add nuw/nsw to mul ops in StrengthenNoWrapFlags where safe. Summary: Previously we would add them for adds, but not multiplies. llvm-svn: 335016	2018-06-19 04:09:44 +00:00
George Burgess IV	aa283d80fe	[MSSA] Print more optimization information In particular, when asked to print a MemoryAccess, we'll now print where defs are optimized to, and we'll print optimized access types. This patch also introduces an operator<< to make printing AliasResults easier. Patch by Juneyoung Lee! Differential Revision: https://reviews.llvm.org/D47860 llvm-svn: 334760	2018-06-14 19:55:53 +00:00
Justin Lebar	fe455464eb	[SCEV] Simplify zext/trunc idiom that appears when handling bitmasks. Summary: Specifically, we transform zext(2^K * (trunc X to iN)) to iM -> 2^K * (zext(trunc X to i{N-K}) to iM)<nuw> This is helpful because pulling the 2^K out of the zext allows further optimizations. Reviewers: sanjoy Subscribers: hiraditya, llvm-commits, timshen Differential Revision: https://reviews.llvm.org/D48158 llvm-svn: 334737	2018-06-14 17:13:48 +00:00
Justin Lebar	b326904dba	[SCEV] Simplify trunc-of-add/mul to add/mul-of-trunc under more circumstances. Summary: Previously we would do this simplification only if it did not introduce any new truncs (excepting new truncs which replace other cast ops). This change weakens this condition: If the number of truncs stays the same, but we're able to transform trunc(X + Y) to X + trunc(Y), that's still simpler, and it may open up additional transformations. While we're here, also clean up some duplicated code. Reviewers: sanjoy Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48160 llvm-svn: 334736	2018-06-14 17:13:35 +00:00
Simon Pilgrim	607a1e2196	[CostModel][AArch64] Add cost tests for ALTERNATE/SELECT style shuffle masks Precursor to fixing a regression with SLP vectorizer for supporting SELECT shuffles (vs the current ALTERNATE) llvm-svn: 334714	2018-06-14 14:20:20 +00:00
Simon Pilgrim	32702cc86a	[CostModel] Recognise REVERSE shuffle mask if the elements come from the second src llvm-svn: 334698	2018-06-14 09:35:00 +00:00
Simon Pilgrim	9fd634db22	[CostModel][X86] Test showing failure to recognise REVERSE shuffle mask if the elements come from the second src llvm-svn: 334623	2018-06-13 17:12:11 +00:00
Simon Pilgrim	54a138a0c5	[CostModel] Recognise BROADCAST shuffle mask if the elements come from the second src llvm-svn: 334620	2018-06-13 16:52:02 +00:00
Simon Pilgrim	5af0b99ea4	[CostModel][X86] Test showing failure to recognise BROADCAST shuffle mask if the elements come from the second src llvm-svn: 334616	2018-06-13 16:33:42 +00:00
Max Kazantsev	0ed79620c6	[SimplifyIndVars] Ignore dead users IndVarSimplify sometimes makes transforms basing on users that are trivially dead. In particular, if DCE wasn't run before it, there may be a dead `sext/zext` in loop that will trigger widening transforms, however it makes no sense to do it. This patch teaches IndVarsSimplify ignore the mist trivial cases of that. Differential Revision: https://reviews.llvm.org/D47974 Reviewed By: sanjoy llvm-svn: 334567	2018-06-13 02:25:32 +00:00
Simon Pilgrim	e39fa6cbbb	[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513	2018-06-12 16:12:29 +00:00
Simon Pilgrim	0783921987	[CostModel] Treat Identity shuffle masks as zero cost As discussed on D47985, identity shuffle masks should probably be free. I've limited this to the case where the input and output types all match - but we could probably accept all cases. Differential Revision: https://reviews.llvm.org/D47986 llvm-svn: 334506	2018-06-12 14:47:13 +00:00

1 2 3 4 5 ...

1704 Commits