llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	7902405c42	[ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset GetPointerBaseWithConstantOffset include this code, where ByteOffset and GEPOffset are both of type llvm::APInt : ByteOffset += GEPOffset.getSExtValue(); The problem with this line is that getSExtValue() returns an int64_t, but the += matches an overload for uint64_t. The problem is that the resulting APInt is no longer considered to be signed. That in turn causes assertion failures later on if the relevant pointer type is > 64 bits in width and the GEPOffset was negative. Changing it to ByteOffset += GEPOffset.sextOrTrunc(ByteOffset.getBitWidth()); resolves the issue and explicitly performs the sign-extending or truncation. Additionally, instead of asserting later if the result is > 64 bits, it breaks out of the loop in that case. See also https://reviews.llvm.org/D24729 https://reviews.llvm.org/D24772 This commit must be merged after D38662 in order for the test to pass. Patch by Michael Ferguson <mpfergu@gmail.com>. Reviewers: reames, sanjoy, hfinkel Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D38501 llvm-svn: 350395	2019-01-04 14:53:22 +00:00
Simon Pilgrim	c2aadfaaad	[SLPVectorizer] Flag ADD/SUB SSAT/USAT intrinsics trivially vectorizable (PR40123) Enables SLP vectorization for the SSE2 PADDS/PADDUS/PSUBS/PSUBUS style intrinsics llvm-svn: 350300	2019-01-03 12:18:23 +00:00
Hal Finkel	4f2381440d	[BasicAA] Support arbitrary pointer sizes (and fix an overflow bug) Motivated by the discussion in D38499, this patch updates BasicAA to support arbitrary pointer sizes by switching most remaining non-APInt calculations to use APInt. The size of these APInts is set to the maximum pointer size (maximum over all address spaces described by the data layout string). Most of this translation is straightforward, but this patch contains a fix for a bug that revealed itself during this translation process. In order for test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit pointers, the intermediate calculations must be performed using 64-bit integers. This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1V+C2, and we then multiply this by Scale, and distribute, to get (C1Scale)V + C2Scale, it can be the case that, even through C1V+C2 does not overflow for relevant values of V, (C2Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. Thus, when initially applying the APInt conversion, because the maximum pointer size in this test is 32 bits, it started failing. Suspicious, I created a 64-bit version of this test (included here), and that failed (miscompiled) on trunk for a similar reason (the multiplication can overflow). After fixing this overflow bug, the first test case (at least) in Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was relying on having 64-bit intermediate values to have BasicAA return an accurate result. In order to fix this problem, and because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code (especially portable code using int64_t), it seems reasonable to always use at least 64-bit integers. In this way, we won't regress our analysis capabilities (and there's a command-line option added, so experimenting with this should be easy). As pointed out by Eli during the review, there are other potential overflow conditions that this patch does not address. Fixing those is left to follow-up work. Patch by me with contributions from Michael Ferguson (mferguson@cray.com). Differential Revision: https://reviews.llvm.org/D38662 llvm-svn: 350220	2019-01-02 16:28:09 +00:00
Nikita Popov	bc9986e9ad	Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions" This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771. BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses those user is also dead. This case is checked separately instead. The previous attempt to land this lead to miscompiles, because cases where uses were initially dead but were later found to be live during further analysis were not always correctly removed from the DeadUses set. This is fixed now and the added test case demanstrates such an instance. Differential Revision: https://reviews.llvm.org/D55563 llvm-svn: 350188	2019-01-01 10:05:26 +00:00
George Burgess IV	69952979da	[MemoryLocation] Use LocationSize instead of ints; NFC Trying to keep these patches super small so they're easily post-commit verifiable, as requested in D44748. This one sadly isn't super small, but all of the changes here are either to: - libfuncs that are passed a constant size (memcpy, memset, ...) - instructions that store/load a constant size So they have to be precise llvm-svn: 350017	2018-12-23 03:36:44 +00:00
George Burgess IV	8c5413f3f7	[Loads] Use LocationSize instead of ints; NFC Keeping these patches super small so they're easily post-commit verifiable, as requested in D44748. This tries to find literal loads/stores of the given type, so this has to be precise. llvm-svn: 350016	2018-12-23 03:10:56 +00:00
George Burgess IV	1329cf1791	[Lint] Use LocationSize instead of ints; NFC Keeping these patches super small so they're easily post-commit verifiable, as requested in D44748. llvm-svn: 350015	2018-12-23 02:50:08 +00:00
George Burgess IV	685e781d55	[AAEval] Use LocationSize instead of ints; NFC Keeping these patches super small so they're easily post-commit verifiable, as requested in D44748. llvm-svn: 350014	2018-12-23 02:39:58 +00:00
George Burgess IV	640be69249	[Analysis] More LocationSize cleanup; NFC Keeping these patches super small so they're easily post-commit verifiable, as requested in D44748. llvm-svn: 350008	2018-12-22 18:23:21 +00:00
Vedant Kumar	b264d69de7	[IR] Add Instruction::isLifetimeStartOrEnd, NFC Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an llvm.lifetime.start or an llvm.lifetime.end intrinsic. This was suggested as a cleanup in D55967. Differential Revision: https://reviews.llvm.org/D56019 llvm-svn: 349964	2018-12-21 21:49:40 +00:00
Reid Kleckner	84a2d29681	[BasicAA] Fix AA bug on dynamic allocas and stackrestore Summary: BasicAA has special logic for unescaped allocas, which normally applies equally well to dynamic and static allocas. However, llvm.stackrestore has the power to end the lifetime of dynamic allocas, without referring to them directly. stackrestore is already marked with the most conservative memory modification attributes, but because the alloca is not escaped, the normal logic produces incorrect results. I think BasicAA needs a special case here to teach it about the relationship between dynamic allocas and stackrestore. Fixes PR40118 Reviewers: gbiv, efriedma, george.burgess.iv Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D55969 llvm-svn: 349945	2018-12-21 19:59:03 +00:00
Florian Hahn	ef307b8c26	[LAA] Avoid generating RT checks for known deps preventing vectorization. If we found unsafe dependences other than 'unknown', we already know at compile time that they are unsafe and the runtime checks should always fail. So we can avoid generating them in those cases. This should have no negative impact on performance as the runtime checks that would be created previously should always fail. As a sanity check, I measured the test-suite, spec2k and spec2k6 and there were no regressions. Reviewers: Ayal, anemet, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D55798 llvm-svn: 349794	2018-12-20 18:49:09 +00:00
Clement Courbet	d4bd3eb85d	[NFC] Fix trailing comma after function. lib/Analysis/VectorUtils.cpp:482:2: warning: extra ‘;’ [-Wpedantic] llvm-svn: 349732	2018-12-20 09:20:07 +00:00
Michael Kruse	978ba61536	Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not loop identifier. It is neither unique (there's even a regression test assigning the some LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carries by this loop). This intentionally avoid any kind of "ID". Loops that are clones/have their attributes modifies retain the llvm.loop.parallel_accesses attribute. Access instructions that a cloned point to the same access group. It is not necessary for each access to have it's own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision: https://reviews.llvm.org/D52116 llvm-svn: 349725	2018-12-20 04:58:07 +00:00
Nikita Popov	3817ee7908	Revert "[BDCE][DemandedBits] Detect dead uses of undead instructions" This reverts commit r349674. It causes a failure in test-suite enc-3des.execution_time. llvm-svn: 349684	2018-12-19 22:09:02 +00:00
Nikita Popov	649e125451	[BDCE][DemandedBits] Detect dead uses of undead instructions This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771. BDCE currently detects instructions that don't have any demanded bits and replaces their uses with zero. However, if an instruction has multiple uses, then some of the uses may be dead (have no demanded bits) even though the instruction itself is still live. This patch extends DemandedBits/BDCE to detect such uses and replace them with zero. While this will not immediately render any instructions dead, it may lead to simplifications (in the motivating case, by converting a rotate into a simple shift), break dependencies, etc. The implementation tries to strike a balance between analysis power and complexity/memory usage. Originally I wanted to track demanded bits on a per-use level, but ultimately we're only really interested in whether a use is entirely dead or not. I'm using an extra set to track which uses are dead. However, as initially all uses are dead, I'm not storing uses those user is also dead. This case is checked separately instead. The test case has a couple of cases that are not simplified yet. In particular, we're only looking at uses of instructions right now. I think it would make sense to also extend this to arguments. Furthermore DemandedBits doesn't yet know some of the tricks that InstCombine does for the demanded bits or bitwise or/and/xor in combination with known bits information. Differential Revision: https://reviews.llvm.org/D55563 llvm-svn: 349674	2018-12-19 19:56:21 +00:00
Sanjay Patel	798c5982a0	[ValueTracking] remove unused parameters from helper functions; NFC llvm-svn: 349641	2018-12-19 16:49:18 +00:00
Florian Hahn	485f2826ba	[LAA] Introduce enum for vectorization safety status (NFC). This patch adds a VectorizationSafetyStatus enum, which will be extended in a follow up patch to distinguish between 'safe with runtime checks' and 'known unsafe' dependences. Reviewers: anemet, anna, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D54892 llvm-svn: 349556	2018-12-18 22:25:11 +00:00
Pete Cooper	be4f571107	Change the objc ARC optimizer to use the new objc.* intrinsics We're moving ARC optimisation and ARC emission in clang away from runtime methods and towards intrinsics. This is the part which actually uses the intrinsics in the ARC optimizer when both analyzing the existing calls and emitting new ones. Differential Revision: https://reviews.llvm.org/D55348 Reviewers: ahatanak llvm-svn: 349534	2018-12-18 20:32:49 +00:00
Artur Pilipenko	2a0146e0fd	[CaptureTracking] Pass MaxUsesToExplore from wrappers to the actual implementation This is a follow up for rL347910. In the original patch I somehow forgot to pass the limit from wrappers to the function which actually does the job. llvm-svn: 349438	2018-12-18 03:32:33 +00:00
Nikita Popov	221f3fc750	[InstSimplify] Simplify saturating add/sub + icmp If a saturating add/sub has one constant operand, then we can determine the possible range of outputs it can produce, and simplify an icmp comparison based on that. The implementation is based on a similar existing mechanism for simplifying binary operator + icmps. Differential Revision: https://reviews.llvm.org/D55735 llvm-svn: 349369	2018-12-17 17:45:18 +00:00
Wei Mi	66c6c5abea	[SampleFDO] handle ProfileSampleAccurate when initializing function entry count ProfileSampleAccurate is used to indicate the profile has exact match to the code to be optimized. Previously ProfileSampleAccurate is handled in ProfileSummaryInfo::isColdCallSite and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle ProfileSampleAccurate in multiple places. Differential Revision: https://reviews.llvm.org/D55660 llvm-svn: 349088	2018-12-13 21:51:42 +00:00
Easwaran Raman	5a7056fa03	[ThinLTO] Compute synthetic function entry count Summary: This patch computes the synthetic function entry count on the whole program callgraph (based on module summary) and writes the entry counts to the summary. After function importing, this count gets attached to the IR as metadata. Since it adds a new field to the summary, this bumps up the version. Reviewers: tejohnson Subscribers: mehdi_amini, inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D43521 llvm-svn: 349076	2018-12-13 19:54:27 +00:00
David L. Jones	54c01ad6a9	Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail" This revision caused trucated memsets for structs with padding. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html llvm-svn: 349002	2018-12-13 03:15:11 +00:00
Michael Kruse	7244852557	[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g. #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision: https://reviews.llvm.org/D49281 Differential Revision: https://reviews.llvm.org/D55288 llvm-svn: 348944	2018-12-12 17:32:52 +00:00
Wei Mi	7da5a08e1a	[SampleFDO] Extend profile-sample-accurate option to cover isFunctionColdInCallGraph For SampleFDO, when a callsite doesn't appear in the profile, it will not be marked as cold callsite unless the option -profile-sample-accurate is specified. But profile-sample-accurate doesn't cover function isFunctionColdInCallGraph which is used to decide whether a function should be put into text.unlikely section, so even if the user knows the profile is accurate and specifies profile-sample-accurate, those functions not appearing in the sample profile are still not be put into text.unlikely section right now. The patch fixes that. Differential Revision: https://reviews.llvm.org/D55567 llvm-svn: 348940	2018-12-12 17:09:27 +00:00
Nikita Popov	79c994d976	[ConstantFolding] Handle leading zero-size elements in load folding Struct types may have leading zero-size elements like [0 x i32], in which case the "real" element at offset 0 will not necessarily coincide with the 0th element of the aggregate. ConstantFoldLoadThroughBitcast() wants to drill down the element at offset 0, but currently always picks the 0th aggregate element to do so. This patch changes the code to find the first non-zero-size element instead, for the struct case. The motivation behind this change is https://github.com/rust-lang/rust/issues/48627. Rust is fond of emitting [0 x iN] separators between struct elements to enforce alignment, which prevents constant folding in this particular case. The additional tests with [4294967295 x [0 x i32]] check that we don't end up unnecessarily looping over a large number of zero-size elements of a zero-size array. Differential Revision: https://reviews.llvm.org/D55169 llvm-svn: 348895	2018-12-11 20:29:16 +00:00
Fedor Sergeev	a1d95c3fc4	[NewPM] fixing asserts on deleted loop in -print-after-all IR-printing AfterPass instrumentation might be called on a loop that has just been invalidated. We should skip printing it to avoid spurious asserts. Reviewed By: chandlerc, philip.pfaffe Differential Revision: https://reviews.llvm.org/D54740 llvm-svn: 348887	2018-12-11 19:05:35 +00:00
Nikita Popov	94b8e2ea4e	[MemCpyOpt] memset->memcpy forwarding with undef tail Currently memcpyopt optimizes cases like memset(a, byte, N); memcpy(b, a, M); to memset(a, byte, N); memset(b, byte, M); if M <= N. Often this allows further simplifications down the line, which drop the first memset entirely. This patch extends this optimization for the case where M > N, but we know that the bytes a[N..M] are undef due to alloca/lifetime.start. This situation arises relatively often for Rust code, because Rust does not initialize trailing structure padding and loves to insert redundant memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844. For the implementation, I'm reusing a bit of code for a similar existing optimization (direct memcpy of undef). I've also added memset support to MemDepAnalysis GetLocation -- Instead, getPointerDependencyFrom could be used, but it seems to make more sense to add this to GetLocation and thus make the computation cachable. Differential Revision: https://reviews.llvm.org/D55120 llvm-svn: 348645	2018-12-07 21:16:58 +00:00
Nikita Popov	110cf05203	Reapply "[DemandedBits][BDCE] Support vectors of integers" DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. Unlike the previous iteration of this patch, getDemandedBits() can now again be called on arbirary (sized) instructions, even if they don't have integer or vector of integer type. (For vector types the size of the returned mask will now be the scalar size in bits though.) The added LoopVectorize test case shows a case which triggered an assertion failure with the previous attempt, because getDemandedBits() was called on a pointer-typed instruction. Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348602	2018-12-07 15:38:13 +00:00
Nikita Popov	14ca9a8355	Revert "[DemandedBits][BDCE] Support vectors of integers" This reverts commit r348549. Causing assertion failures during clang build. llvm-svn: 348558	2018-12-07 00:42:03 +00:00
Nikita Popov	cf65b9207b	[DemandedBits][BDCE] Support vectors of integers DemandedBits and BDCE currently only support scalar integers. This patch extends them to also handle vector integer operations. In this case bits are not tracked for individual vector elements, instead a bit is demanded if it is demanded for any of the elements. This matches the behavior of computeKnownBits in ValueTracking and SimplifyDemandedBits in InstCombine. The getDemandedBits() method can now only be called on instructions that have integer or vector of integer type. Previously it could be called on any sized instruction (even if it was not particularly useful). The size of the return value is now always the scalar size in bits (while previously it was the type size in bits). Differential Revision: https://reviews.llvm.org/D55297 llvm-svn: 348549	2018-12-06 23:50:32 +00:00
David L. Jones	5ff7b8a04a	Revert r347934 "[SCEV] Guard movement of insertion point for loop-invariants" This change caused SEGVs in instcombine. (The r347934 change seems to me to be a precipitating cause, not a root cause. Details are on the llvm-commits thread for r347934.) llvm-svn: 348426	2018-12-05 23:13:50 +00:00
Sanjay Patel	3d5bb15a1d	[CmpInstAnalysis] fix function signature for ICmp code to predicate; NFC The old function underspecified the return type, took an unused parameter, and had a misleading name. llvm-svn: 348292	2018-12-04 18:53:27 +00:00
Sanjay Patel	472652ef68	[CmpInstAnalysis] fix formatting; NFC There are potential improvements to the structure of this API raised by D54994, but remove some cosmetic blemishes before making any functional changes. llvm-svn: 348149	2018-12-03 15:48:30 +00:00
Fedor Sergeev	7254d3c51c	Fixing -print-module-scope for legacy SCC passes It appears that print-module-scope was not implemented for legacy SCC passes. Fixed to print a whole module instead of just current SCC. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D54793 llvm-svn: 348144	2018-12-03 14:48:15 +00:00
Nikita Popov	687b92cd9c	[ValueTracking] Support funnel shifts in computeKnownBits() If the shift amount is known, we can determine the known bits of the output based on the known bits of two inputs. This is essentially the same functionality as implemented in D54869, but for ValueTracking rather than InstCombine SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D55140 llvm-svn: 348091	2018-12-02 14:14:11 +00:00
Sanjay Patel	7d82d37854	[ValueTracking] add helper function for testing implied condition; NFCI We were duplicating code around the existing isImpliedCondition() that checks for a predecessor block/dominating condition, so make that a wrapper call. llvm-svn: 348088	2018-12-02 13:26:03 +00:00
Teresa Johnson	5b8ff375c8	[ThinLTO] Allow importing of functions with var args Summary: Follow up to D54270, which allowed importing of var args functions unless they called va_start. As pointed out in the post-commit comments on that patch, the inliner can handle functions that call va_start in certain situations as well. Go ahead and enable importing of all var args functions. Measurements on a large binary show that this increases imports and binary size by an insignificant amount. Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54607 llvm-svn: 348068	2018-12-01 05:11:46 +00:00
Nicolai Haehnle	413f8691ab	LegacyDivergenceAnalysis: fix uninitialized value Change-Id: I014502e431a68f7beddf169f6a3d19dac5dd2c26 llvm-svn: 348051	2018-11-30 23:07:49 +00:00
Nicolai Haehnle	56d0ed2a50	[DA] GPUDivergenceAnalysis for unstructured GPU kernels Summary: This is patch #3 of the new DivergenceAnalysis <https://lists.llvm.org/pipermail/llvm-dev/2018-May/123606.html> The GPUDivergenceAnalysis is intended to eventually supersede the existing LegacyDivergenceAnalysis. The existing LegacyDivergenceAnalysis produces incorrect results on unstructured Control-Flow Graphs: <https://bugs.llvm.org/show_bug.cgi?id=37185> This patch adds the option -use-gpu-divergence-analysis to the LegacyDivergenceAnalysis to turn it into a transparent wrapper for the GPUDivergenceAnalysis. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: jholewinski, jvesely, jfb, llvm-commits, alex-t, sameerds, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D53493 llvm-svn: 348048	2018-11-30 22:55:20 +00:00
Renato Golin	de4b88e5ac	Fix parenthesis warning in IVDescriptors llvm-svn: 347990	2018-11-30 13:54:36 +00:00
Renato Golin	135e72e1b9	Add a new reduction pattern match Adding a new reduction pattern match for vectorizing code similar to TSVC s3111: for (int i = 0; i < N; i++) if (a[i] > b) sum += a[i]; This patch adds support for fadd, fsub and fmull, as well as multiple branches and different (but compatible) instructions (ex. add+sub) in different branches. The difference from the previous patch(https://reviews.llvm.org/D49168) is as follows: - Added check of fast-math property of fp-instruction to the previous patch - Fix/add some pattern for if-reduction.ll Differential Revision: https://reviews.llvm.org/D54464 Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org> and Masakazu Ueno <masakazu.ueno@linaro.org> llvm-svn: 347989	2018-11-30 13:40:10 +00:00
Warren Ristow	72d1f3a285	[SCEV] Guard movement of insertion point for loop-invariants r320789 suppressed moving the insertion point of SCEV expressions with dev/rem operations to the loop header in non-loop-invariant situations. This, and similar, hoisting is also unsafe in the loop-invariant case, since there may be a guard against a zero denominator. This is an adjustment to the fix of r320789 to suppress the movement even in the loop-invariant case. This fixes PR30806. Differential Revision: https://reviews.llvm.org/D54713 llvm-svn: 347934	2018-11-30 00:02:54 +00:00
Artur Pilipenko	eba2365f23	Introduce MaxUsesToExplore argument to capture tracking Currently CaptureTracker gives up if it encounters a value with more than 20 uses. The motivation for this cap is to keep it relatively cheap for BasicAliasAnalysis use case, where the results can't be cached. Although, other clients of CaptureTracker might be ok with higher cost. This patch introduces an argument for PointerMayBeCaptured functions to specify the max number of uses to explore. The motivation for this change is a downstream user of CaptureTracker, but I believe upstream clients of CaptureTracker might also benefit from more fine grained cap. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D55042 llvm-svn: 347910	2018-11-29 20:08:12 +00:00
Sanjay Patel	d802270808	[InstSimplify] fold select with implied condition This is an almost direct move of the functionality from InstCombine to InstSimplify. There's no reason not to do this in InstSimplify because we never create a new value with this transform. (There's a question of whether any dominance-based transform belongs in either of these passes, but that's a separate issue.) I've changed 1 of the conditions for the fold (1 of the blocks for the branch must be the block we started with) into an assert because I'm not sure how that could ever be false. We need 1 extra check to make sure that the instruction itself is in a basic block because passes other than InstCombine may be using InstSimplify as an analysis on values that are not wired up yet. The 3-way compare changes show that InstCombine has some kind of phase-ordering hole. Otherwise, we would have already gotten the intended final result that we now show here. llvm-svn: 347896	2018-11-29 18:44:39 +00:00
Artur Pilipenko	8b92c1d142	NFC. Use unsigned type for uses counter in CaptureTracking llvm-svn: 347826	2018-11-29 02:15:35 +00:00
Nikita Popov	cf596a8c26	[ValueTracking] Determine always-overflow condition for unsigned sub Always-overflow was already determined for unsigned addition, but not subtraction. This patch establishes parity. This allows us to perform some additional simplifications for signed saturating subtractions. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347771	2018-11-28 16:37:04 +00:00
Vitaly Buka	44abeb5c3c	[stack-safety] Update comment llvm-svn: 347626	2018-11-27 01:56:44 +00:00
Vitaly Buka	7792f5f145	[stack-safety] Fix and uncomment assert llvm-svn: 347625	2018-11-27 01:56:35 +00:00

1 2 3 4 5 ...

8304 Commits