Commit Graph

10249 Commits

Author SHA1 Message Date
Sanjay Patel b3f0c2653b [Analysis] simplify propagation of FMF in recurrences; NFC
This is a mess, but this is hopefully no-functional-change.
The 'Prev' descriptor is only used for min/max recurrences
or when starting a match from a phi, so it should not be a
factor when propagating FMF for fmul/fadd.

The API is confusing (and should be reduced in subsequent steps)
because the "UnsafeAlgebraInst" appears to actually be a placeholder
for a recurrence that does NOT have FMF, but we still want to
treat it as reassociative.
2021-03-03 17:28:10 -05:00
Philip Reames c8cf27e333 Fix a build warning from ea7d208 2021-03-03 09:16:56 -08:00
Philip Reames ea7d208b78 [basicaa] Rewrite isGEPBaseAtNegativeOffset in terms of index difference [mostly NFC]
This is almost purely NFC, it just fits more obviously in the flow of the code now that we've standardized on the index different approach.  The non-NFC bit is that because of canceling the VariableOffsets in the subtract, we can now handle the case where both sides involve a common variable offset.  This isn't an "interesting" improvement; it just happens to fall out of the natural code structure.

One subtle point - the placement of this above the BaseAlias check is important in the original code as this can return NoAlias even when we can't find a relation between the bases otherwise.

Also added some enhancement TODOs noticed while understanding the existing code.

Note: This is slightly different than the LGTMed version.  I fixed the "inbounds" issue Nikita noticed with the original code in e6e5ef4 and rebased this to include the same fix.

Differential Revision: https://reviews.llvm.org/D97520
2021-03-03 09:03:28 -08:00
Philip Reames e6e5ef40cb [basicaa] Fix a latent bug in isGEPBaseAtNegativeOffset
This was pointed out in review of D97520 by Nikita, but existed in the original code as well.

The basic issue is that a decomposed GEP expression describes (potentially) more than one getelementptr.  The "inbounds" derived UB which justifies this aliasing rule requires that the entire offset be composed of "inbounds" geps.  Otherwise, as can be seen in the recently added and changes in this patch test, we can end up with a large commulative offset with only a small sub-offset actually being "inbounds".  If that small sub-offset lies within the object, the result was unsound.

We could potentially be fancier here, but for the moment, simply be conservative when any of the GEPs parsed aren't inbounds.
2021-03-03 08:43:32 -08:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Nikita Popov 3d8f842712 [LICM] Make promotion faster
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-02 22:10:48 +01:00
dfukalov 6e967834b9 [AA] Cache (optionally) estimated PartialAlias offsets.
For the cases of two clobbering loads and one loaded object is fully contained
in the second `BasicAAResult::aliasGEP` returns just `PartialAlias` that
is actually more common case of partial overlap, it doesn't say anything about
actual overlapping sizes.

AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs
with non-constant offsets. The change stores estimated relative offsets so they
can be used further.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93529
2021-03-02 19:04:15 +03:00
Kazu Hirata d639120983 [llvm] Use set_is_subset (NFC) 2021-02-28 10:59:20 -08:00
Philip Reames f2cfef3596 Be more mathematicly precise about definition of recurrence [NFC]
This clarifies the interface of the matchSimpleRecurrence helper introduced in 8020be0b8 for non-commutative operators.  After ebd3aeba, I realized the original way I framed the routine was inconsistent.  For shifts, we only matched the the LHS form, but for sub we matched both and the caller wanted that information.  So, instead, we now consistently match both forms for non-commutative operators and the caller becomes responsible for filtering if needed.  I tried to put a clear warning in the header because I suspect the RHS form of e.g. a sub recurrence is non-obvious for most folks.  (It was for me.)
2021-02-26 11:22:01 -08:00
Philip Reames ebd3aeba27 Use helper introduced in 8020be0b8 to simplify ValueTracking [NFC]
Direct rewrite of the code the helper was extracted from.
2021-02-26 10:47:26 -08:00
Philip Reames 8020be0b8b Add a helper for matching simple recurrence cycles
This helper came up in another review, and I've got about 4 different patches with copies of this copied into it.  Time to precommit the routine.  :)
2021-02-26 10:21:23 -08:00
Stanislav Mekhanoshin d9c99043bd Option to ignore llvm[.compiler].used uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96087
2021-02-25 10:06:24 -08:00
Stanislav Mekhanoshin 29e2d9461a Option to ignore assume like intrinsic uses in hasAddressTaken()
Differential Revision: https://reviews.llvm.org/D96081
2021-02-25 09:48:29 -08:00
Simon Pilgrim 96a3dfeb93 Revert rGd65ddca83ff85c7345fe9a0f5a15750f01e38420 - "[ValueTracking] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526)"
This is causing sanitizer test failures that I haven't been able to fix yet.
2021-02-24 18:03:17 +00:00
Nico Weber 3d837ad704 Revert "[ValueTracking] computeKnownBitsFromShiftOperator - remove non-zero shift amount handling."
This reverts commit d37400168c.
Breaks Analysis/./AnalysisTests/ComputeKnownBitsTest.KnownNonZeroShift
2021-02-24 09:06:12 -05:00
Simon Pilgrim d37400168c [ValueTracking] computeKnownBitsFromShiftOperator - remove non-zero shift amount handling.
This no longer affects any tests after the improvements to the KnownBits shift helpers.
2021-02-24 13:49:13 +00:00
Simon Pilgrim d65ddca83f [ValueTracking] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526)
Followup to D72573 - as detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value.

Stop ValueTracking returning zero for poison shift patterns and use the KnownBits shift helpers directly.

Extend KnownBits::shl to combine all possible shifted combinations if both min/max shift amount values are in range.

Differential Revision: https://reviews.llvm.org/D90479
2021-02-24 12:15:45 +00:00
Ta-Wei Tu 98c6110d9b [LoopNest] Use `getUniqueSuccessor()` instead when checking empty blocks
Blocks that contain only a single branch instruction to the next block can be skipped in analyzing the loop-nest structure.
This is currently done by `getSingleSuccessor()`.
However, the branch instruction might have multiple targets which happen to all be the same.
In this case, the block should still be considered as empty and skipped.

An example is `test/Transforms/LoopInterchange/update-condbranch-duplicate-successors.ll` (the LIT test for this patch is modified from it as well).

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D97286
2021-02-24 09:53:12 +08:00
Fangrui Song ef312951fd collectUsedGlobalVariables: migrate SmallPtrSetImpl overload to SmallVecImpl overload after D97128
And delete the SmallPtrSetImpl overload.

While here, decrease inline element counts from 8 to 4. See D97128 for the choice.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D97257
2021-02-23 16:09:06 -08:00
Simon Pilgrim 1020d16156 [InstSimplify] Handle nsw shl -> poison patterns
Pulled out from D90479 - this recognises invalid nsw shl patterns with signbit changes that result in poison.

Differential Revision: https://reviews.llvm.org/D97305
2021-02-23 18:26:56 +00:00
Simon Pilgrim 18b9fc48f1 [InstructionSimplify] SimplifyShift - rename shift amount KnownBits. NFCI.
As suggested on D97305.
2021-02-23 18:12:59 +00:00
David Green dd2dbf7ee2 [TTI] Change getOperandsScalarizationOverhead to take Type args
As a followup to D95291, getOperandsScalarizationOverhead was still
using a VF as a vector factor if the arguments were scalar, and would
assert on certain matrix intrinsics with differently sized vector
arguments. This patch removes the VF arg, instead passing the Types
through directly. This should allow it to more accurately compute the
cost without having to guess at which operands will be vectorized,
something difficult with more complex intrinsics.

This adjusts one SVE test as it is now calling the wrong intrinsic vs
veccall. Without invalid InstructCosts the cost of the scalarized
intrinsic is too low. This should get fixed when the cost of
scalarization is accounted for with scalable types.

Differential Revision: https://reviews.llvm.org/D96287
2021-02-23 13:04:59 +00:00
David Green bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
Arthur Eubanks 468fa037b2 Only verify LazyCallGraph under expensive checks
These verify calls are causing a lot of slowdown on some files, up to 8x.
The LazyCallGraph infra has been tested a lot over the years, so I'm fairly confident that we don't always need to run the verifys.

These verifies took >90% of total time in one of the compilations I looked at.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D97225
2021-02-22 20:18:59 -08:00
Kazu Hirata 896d0e1a2a [Analysis] Use range-based for loops (NFC) 2021-02-22 20:17:18 -08:00
Kazu Hirata 4ed47858ab [llvm] Use llvm::drop_begin (NFC) 2021-02-22 20:17:16 -08:00
Kazu Hirata 871affc5e7 [Analysis] Use ListSeparator (NFC) 2021-02-22 20:17:15 -08:00
Craig Topper 89440df64a [ValueTracking] Improve ComputeNumSignBits for SRem.
The result will have the same sign as the dividend unless the
result is 0. The magnitude of the result will always be less
than or equal to the dividend. So the result will have at least
as many sign bits as the dividend.

Previously we would do this if the divisor was a positive constant,
but that isn't required.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97170
2021-02-22 14:36:25 -08:00
Simon Pilgrim 476ff0327b [InstSimplify] Cleanup out-of-range shift amount handling.
Use APInt::uge() direct instead of getLimitedValue().

Use KnownBits::getMinValue() to make the bounds check more obvious.
2021-02-22 17:00:49 +00:00
Kazu Hirata 047fc3bf20 [Analysis] Use ListSeparator (NFC) 2021-02-21 19:58:04 -08:00
Nikita Popov e0615bcd39 [Loads] Add optimized FindAvailableLoadedValue() overload (NFCI)
FindAvailableLoadedValue() accepts an iterator by reference. If no
available value is found, then the iterator will either be left
at a clobbering instruction or the beginning of the basic block.
This allows using FindAvailableLoadedValue() across multiple blocks.

If this functionality is not needed, as is the case in InstCombine,
then we can use a much more efficient implementation: First try
to find an available value, and only perform clobber checks if
we actually found one. As this function only looks at a very small
number of instructions (6 by default) and usually doesn't find an
available value, this saves many expensive alias analysis queries.
2021-02-21 18:42:56 +01:00
Nikita Popov 7c706aa0d8 [Loads] Extract helper frunction for available load/store (NFC)
This contains the logic for extracting an available load/store
from a given instruction, to be reused in a following patch.
2021-02-21 18:24:58 +01:00
Juneyoung Lee aacf7878bc [ValueTracking] Improve impliesPoison
This patch improves ValueTracking's impliesPoison(V1, V2) to do this reasoning:

```
  %res = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul      = extractvalue { i64, i1 } %res, 0

	; If %mul is poison, %overflow is also poison, and vice versa.
```

This improvement leads to supporting this optimization under `-instcombine-unsafe-select-transform=0`:

```
define i1 @test2_logical(i64 %a, i64 %b, i64* %ptr) {
; CHECK-LABEL: @test2_logical(
; CHECK-NEXT:    [[MUL:%.*]] = mul i64 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT:    [[TMP1:%.*]] = icmp ne i64 [[A]], 0
; CHECK-NEXT:    [[TMP2:%.*]] = icmp ne i64 [[B]], 0
; CHECK-NEXT:    [[OVERFLOW_1:%.*]] = and i1 [[TMP1]], [[TMP2]]
; CHECK-NEXT:    [[NEG:%.*]] = sub i64 0, [[MUL]]
; CHECK-NEXT:    store i64 [[NEG]], i64* [[PTR:%.*]], align 8
; CHECK-NEXT:    ret i1 [[OVERFLOW_1]]
;

  %res = tail call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %a, i64 %b)
  %overflow = extractvalue { i64, i1 } %res, 1
  %mul = extractvalue { i64, i1 } %res, 0
  %cmp = icmp ne i64 %mul, 0
  %overflow.1 = select i1 %overflow, i1 true, i1 %cmp
  %neg = sub i64 0, %mul
  store i64 %neg, i64* %ptr, align 8
  ret i1 %overflow.1
}
```

Previously, this didn't happen because the flag prevented `select i1 %overflow, i1 true, i1 %cmp` from being `or i1 %overflow, %cmp`.
Note that the select -> or conversion happens only when `impliesPoison(%cmp, %overflow)` returns true.
This improvement allows `impliesPoison` to do the reasoning.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96929
2021-02-20 13:22:34 +09:00
Philip Reames b13e942224 [ValueTracking] Add a two argument form of safeCtxI [NFC]
The existing implementation was relying on order of evaluation to achieve a particular result.  This got really confusing when wanting to change the handling for arguments in a later patch.
2021-02-19 14:52:51 -08:00
Sanjay Patel 5b250a27ec [Analysis][LoopVectorize] do not form reductions of pointers
This is a fix for https://llvm.org/PR49215 either before/after
we make a verifier enhancement for vector reductions with D96904.

I'm not sure what the current thinking is for pointer math/logic
in IR. We allow icmp on pointer values. Therefore, we match min/max
patterns, so without this patch, the vectorizer could form a vector
reduction from that sequence.

But the LangRef definitions for min/max and vector reduction
intrinsics do not allow pointer types:
https://llvm.org/docs/LangRef.html#llvm-smax-intrinsic
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-umax-intrinsic

So we would crash/assert at some point - either in IR verification,
in the cost model, or in codegen. If we do want to allow this kind
of transform, we will need to update the LangRef and all of those
parts of the compiler.

Differential Revision: https://reviews.llvm.org/D97047
2021-02-19 14:01:57 -05:00
Philip Reames 4a5edea193 [SCEV] Use both known bits and sign bits when computing range of SCEV unknowns
When computing a range for a SCEVUnknown, today we use computeKnownBits for unsigned ranges, and computeNumSignBots for signed ranges. This means we miss opportunities to improve range results.

One common missed pattern is that we have a signed range of a value which CKB can determine is positive, but CNSB doesn't convey that information. The current range includes the negative part, and is thus double the size.

Per the removed comment, the original concern which delayed using both (after some code merging years back) was a compile time concern. CTMark results (provided by Nikita, thanks!) showed a geomean impact of about 0.1%. This doesn't seem large enough to avoid higher quality results.

Differential Revision: https://reviews.llvm.org/D96534
2021-02-19 08:29:12 -08:00
Nikita Popov 2f17ed294f [DCE] Don't remove non-willreturn calls
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.

Differential Revision: https://reviews.llvm.org/D96993
2021-02-19 12:35:40 +01:00
Nikita Popov 370addb996 [IR] Move willReturn() to Instruction
This moves the willReturn() helper from CallBase to Instruction,
so that it can be used in a more generic manner. This will make
it easier to fix additional passes (ADCE and BDCE), and will give
us one place to change if additional instructions should become
non-willreturn (e.g. there has been talk about handling volatile
operations this way).

I have also included the IntrinsicInst workaround directly in
here, so that it gets applied consistently. (As such this change
is not entirely NFC -- FuncAttrs will now use this as well.)

Differential Revision: https://reviews.llvm.org/D96992
2021-02-19 11:56:01 +01:00
Nikita Popov 1d9f4903c6 [BasicAA] Add simple depth limit to avoid stack overflow (PR49151)
This is a simpler variant of D96647. It just adds a straightforward
depth limit with a high cutoff, without introducing complex logic
for BatchAA consistency. It accepts that we may cache a sub-optimal
result if the depth limit is hit.

Eventually this should be more fully addressed by D96647 or similar,
but in the meantime this avoids stack overflows in a cheap way.

Differential Revision: https://reviews.llvm.org/D96996
2021-02-19 11:05:42 +01:00
Wei Mi 5fb65c02ca [SampleFDO] Stop repeated indirect call promotion for the same target.
Found a problem in indirect call promotion in sample loader pass. Currently
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundent which can harm performance and can cause
excessive compile time in some extreme case.

The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.

The fix brings 0.1~0.2% performance on our search benchmark.

Differential Revision: https://reviews.llvm.org/D96806
2021-02-18 17:01:32 -08:00
Nikita Popov 70e3c9a8b6 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.

With this patch we get marginally better results:

  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
Kazu Hirata e54579307b [llvm] Ensure newlines at the end of files (NFC)
This patch eliminates pesky "No newline at end of file" messages from
git diff.
2021-02-17 23:58:44 -08:00
William S. Moses 40862b1a74 [SROA] Propagate correct TBAA/TBAA Struct offsets
SROA does not correctly account for offsets in TBAA/TBAA struct metadata.
This patch creates functionality for generating new MD with the corresponding
offset and updates SROA to use this functionality.

Differential Revision: https://reviews.llvm.org/D95826
2021-02-17 11:59:00 -05:00
Kazu Hirata df35a183d7 [SCEV] Use ListSeparator (NFC) 2021-02-16 23:23:05 -08:00
Mircea Trofin 33481c9997 [mlgo] Fetch models from path / URL
Allow custom location for pre-trained models used when AOT-compiling
policies.

Differential Revision: https://reviews.llvm.org/D96796
2021-02-16 22:47:14 -08:00
Kerry McLaughlin ba1e150d03 [SVE] Add support for scalable vectorization of loops with int/fast FP reductions
This patch enables scalable vectorization of loops with integer/fast reductions, e.g:

```
unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}
```

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.

Reviewed By: david-arm, fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D95245
2021-02-16 13:50:06 +00:00
Sameer Sahasrabuddhe 11bf7da64a [NewPM] Introduce (GPU)DivergenceAnalysis in the new pass manager
The GPUDivergenceAnalysis is now renamed to just "DivergenceAnalysis"
since there is no conflict with LegacyDivergenceAnalysis. In the
legacy PM, this analysis can only be used through the legacy DA
serving as a wrapper. It is now made available as a pass in the new
PM, and has no relation with the legacy DA.

The new DA currently cannot handle irreducible control flow; its
presence can cause the analysis to run indefinitely. The analysis is
now modified to detect this and report all instructions in the
function as divergent. This is super conservative, but allows the
analysis to be used without hanging the compiler.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D96615
2021-02-16 10:26:45 +05:30
Sanjay Patel 378941f611 [ValueTracking] add scan limit for assumes
In the motivating example from https://llvm.org/PR49171 and
reduced test here, we would unroll and clone assumes so much
that compile-time effectively became infinite while analyzing
all of those assumes.
2021-02-15 15:24:20 -05:00
Kerry McLaughlin 5fe1593438 [LoopVectorizer] Require no-signed-zeros-fp-math=true for fmin/fmax
Currently, setting the `no-nans-fp-math` attribute to true will allow
loops with fmin/fmax to vectorize, though we should be requiring that
`no-signed-zeros-fp-math` is also set.

This patch adds the check for no-signed-zeros at the function level and includes
tests to make sure we don't vectorize functions with only one of the attributes
associated.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96604
2021-02-15 13:47:05 +00:00
Caroline Concatto 2d728bbff5 [CodeGen][SelectionDAG]Add new intrinsic experimental.vector.reverse
This patch adds  a new intrinsic experimental.vector.reduce that takes a single
vector and returns a vector of matching type but with the original lane order
 reversed. For example:

```
vector.reverse(<A,B,C,D>) ==> <D,C,B,A>
```

The new intrinsic supports fixed and scalable vectors types.
The fixed-width vector relies on shufflevector to maintain existing behaviour.
Scalable vector uses the new ISD node - VECTOR_REVERSE.

This new intrinsic is one of the named shufflevector intrinsics proposed on the
mailing-list in the RFC at [1].

Patch by Paul Walker (@paulwalker-arm).

[1] https://lists.llvm.org/pipermail/llvm-dev/2020-November/146864.html

Differential Revision: https://reviews.llvm.org/D94883
2021-02-15 13:39:43 +00:00