Commit Graph

1934 Commits

Author SHA1 Message Date
Simon Pilgrim b82438872b [CostModel][X86] We don't need a scale factor for SLM extract costs
D74976 will handle larger vector types, but since SLM doesn't support AVX+ then we will always be extracting from 128-bit vectors so don't need to scale the cost.
2020-02-24 14:23:04 +00:00
Simon Pilgrim eaa41e103c [CostModel][X86] Try to check against common prefixes before using target-specific cpu checks
SLM/GLM is still a mess so not all of them have been updated yet.
2020-02-24 11:59:07 +00:00
Jonas Paulsson 41bd9ead35 [SystemZ] Return scalarized costs for vector instructions on older archs.
A cost query for a vector instruction should return a cost even without
target vector support, and not trigger an assert.

VectorCombine does this with an input containing source code vectors.

Review: Ulrich Weigand
2020-02-21 09:17:37 -08:00
Evgeniy Brevnov b0761bbc76 [DependenceAnalysis] Memory dependence analysis internal caching mechanism is broken in presence of TBAA (PR42733).
Summary:
There is a flaw in memory dependence analysis caching mechanism when memory accesses with TBAA are involved. Assume we first analysed and cached results for access with TBAA. Later we request dependence for the same memory but without TBAA (or different TBAA). By design these two queries should share one entry in the internal cache which corresponds to a general access (without TBAA).  Thus upon second request internal cached is cleared and we continue analysis for access as if there is no TBAA.

The problem is that even though internal cache is cleared the set of visited nodes is not. That means we won't traverse visited nodes again and populate internal cache with the corresponding dependence results. So we end up  with internal cache in an incomplete state. Current implementation tries to signal that situation by resetting CacheInfo->Pair at line 1104. But that doesn't actually help since later code ignores this invalidation and relies on 'Cache->empty()' property to decide on cache completeness.

Reviewers: reames, hfinkel, chandlerc, fedor.sergeev, asbirlea, fhahn, john.brawn, Prazek, sunfish

Reviewed By: john.brawn

Subscribers: DaniilSuchkov, kosarev, jfb, dantrushin, hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73032
2020-02-21 20:20:36 +07:00
Sanjay Patel d799190851 [ConstantFold] fold fsub -0.0, undef to undef rather than NaN
A question about this behavior came up on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139003.html
...and as part of backend improvements in D73978, but this is an IR
change first because we already have fairly thorough tests in place
here.

We decided not to implement a more general change that would have
folded any FP binop with nearly arbitrary constant + undef operand
to undef because that is not theoretically correct (even if it is
practically correct).

Differential Revision: https://reviews.llvm.org/D74713
2020-02-21 08:03:19 -05:00
Sanjay Patel 7ddbf802cf [ConstantFold] add/move tests for FP with undef operand; NFC 2020-02-20 15:07:11 -05:00
Hideto Ueno e253cdda35 [MustExecute] Add backward exploration for must-be-executed-context
Summary:
As mentioned in D71974, it is useful for must-be-executed-context to explore CFG backwardly.
This patch is ported from parts of D64975. We use a dominator tree to find the previous context if
a dominator tree is available.

Reviewers: jdoerfert, hfinkel, baziotis, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74817
2020-02-20 14:49:30 +09:00
Bardia Mahjour 0a2626d0cd [DDG] Data Dependence Graph - Graph Simplification
Summary:
This is the last functional patch affecting the representation of DDG.
Here we try to simplify the DDG to reduce the number of nodes and edges by
iteratively merging pairs of nodes that satisfy the following conditions,
until no such pair can be identified. A pair of nodes consisting of a and b
can be merged if:

    1. the only edge from a is a def-use edge to b and
    2. the only edge to b is a def-use edge from a and
    3. there is no cyclic edge from b to a and
    4. all instructions in a and b belong to the same basic block and
    5. both a and b are simple (single or multi instruction) nodes.

These criteria allow us to fold many uninteresting def-use edges that
commonly exist in the graph while avoiding the risk of introducing
dependencies that didn't exist before.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72350
2020-02-19 13:41:51 -05:00
Jay Foad b329d1b06e [AMDGPU][ConstantFolding] Fold llvm.amdgcn.fmul.legacy intrinsic
Reviewers: arsenm, rampitec, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74835
2020-02-19 16:01:30 +00:00
Evgeniy Brevnov cae643d596 Reverting D73027 [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses(PR42151). 2020-02-14 22:57:23 +07:00
Evgeniy Brevnov 5573abceab [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses(PR42151).
Summary:
This is second attempt to fix the problem with incorrect dependencies reported in presence of invariant load. Initial fix (https://reviews.llvm.org/D64405) was reverted due to a regression reported in https://reviews.llvm.org/D70516.

The original fix changed caching behavior for invariant loads. Namely such loads are not put into the second level cache (NonLocalDepInfo). The problem with that fix is the first level cache  (CachedNonLocalPointerInfo) still works as if invariant loads were in the second level cache. The solution is in addition to not putting dependence results into the second level cache avoid putting info about invariant loads into the first level cache as well.

Reviewers: jdoerfert, reames, hfinkel, efriedma

Reviewed By: jdoerfert

Subscribers: DaniilSuchkov, hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73027
2020-02-14 12:18:31 +07:00
Huihui Zhang 5350a48931 [ConstantFold][SVE] Fix constant fold for FoldReinterpretLoadFromConstPtr.
Summary:
Bail out early for scalable vectors. As global variables are not expected
to be scalable.

Use explicit call of getFixedSize() to assert on places where scalable size
doesn't make sense.

Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74424
2020-02-12 10:24:50 -08:00
Ehud Katz 2470d2988a [ConstantFolding] Fold calls to FP remainder function
With the fixed implementation of the "remainder" operation in
rG9d0956ebd471, we can now add support to folding calls to it.

Differential Revision: https://reviews.llvm.org/D69777
2020-02-12 13:21:18 +02:00
Nicolai Hähnle ab2f610f38 AMDGPU: llvm.amdgcn.writelane is a source of divergence
Summary:
Consider:

  %r = call i32 @llvm.amdgcn.writelane(i32 0, i32 1, i32 2)

This produces a value that is 0 on lane 1, and 2 everywhere else; i.e.,
it is divergent.

Reported-by: Marek Olsak <Marek.Olsak@amd.com>

Reviewers: arsenm, foad, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74400
2020-02-12 09:12:56 +01:00
Huihui Zhang 88de9338f2 [ConstantFold][SVE] Fix constand fold for vector call.
Summary:
Do not iterate on scalable vectors.

Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74419
2020-02-11 14:06:15 -08:00
Alina Sbirlea 0cecafd647 [BasicAA] Make BasicAA a cfg pass.
Summary:
Part of the changes in D44564 made BasicAA not CFG only due to it using
PhiAnalysisValues which may have values invalidated.
Subsequent patches (rL340613) appear to have addressed this limitation.

BasicAA should not be invalidated by non-CFG-altering passes.
A concrete example is MemCpyOpt which preserves CFG, but we are testing
it invalidates BasicAA.

llvm-dev RFC: https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM

Reviewers: john.brawn, sebpop, hfinkel, brzycki

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74353
2020-02-11 11:30:08 -08:00
Rachel Craik 1f55420065 [LoopCacheAnalysis]: Add support for negative stride
LoopCacheAnalysis currently assumes the loop will be iterated over in
a forward direction. This patch addresses the issue by using the
absolute value of the stride when iterating backwards.

Note: this patch will treat negative and positive array access the
same, resulting in the same cost being calculated for single and
bi-directional access patterns. This should be improved in a
subsequent patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D73064
2020-02-10 13:22:35 -05:00
Huihui Zhang 5389ca7a1f [ConstantFold][NFC] Move scalable vector unit tests under vscale.ll 2020-02-05 16:03:51 -08:00
Huihui Zhang 801857c59e [ConstantFold][SVE] Fix constant folding for bitcast.
Do not iterate on scalable vector type in BitCastConstantVector.
Continuation work of D70985, D71147.

Support for folding bitcast into splat value is kept in D74095, as
it depends on D71637.

Differential Revision: https://reviews.llvm.org/D71389
2020-02-05 15:39:57 -08:00
Christopher Tetreault b03f3fbd6a Reapply: [SVE] Fix bug in simplification of scalable vector instructions
This reverts commit a05441038a, reapplying
commit 31574d38ac
2020-02-05 10:00:09 -08:00
Matt Arsenault 096cd991ee AMDGPU: Fix divergence analysis of control flow intrinsics
The mask results of these should be uniform. The trickier part is the
dummy booleans used as IR glue need to be treated as divergent. This
should make the divergence analysis results correct for the IR the DAG
is constructed from.

This should allow us to eliminate requiresUniformRegister, which has
an expensive, recursive scan over all users looking for control flow
intrinsics. This should avoid recent compile time regressions.
2020-02-05 09:30:54 -08:00
Matt Arsenault 4f9f5d09de AMDGPU: Fix isAlwaysUniform for simple asm SGPR results
We were handling the case where the result was a struct with an
extracted SGPR component, but not for the simple case.
2020-02-04 13:34:14 -08:00
Matt Arsenault cb7b661d3d AMDGPU: Analyze divergence of inline asm 2020-02-03 12:42:16 -08:00
Reid Kleckner a05441038a Revert "[SVE] Fix bug in simplification of scalable vector instructions"
This reverts commit 31574d38ac.

The newly added shufflevector test does not pass locally on either of my
workstations.
2020-02-03 11:12:09 -08:00
Christopher Tetreault 31574d38ac [SVE] Fix bug in simplification of scalable vector instructions
Summary:
* Most of the simplifications in SimplifyShuffleVectorInst depend on the
concrete value of, or the length of the mask vector. For scalable
vectors, this cannot be known at compile time.
** for these tests, detect if the vector is scalable before attempting
the transformation
* The functions ShuffleVectorInst::getMaskValue and
ShuffleVectorInst::getShuffleMask access the value of the constant mask.
However, since the length of the mask is unknown at compile time, these
function do not work for scalable vectors. Add asserts to ensure that
the input mask is not scalable

Reviewers: efriedma, sdesmalen, apazos, chrisj, huihuiz

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73555
2020-02-03 10:15:56 -08:00
Huihui Zhang b0d25fff9b [ConstantFold][SVE][NFC] Add test for select instruction in scalable vector.
Side notes from D73669, no need to guard the iteration on vectors, as
it is explicitly looking for a ConstantVector/ConstantDataVector, which
is not expected to be scalable at the moment. So, add the test only.
2020-01-30 10:56:12 -08:00
Huihui Zhang 34e6552dcb [ConstantFold][SVE] Fix constant folding for scalable vector unary operations.
Summary:
Similar to issue D71445. Scalable vector should not be evaluated element by element.
Add support to handle scalable vector UndefValue.

Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73678
2020-01-30 10:45:15 -08:00
Craig Topper 35625464c6 [X86] Fix the cost model for v16i16->v16i32 zero_extend/sign_extend with AVX2
We seem to be inheriting the cost from sse4.1. But if we have 256-bit registers we should be able to do this with just one extract to split the 16i16 and two v8i16->v8i32 operations so our cost should be 3 not 4.

Differential Revision: https://reviews.llvm.org/D73646
2020-01-29 15:52:10 -08:00
Huihui Zhang d2e2fc450e [ConstantFold][SVE] Fix constant folding for scalable vector binary operations.
Summary:
Scalable vector should not be evaluated element by element.
Add support to handle scalable vector UndefValue.

Reviewers: sdesmalen, huntergr, spatel, lebedev.ri, apazos, efriedma, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71445
2020-01-29 10:49:08 -08:00
Evgenii Stepanov 34ab56904e Support zero size types in StackSafetyAnalysis.
Reviewers: vitalybuka

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73395
2020-01-27 15:22:59 -08:00
Evgenii Stepanov c3b80adcee Fix StackSafetyAnalysis crash with scalable vector types.
Summary:
Treat scalable allocas as if they have storage size of 0, and
scalable-typed memory accesses as if their range is unlimited.

This is not a proper support of scalable vector types in the analysis -
we can do better, but not today.

Reviewers: vitalybuka

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73394
2020-01-27 15:22:59 -08:00
Roman Lebedev 9c801c48ee
[NFC][IndVarSimplify] Autogenerate tests affected by isHighCostExpansionHelper() cost modelling (PR44668) 2020-01-27 23:34:29 +03:00
Craig Topper b1f3a0f972 Revert a107f86 "[GlobalsAA] Add back a check to intrinsic_addresstaken.ll to see if the AVX and AVX512 bots still fail for it."
It still fails some buildbots which is what I was trying to test.
2020-01-24 13:15:23 -08:00
Craig Topper a107f86417 [GlobalsAA] Add back a check to intrinsic_addresstaken.ll to see if the AVX and AVX512 bots still fail for it.
These bots failed for this several months ago and as a result, this
check was removed. If they still fail I'm going to try to see if I
can figure out why.
2020-01-24 11:54:23 -08:00
Austin Kerbow c226646337 Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI
Summary:
Enable the new diveregence analysis by default for AMDGPU.

Resubmit with test updates since GPUDA was causing failures on Windows.

Reviewers: rampitec, nhaehnle, arsenm, thakis

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73315
2020-01-24 10:39:40 -08:00
Austin Kerbow 37aa16ebb7 [DA] Don't propagate from unreachable blocks
Summary: Fixes crash that could occur when a divergent terminator has an unreachable parent.

Reviewers: rampitec, nhaehnle, arsenm

Subscribers: jvesely, wdng, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73323
2020-01-24 10:28:11 -08:00
David Green e9c198278e [ARM] Basic gather scatter cost model
This is a very basic MVE gather/scatter cost model, based roughly on the
code that we will currently produce. It does not handle truncating
scatters or extending gathers correctly yet, as it is difficult to tell
that they are going to be correctly extended/truncated from the limited
information in the cost function.

This can be improved as we extend support for these in the future.

Based on code originally written by David Sherwood.

Differential Revision: https://reviews.llvm.org/D73021
2020-01-22 14:41:38 +00:00
David Green 0b83e14804 [ARM] MVE Gather Scatter cost model tests. NFC 2020-01-22 14:41:38 +00:00
Florian Hahn 0b21d55262 [IR] Mark memset.* intrinsics as IntrWriteMem.
llvm.memset intrinsics do only write memory, but are missing
IntrWriteMem, so they doesNotReadMemory() returns false for them.

The test change is due to the test checking the fn attribute ids at the
call sites, which got bumped up due to a new combination with writeonly
appearing in the test file.

Reviewers: jdoerfert, reames, efriedma, nlopes, lebedev.ri

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D72789
2020-01-16 10:35:46 +00:00
Zheng Chen a6342c247a [SCEV] accurate range for addrecexpr with nuw flag
If addrecexpr has nuw flag, the value should never be less than its
start value and start value does not required to be SCEVConstant.

Reviewed By: nikic, sanjoy

Differential Revision: https://reviews.llvm.org/D71690
2020-01-12 20:22:37 -05:00
Zheng Chen 569ccfc384 [SCEV] more accurate range for addrecexpr with nsw flag.
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D72436
2020-01-11 23:26:35 -05:00
Zheng Chen a701be8f03 [SCEV] [NFC] add more test cases for range of addrecexpr with nsw flag 2020-01-10 22:44:47 -05:00
Zheng Chen 4ebb589629 [SCEV] [NFC] add testcase for constant range for addrecexpr with nsw flag 2020-01-09 01:26:57 -05:00
Simon Pilgrim 5d986a68a5 [CostModel][X86] Add missing scalar i64->f32 uitofp costs 2020-01-06 13:17:02 +00:00
Fangrui Song a36ddf0aa9 Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 2019-12-24 16:27:51 -08:00
Fangrui Song eb16435b5e Migrate function attribute "no-frame-pointer-elim-non-leaf" to "frame-pointer"="non-leaf" as cleanups after D56351 2019-12-24 16:05:15 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
czhengsz 7259f04dde [SCEV] add testcase for get accurate range for addrecexpr with nuw flag 2019-12-22 20:58:19 -05:00
Bardia Mahjour 86acaa9457 [DDG] Data Dependence Graph - Ordinals
Summary:
This patch associates ordinal numbers to the DDG Nodes allowing
the builder to order nodes within a pi-block in program order. The
algorithm works by simply assuming the order in which the BBList
is fed into the builder. The builder already relies on the blocks being
in program order so that it can compute the dependencies correctly.
Similarly the order of instructions in their parent basic blocks
determine their program order.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70986
2019-12-19 10:57:33 -05:00
czhengsz d588a00206 [SCEV] NFC - add testcase for get accurate range for AddExpr 2019-12-19 04:11:45 -05:00