Commit Graph

2212 Commits

Author SHA1 Message Date
Vitaly Buka d3b7f90d00 [StackSafety] Skip non-pointer parameters
Summary: Depends on D80908.

Reviewers: eugenis, pcc

Reviewed By: eugenis

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80956
2020-06-03 01:16:39 -07:00
Vitaly Buka 232d348c6e [MTE] Convert StackSafety into analysis
This lets us remove !stack-safe metadata and
better control when to perform StackSafety
analysis.

Reviewers: eugenis

Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80771
2020-06-02 16:08:14 -07:00
Vitaly Buka fc07c1af69 [StackSafety] Delete useless test 2020-06-02 16:08:14 -07:00
Sam Parker e70cf280f8 [NFC][ARM][AArch64] Test runs
Add code size tests runs for memory ops for both architectures.
2020-06-02 09:05:30 +01:00
Yevgeny Rouban 07239c736a [BranchProbabilityInfo] Proportional distribution of reachable probabilities
When fixing the probability of unreachable edges in
BranchProbabilityInfo::calcMetadataWeights(), distribute the remainder
probability proportionally over the reachable edges. The old
implementation distributed the remainder evenly.
See examples in the fixed tests.
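As a hypothetical illustration (weights invented here, not taken from the
patch): for a terminator with three successors whose metadata weights are
0 (unreachable), 6 and 2, the remainder left after pinning the unreachable
edge is now split in the 6:2 ratio of the reachable weights instead of
evenly:

    old: remainder split evenly         -> ~50% / ~50% on the reachable edges
    new: remainder split proportionally -> ~75% / ~25% on the reachable edges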

Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80611
2020-06-02 12:06:52 +07:00
Florian Hahn d99a1848c4 [BasicAA] Use known lower bounds for index values for size based check.
Currently, BasicAA does not exploit information about value ranges of
indexes. For example, consider the 2 pointers %a = %base and
%b = %base + %stride below, assuming they are used to access 4 elements.

If we know that %stride >= 4, we know the accesses do not alias. If
%stride is a constant, BasicAA currently gets that. But if the >= 4
constraint is encoded using an assume, it misses the NoAlias.
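A minimal IR sketch of that pattern (hypothetical names, typed pointers as
used at the time; each access is assumed to cover 4 x i32):

    %cmp = icmp sge i64 %stride, 4
    call void @llvm.assume(i1 %cmp)
    %b = getelementptr i32, i32* %base, i64 %stride
    ; 4-element accesses through %base and %b cannot overlap once the
    ; assume-derived lower bound on %stride is taken into account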

This patch extends DecomposedGEP to include an additional MinOtherOffset
field, which tracks the constant offset similar to the existing
OtherOffset, with the difference that it also includes non-negative
lower bounds on the range of the index value. When checking whether the
distance between 2 accesses exceeds the access size, we can use this
improved bound.

For now this is limited to using non-negative lower bounds for indices,
as this conveniently skips cases where we do not have a useful lower
bound (because it is not constrained). We potentially miss out in cases
where the lower bound is constrained but negative, but that can be
exploited in the future.

Reviewers: sanjoy, hfinkel, reames, asbirlea

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D76194
2020-05-30 16:20:42 +01:00
David Green a01c0049b1 [ConstantFolding] Constant folding for integer vector reduce intrinsics
This adds constant folding for all the integer vector reduce intrinsics,
provided that the argument is a constant vector. zeroinitializer always
produces 0 for all intrinsics, and other values can be handled with
APInt operators.
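For example (a sketch; the exact experimental reduce intrinsic mangling of
this era is assumed here):

    %r = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> <i32 1, i32 2, i32 3, i32 4>)
    ; can now be constant folded to i32 10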

Differential Revision: https://reviews.llvm.org/D80516
2020-05-29 17:58:42 +01:00
Philip Reames 27304b1737 [Tests] Switch a few statepoint tests to using operand bundles
We've started (D80598) the process of migrating away from the inline operand lists in statepoints to using explicit operand bundles.  Update a few tests to reflect the new preference.  More to come, these were simply the ones outside any obvious grouping.
2020-05-28 14:36:05 -07:00
Vitaly Buka 892c71a5bb [StackSafety] Don't run datafow on allocas
We need to process only parameters; alloca accesses can be calculated
afterwards.
Also don't create fake functions for aliases; just resolve them on
initialization.
2020-05-28 13:32:57 -07:00
Vitaly Buka f6383643d9 [StackSafety] Bailout on some function calls
Don't miss values used in calls outside regular argument list.
2020-05-27 02:48:42 -07:00
Vitaly Buka 06a07dd608 [StackSafety] Fix formatting in the test 2020-05-27 02:48:41 -07:00
Vitaly Buka b101c6251a [StackSafety] Ignore some use of values
We should ignore values used in a MemTransferInst
other than the src/dst arguments.
2020-05-27 02:48:41 -07:00
Vitaly Buka 32a1f60d11 [StackSafety] Use SCEV to find mem operation length 2020-05-26 23:22:37 -07:00
Vitaly Buka d0f1f5adfa [StackSafety] Use getSignedRange for offsets 2020-05-26 23:22:36 -07:00
Vitaly Buka b5ae70046b [StackSafety] Simplify SCEVRewriteVisitor
Probably NFC.
2020-05-26 18:09:43 -07:00
Sam Parker 792575ff32 [NFC][ARM][AArch64] More code size tests
Add analysis runs for icmp, fcmp and select instructions.
2020-05-26 14:47:02 +01:00
Sam Parker c5bbc8dd6d [NFC][ARM] Fix for previous commit
Actually analyse code-size for the size runs...
2020-05-26 10:45:35 +01:00
Sam Parker 48cdbd081c [NFC][ARM] Add code size analysis tests
Add code size runs for the cast costs.
2020-05-26 10:30:43 +01:00
Sam Parker 64cfb8a864 [NFC][ARM] Add intrinsic code size runs
Add code size analysis of arithmetic intrinsics.
2020-05-26 09:41:54 +01:00
Sam Parker 1f72d5880e [CostModel] Check for free intrinsics in BasicTTI
Recommitting part of "[CostModel] Unify Intrinsic Costs."
de71def3f5

Now that the 'free' intrinsic information has been sunk to the lowest
level, query the base implementation in BasicTTI before doing
anything else. I suspect this is the change that was causing the main
differences, particularly the large effects on debug builds.

Differential Revision: https://reviews.llvm.org/D80012
2020-05-26 08:37:13 +01:00
Denis Antrushin 5451289aba [SCEV] Constant fold MultExpr before applying depth limit.
Summary:
Users of SCEV reasonably assume that multiplication of two constant
SCEVs will in turn be constant.
However, that is not always the case:
First, we can get here with the depth limit reached, and will create a
MultExpr SCEV `C1 * C2` and cache it.
Then, we can get here with the same operands, but with a small depth
level. This time we will find the existing MultExpr SCEV and return
it, instead of the expected constant SCEV.

This patch changes getMultExpr to not apply the depth limit to
all-constant-operand expressions, allowing them to be folded.

Reviewers: reames, mkazantsev

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79893
2020-05-22 18:34:32 +03:00
Sam Parker fb3ba38021 [CostModel] Remove getExtCost
This has not been implemented by any backends which appear to cover
the functionality through getCastInstrCost. Sink what there is in the
default implementation into BasicTTI.

Differential Revision: https://reviews.llvm.org/D78922
2020-05-21 07:18:06 +01:00
Yevgeny Rouban 8138487468 [BranchProbabilityInfo] Set edge probabilities at once and fix calcMetadataWeights()
Hide the method that allows setting probability for particular edge
and introduce a public method that sets probabilities for all
outgoing edges at once.
Setting individual edge probabilities is error prone. Moreover, it is
difficult to check that the total probability is 1.0 because there is
no easy way to know when the user has finished setting all
the probabilities.

Related bug is fixed in BranchProbabilityInfo::calcMetadataWeights().
Changing unreachable branch probabilities to raw(1) and distributing
the rest (oldProbability - raw(1)) over the reachable branches could
introduce total probability inaccuracy bigger than 1/numOfBranches.

Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
2020-05-21 12:52:37 +07:00
Eli Friedman f26bdb539e Make Value::getPointerAlignment() return an Align, not a MaybeAlign.
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.

Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer.  This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated.  I
updated clang's handling of C++ references, and added a release note for
this.
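A hypothetical IR declaration showing the consequence (names invented):

    ; "dereferenceable(4)" by itself no longer implies 4-byte alignment;
    ; the alignment must now be stated explicitly with the align attribute
    declare void @use(i32* align 4 dereferenceable(4))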

Differential Revision: https://reviews.llvm.org/D80072
2020-05-20 16:37:20 -07:00
Nikita Popov 5fae613a4f [LVI] Don't require DominatorTree in LVI (NFC)
After D76797 the dominator tree is no longer used in LVI, so we
can remove it as a pass dependency, and also get rid of the
dominator tree enabling/disabling logic in JumpThreading.

Apart from cleaning up the code, this also clarifies LVI
cache consistency, in that the LVI cache can no longer
depend on whether the DT was or wasn't enabled due to
pending DT updates at any given time.

Differential Revision: https://reviews.llvm.org/D76985
2020-05-19 20:21:46 +02:00
Eli Friedman 11aa3707e3 StoreInst should store Align, not MaybeAlign
This is D77454, except for stores.  All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.

Differential Revision: https://reviews.llvm.org/D79968
2020-05-15 12:26:58 -07:00
Nikita Popov f89f7da999 [IR] Convert null-pointer-is-valid into an enum attribute
The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.
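Roughly, the IR-level spelling changes from a string attribute to an enum
attribute (sketch; exact printing may differ):

    ; before
    attributes #0 = { "null-pointer-is-valid"="true" }
    ; after
    attributes #0 = { null_pointer_is_valid }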

In the future, this attribute may be replaced with data layout
properties.

Differential Revision: https://reviews.llvm.org/D78862
2020-05-15 19:41:07 +02:00
Sam Parker 0ef62fc25d [NFC][ARM] Intrinsic CostModel Tests
Add throughput tests for saturating, overflowing and reduction
operations.
2020-05-15 13:38:42 +01:00
Stanislav Mekhanoshin 184b383457 Add v16f64 value type
We need to use it to handle <16 x double> indirect indexes
in the AMDGPU BE.

The only visible change from adding it is in ARM cost model.
To me it looks reasonable. Doubling the vector size used to quadruple
the cost up to size 8 and then only double it. Now it also quadruples,
which seems a logical progression to me.

Actual AMDGPU code is to follow; this is the common part, plus
load/store legalization in the AMDGPU BE so as not to break what
works now.

Differential Revision: https://reviews.llvm.org/D79952
2020-05-14 14:28:00 -07:00
Eli Friedman 4532a50899 Infer alignment of unmarked loads in IR/bitcode parsing.
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.
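A small sketch (hypothetical module): with a datalayout in effect, a load
written without an explicit alignment now gets the ABI alignment for its
type at parse time.

    target datalayout = "e"
    define i32 @f(i32* %p) {
      %v = load i32, i32* %p   ; parsed as "load i32, i32* %p, align 4"
      ret i32 %v
    }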

The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.

Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.

This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.

Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.

Differential Revision: https://reviews.llvm.org/D78403
2020-05-14 13:03:50 -07:00
Sam Parker 6bbad7285c [CostModel] Modify BasicTTI getCastInstrCost
Fix the assumption that all bitcasts of the same type sizes are free.
We now only assume that bitcasts between ints and ptrs of the same
size are free. This allows TTImpl to just call the concrete
implementation of getCastInstrCost.

Differential Revision: https://reviews.llvm.org/D78918
2020-05-13 07:26:08 +01:00
Juneyoung Lee e5f602d82c [ValueTracking] Let propagatesPoison support binops/unaryops/cast/etc.
Summary:
This patch makes propagatesPoison more accurate by returning true for
more bin ops/unary ops/casts/etc.
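As a reminder of what "propagates poison" means here (hypothetical values):
if %x is poison, then both results below are poison too, so
propagatesPoison can return true for them.

    %y = add i32 %x, 1
    %z = zext i32 %y to i64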

The changed test in ScalarEvolution/nsw.ll was introduced by
a19edc4d15.
IIUC, the goal of the test is to show that iv.inc's SCEV expression still has
no-overflow flags even if the loop isn't in the wanted form.
It becomes more accurate with this patch, so I think this is okay.

Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, sanjoy

Reviewed By: spatel, nikic

Subscribers: regehr, nlopes, efriedma, fhahn, javed.absar, llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78615
2020-05-13 02:51:42 +09:00
Sam Parker f1f8cffce4 [NFC][AArch64] More casts tests...
Don't use truncs as users because sometimes they're free too.
2020-05-12 13:06:17 +01:00
Sam Parker e114bdf072 [NFC][AArch64] More cast cost tests
Add truncating stores and casts with users.
2020-05-12 11:32:52 +01:00
Sam Parker b4a8091a11 [ARM][CostModel] Improve getCastInstrCost
- Specifically check for sext/zext users which have 'long' form NEON
  instructions.
- Add more entries to the table for sext/zexts so that we can report
  more accurately the number of vmovls required for NEON.
- Pass the instruction to the pass implementation.

Differential Revision: https://reviews.llvm.org/D79561
2020-05-12 10:32:20 +01:00
Sam Parker 1952c86d61 [AArch64][CostModel] getCastInstrCost
Pass the instruction to the base implementation.

Differential Revision: https://reviews.llvm.org/D79562
2020-05-12 10:02:29 +01:00
Sam Parker 494c7ecef9 [NFC][AArch64] Update tests
Add cost model tests for extending loads.
2020-05-12 08:49:05 +01:00
Tyker 821a0f23d8 [AssumeBundles] Prevent generation of some redundant assumes
Summary: With this patch, salvageKnowledge will not generate an assume if all knowledge is already available in an assume with a valid context. The assume builder can also, in some cases, update an existing assume with better information.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78014
2020-05-10 19:23:59 +02:00
zoecarver f65f566aeb Re-commit: Mark values as trivially dead when their only use is a start or end lifetime intrinsic.
Summary:
If the only use of a value is a start or end lifetime intrinsic then mark the intrinsic as trivially dead. This should allow for that value to then be removed as well.

Currently, this only works for allocas, globals, and arguments.
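A minimal sketch of the pattern (typed-pointer IR of the time, names
invented): the only uses of %a are the lifetime markers, so the markers
are now trivially dead and %a becomes removable as well.

    %a = alloca i8, align 1
    call void @llvm.lifetime.start.p0i8(i64 1, i8* %a)
    call void @llvm.lifetime.end.p0i8(i64 1, i8* %a)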

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79355
2020-05-08 12:24:10 -07:00
Nikita Popov 5a2265647e Reapply [InstSimplify] Remove known bits constant folding
No changes relative to last time, but after a mitigation for
an AMDGPU regression landed.

---

If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.

On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):

    instsimplify.NumKnownBits: 216
    instsimplify.NumKnownBitsComputed: 13828375
    valuetracking.NumKnownBitsComputed: 45860806

Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed in computing
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)

There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.

Differential Revision: https://reviews.llvm.org/D79294
2020-05-08 10:24:53 +02:00
Sam Parker 751da4d596 [NFC][AArch64] Add test
Add cost model test for cast operations.
2020-05-07 13:16:03 +01:00
Alina Sbirlea 8e911545d6 [MemorySSA] Make MemoryLocation unknown when phi translation cannot be performed.
Summary: When phi translation cannot be performed, be conservative and make the MemoryLocation unknown.

Reviewers: george.burgess.iv

Subscribers: Prazek, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79386
2020-05-05 13:32:32 -07:00
Nikita Popov 46ee652c70 Revert "[InstSimplify] Remove known bits constant folding"
This reverts commit 08556afc54.

This breaks some AMDGPU tests.
2020-05-03 20:45:10 +02:00
Nikita Popov 08556afc54 [InstSimplify] Remove known bits constant folding
If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.

On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):

    instsimplify.NumKnownBits: 216
    instsimplify.NumKnownBitsComputed: 13828375
    valuetracking.NumKnownBitsComputed: 45860806

Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed in computing
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)

There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.

Differential Revision: https://reviews.llvm.org/D79294
2020-05-03 20:26:58 +02:00
Nikita Popov 7cf0f8568c [ValueTracking] Convert test to unit test (NFC)
Test this directly, rather than going through InstSimplify.
2020-05-03 12:23:57 +02:00
Craig Topper e39c7ab2b9 [CostModel][X86][ARM] Teach default implementation of getCastInstrCost to not add a split/join cost if source type and the destination type both have a SplitVector action
If both the source and the destination need to be split then the two halves of the split operation are completely independent and don't need to be split or joined. So we don't need to assess a cost for the split or join.

Differential Revision: https://reviews.llvm.org/D79111
2020-05-01 18:55:23 -07:00
Craig Topper b938168aef [X86] Lower the cost of v4i64->v4i32 truncate with avx512.
We use the vpmovqd instruction which is a single uop. So
the cost should be 1.
2020-05-01 11:09:37 -07:00
Craig Topper 6a1ad76dab [X86] Don't return true from isTruncateFree for vectors
Also fix some cost tables for vXi1 types to match the cost entries for the types they will be promoted to.

Differential Revision: https://reviews.llvm.org/D79045
2020-04-30 16:43:35 -07:00
Craig Topper ff66919020 [X86][CostModel] Bump the cost of vpermw/vpermt2b/vperm2w
vpermw is 2 uops. vpermt2b/vpermt2w are two shuffle uops and a port 015 uop. Weirdly vpermb is a single uop.

This patch bumps the cost to 2 for these operations. Maybe should go to 3 for the vpermt2*, but I've started conservative.

I've also removed a few entries that were now the same as earlier subtargets or that I didn't think we really did. Like I don't think we extend v32i8 to v32i16, shuffle, and then truncate.

Differential Revision: https://reviews.llvm.org/D79148
2020-04-30 11:32:25 -07:00
Evgeniy Brevnov bb0842a3f1 [BPI] Incorrect probability reported in case of multiple edges.
Summary:
By design 'BranchProbabilityInfo::getEdgeProbability(const BasicBlock *Src, const BasicBlock *Dst) const' should return the sum of probabilities over all edges from Src to Dst. The current implementation is buggy and returns 1/num_of_successors if probabilities are not explicitly set.

Note the current implementation of BPI printing has an issue as well and annotates each edge with the sum of probabilities over all edges from one basic block to another. That's why a 30% probability is reported (instead of 10%) in the lit test. This is not an urgent issue since only printing is affected.
Note also the current implementation assumes that either all or no edges have probabilities set. This is not the only place which uses such an assumption. At least we should assert that in the verifier. In addition we can think about a more robust BPI API which would prevent such situations.

Reviewers: skatkov, yrouban, taewookoh

Reviewed By: skatkov

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79071
2020-04-30 11:41:03 +07:00
Alina Sbirlea 161ccfe5ba [MemorySSA] Pass DT to the upward iterator for proper PhiTranslation.
Summary:
A valid DominatorTree is needed to do PhiTranslation.
Before this patch, a MemoryUse could be optimized to an access outside a loop, while the address it loads from is modified in the loop.
This can lead to a miscompile.

Reviewers: george.burgess.iv

Subscribers: Prazek, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79068
2020-04-29 14:28:31 -07:00
Craig Topper cff6686532 [X86] Lower the cost of v4i64->v4i32 and v8i64->v8i32 truncate with AVX
We generate much better code these days than we used to. And we use the same sequence for AVX1 and AVX2 for these.

For v4i64->v4i32 we generate:
vextractf128    xmm1, ymm0, 1
vshufps xmm0, xmm0, xmm1, 136   # xmm0 = xmm0[0,2],xmm1[0,2]

And for v8i64->v8i32 we generate:
vperm2f128      ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3]
vinsertf128     ymm0, ymm0, xmm1, 1
vshufps ymm0, ymm0, ymm2, 136   # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6]

Differential Revision: https://reviews.llvm.org/D79109
2020-04-29 13:21:44 -07:00
Sam Parker e9d0f1c8ea [NFC][ARM] Modify cost model test 2020-04-29 12:42:47 +01:00
Sam Parker 850bdefa65 [NFC][ARM] Add two cost model tests 2020-04-29 12:36:05 +01:00
Simon Pilgrim 090cae8491 [TTI] Add DemandedElts to getScalarizationOverhead
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the availability of legal INSERT_VECTOR_ELT ops is more limited.

This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of an ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.

This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.

A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.

Reviewed By: @craig.topper

Differential Revision: https://reviews.llvm.org/D78216
2020-04-29 12:00:38 +01:00
Craig Topper 59b9e6fe76 [X86] Update costs for truncates from less than 128-bit vectors to vXi1 on pre-avx512 targets
vXi1 types are legalized by promoting, but the narrow vectors
are legalized by widening. This results in some truncates turning
into any_extends.
2020-04-28 11:35:41 -07:00
Craig Topper d42192c50f [X86][CostModel] Correct the costs for truncate to a mask register with avx512
I've modified isTruncateFree to get an accurate cost for types that need to be split. I'm planning to look into fixing it for all vectors, but need more cost cleanups first.

Differential Revision: https://reviews.llvm.org/D78973
2020-04-28 10:39:36 -07:00
Craig Topper 9ea5cc8a25 [X86][CostModel] Add vXiY->vXi1 truncate tests to min-legal-vector-width.ll. NFC 2020-04-27 15:48:11 -07:00
Craig Topper 37ec709233 [X86][CostModel] Update truncate costs for some narrow vector cases to match their wider version.
This updates v4i16->v4i8 with sse2 to match v8i16->v8i8.
Update v2i16->v2i8 and v4i16->v4i8 with sse 4.1 to match v8i16->v8i8.
2020-04-27 13:47:48 -07:00
Craig Topper bdbbed115f [X86][CostModel] Update costs for vector truncate with avx512f/avx512bw.
All avx512 truncate instructions except vXi64->vXi32 are 2 uops
on port 5. So raise their costs to 2. Except when we have an
earlier faster sequence like pshufb for 128 bit input vectors.

Add a lower cost of 3 for v16i16->v16i8 with avx512f where we can
extend to v16i32 then truncate. And a cost of 2 for avx512bw with
and without avx512vl. There we can use vpmovwb with either a ymm
or zmm input. Both of these beat masking, splitting, and using
packuswb which is our avx/avx2 codegen.
2020-04-27 12:00:24 -07:00
Craig Topper 5eff75d86a [X86][CostModel] Improve costs for fp_to_uint/fp_to_sint for vXi8/vXi16/v2i32 results.
Differential Revision: https://reviews.llvm.org/D78893
2020-04-27 10:35:15 -07:00
Craig Topper 8296bcf76f [X86][CostModel] Fix typos in test. NFC 2020-04-26 21:17:38 -07:00
Craig Topper 5f2ea70980 [X86] Add cost model tests for conversions between <2 x float> and integers.
For all but 2 x i32 we were starting from 4 x float.
2020-04-26 19:59:01 -07:00
Craig Topper b9de62c2b6 [X86] Fix the cost of v16i1->v16i16 sext/zext on avx targets.
Previously we were hitting the scalarization case in the default
implementation.
2020-04-25 23:16:20 -07:00
Craig Topper 19cb26f517 [X86][CostModel] Improve costs for vXi1 sign_extend/zero_extend with avx512.
With avx512 vXi1 is legal and uses k-registers with many custom cases
for extending.
2020-04-25 23:16:20 -07:00
Craig Topper 084433702d [X86][CostModel] Add sext/zext from vXi1 tests to min-legal-vector-width.ll. NFC
We aren't properly costing extends from k-registers. I also added
command lines without avx512bw to be able to show all the different
extending strategies we have.
2020-04-25 23:15:40 -07:00
Craig Topper 061f330d7e [X86] Add avx512vl to the truncate cost model test. NFC 2020-04-25 12:59:10 -07:00
Craig Topper 999058ba5e [X86] Add cost model tests for truncating from v2i8/v4i8/v8i8/v16i8 to vXi1. NFC 2020-04-24 23:11:17 -07:00
Craig Topper 7664a0d282 [X86] Improve accuracy of cost for v16i64->v16i8 truncate with avx512.
The 2 vpmovqds are only 1 uop each.
2020-04-24 19:13:55 -07:00
Craig Topper 03aa967c0d [CostModel][X86][ARM] Teach getCastInstrCost to include the splitting factor when handling operations that type legalize to the same number of subvectors or scalar components
Previously, we just always returned 1. But that ignores that we have to do the operation for each subvector or scalar component.

Differential Revision: https://reviews.llvm.org/D78824
2020-04-24 13:36:26 -07:00
Tyker 42431da895 [AssumeBundles] Use assume bundles in isKnownNonZero
Summary: Use nonnull and dereferenceable from an assume bundle in isKnownNonZero
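A sketch of the kind of assume bundle this exploits (hypothetical pointer
%p; bundle spelling assumed):

    call void @llvm.assume(i1 true) [ "nonnull"(i8* %p), "dereferenceable"(i8* %p, i64 4) ]
    ; isKnownNonZero(%p) can now answer true based on the bundle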

Reviewers: jdoerfert, nikic, lebedev.ri, reames, fhahn, sstefan1

Reviewed By: jdoerfert

Subscribers: fhahn, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76149
2020-04-24 20:41:51 +02:00
Craig Topper 4cf73a3fc6 [CostModel][X86] Account for splitting cost when vector zext/sext type legalize to the same size vector. 2020-04-24 09:59:23 -07:00
Juneyoung Lee aca335955c [ValueTracking] Let analyses assume a value cannot be partially poison
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwise, but the semantics
of bitwise poison are not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.

This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.

In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.

Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr

Reviewed By: nikic

Subscribers: fhahn, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78503
2020-04-23 08:08:53 +09:00
Juneyoung Lee 5ceef26350 Revert "RFC: [ValueTracking] Let analyses assume a value cannot be partially poison"
This reverts commit 80faa8c3af.
2020-04-23 08:07:09 +09:00
Juneyoung Lee 80faa8c3af RFC: [ValueTracking] Let analyses assume a value cannot be partially poison
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwise, but the semantics
of bitwise poison are not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.

This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.

In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.

Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr

Reviewed By: nikic

Subscribers: fhahn, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78503
2020-04-23 07:57:12 +09:00
Sam Parker 04ef154124 [NFC] Test changes
Add some more targets for the ARM cost model tests and add some tests
for icmps and bitcasts.
2020-04-22 08:28:52 +01:00
Eli Friedman 9b9454af8a Require "target datalayout" to be at the beginning of an IR file.
This will allow us to use the datalayout to disambiguate other
constructs in IR, like load alignment. Split off from D78403.

Differential Revision: https://reviews.llvm.org/D78413
2020-04-20 11:55:49 -07:00
Craig Topper 8dfb9627b7 [X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead.
This moves v32i16/v64i8 to a model consistent with how we
treat integer types with avx1.

This does change the ABI for vXi16/vXi8 vector types larger than
512 bits to pass in multiple zmms instead of multiple ymms. We'd
already hacked some code to make v64i8/v32i16 pass in zmm.

The cost model is still a bit of a mess. In some places I tried to
match existing behavior. But really we need to account for
splitting and concatenating costs. The cost model for shuffles is
especially pessimistic.

Differential Revision: https://reviews.llvm.org/D76212
2020-04-15 12:17:18 -07:00
Simon Pilgrim 2f951e99c6 [CostModel][X86] Regenerate load_store.ll costs tests
Add SSE + AVX512 targets
Add some illegal type store tests
2020-04-15 11:54:39 +01:00
Tyker 1d2b76a8fc [AssumeBundles] adapte GVN to assume bundles
Summary:
prevent GVN from removing assume bundles
make GVN preserve information from removed instructions

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77405
2020-04-14 12:48:14 +02:00
Craig Topper 2f60fbce6c [X86] Use a more realistic cost for truncate v16i64->v16i8 with avx512f.
Still not great and we could probably codegen this better, but
11 was clearly ridiculous.
2020-04-13 21:09:43 -07:00
Craig Topper 071c64d68d [X86] Add a more accurate truncate cost for v8i64->v8i8 2020-04-13 21:09:41 -07:00
Craig Topper b37b1840eb [X86] Add truncate cost model tests to min-legal-vector-width.ll for when we're avoiding 512 bit vectors. 2020-04-13 21:09:40 -07:00
Simon Pilgrim 353347288b [CostModel][X86] Remove comments that begin with a filecheck prefix.
Stop filecheck from confusing a general comment with a check.
2020-04-13 18:39:24 +01:00
Huihui Zhang 6c989d0248 [BasicAA] Fix aliasGEP/DecomposeGEPExpression for scalable type.
Summary:
Don't attempt to analyze the decomposed GEP for scalable types.
The GEP index scale is not a compile-time constant for scalable types.
Be conservative, return MayAlias.

Explicitly call TypeSize::getFixedSize() to assert on places where
scalable type doesn't make sense.

Add unit tests to check functionality of -basicaa for scalable type.

This patch is needed for D76944.

Reviewers: sdesmalen, efriedma, spatel, bjope, ctetreau

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77828
2020-04-10 16:58:26 -07:00
Simon Pilgrim 91bc50c0d7 [CostModel][X86] Improve InsertElement costs for sub-128bit vectors
If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128bit vectors, ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type. This is a followup to rG12c629ec6c59, which added equivalent sub-128bit shuffle costs.
2020-04-10 14:55:46 +01:00
Craig Topper 5625e6ab37 [X86] Improve min/max reduction costs.
This is similar to what I recently did for getArithmeticReductionCost.

I'm trying to account for the narrowing from 512->256->128 as we go.

I've also added a new helper method getMinMaxCost that tries to
handle the cases where we have native min/max instructions and
fall back to cmp+select when we don't.

Differential Revision: https://reviews.llvm.org/D76634
2020-04-09 17:28:50 -07:00
Simon Pilgrim 12c629ec6c [CostModel][X86] Add shuffle costs for some common sub-128bit vectors
v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.
2020-04-09 19:57:06 +01:00
Simon Pilgrim 898e22908c [MemorySSA] invariant-groups.ll - add missing check to fix issue reported on D77354 2020-04-08 15:18:04 +01:00
Sanjay Patel a2bb19ca42 [x86] add size cost tests for casts and binops; NFC
Shows bugs for div/rem/fdiv and possibly others.
2020-04-06 12:38:15 -04:00
Jonathan Roelofs 7c5d2bec76 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
Simon Pilgrim be84d2b5b7 [CostModel][X86] Add some insert subvector cost tests for vXf32/vXi32/vXi16/vXi8 types 2020-04-04 22:46:57 +01:00
Simon Pilgrim 6a57ba17c0 [CostModel][X86] Add shuffle cost tests for sub-128bit vectors 2020-04-04 13:08:25 +01:00
Simon Pilgrim 87fd686f6f [CostModel][X86] Add insert/extract cost tests for sub-128bit vXi8/vXi16 vectors 2020-04-04 13:08:25 +01:00
Matt Arsenault 5660bb6bc9 AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Denis Antrushin 06c58f11a9 [SCEV] Use backedge SCEV of PHI only if its input is loop invariant
For the PHI node

      %1 = phi [%A, %entry], [%X, %latch]

it is incorrect to use the SCEV of the backedge value %X as the exit value
of the PHI unless %X is loop invariant.
This is because the exit value of %1 is the value of %X at the
one-before-last iteration of the loop.

Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D73181
2020-03-31 18:39:24 +07:00
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.
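A sketch of a call to the new intrinsic (the wave64 mangling shown here is
an assumption):

    %ballot = call i64 @llvm.amdgcn.ballot.i64(i1 %cond)
    ; bit N of %ballot holds %cond for active lane N and 0 for inactive lanes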

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Craig Topper b4695351cb [TTI][X86] Fix the value passed to IsUnsigned for cost modeling of experimental.vector.reduce.smin/smax/umin/umax.
We were passing true for smax/smin and false for umax/umin.
2020-03-29 23:34:22 -07:00
Craig Topper d74533a18b [X86] Add sse4.1 RUNs lines to the min/max reduction cost model tests.
Mostly this matches the sse4.2 we already had command lines for.
Except in the i64 case since sse4.1 doesn't have pcmpgtq.
2020-03-29 16:05:35 -07:00
Craig Topper c0aa97b632 [X86] Add cost model test cases for fmin/fmax reduction. 2020-03-28 17:12:49 -07:00
Craig Topper f4c67dfa92 [X86] More accurately model the cost of horizontal reductions.
This patch attempts to more accurately model the reduction of
power of 2 vectors of types we natively support. This takes into
account the narrowing of vectors that occur as we go from 512
bits to 256 bits, to 128 bits. It also takes into account the use
of wider elements in the shuffles for the first 2 steps of a
reduction from 128 bits. And uses a v8i16 shift for the final step
of vXi8 reduction.

The default implementation uses the legalized type for the arithmetic
for all levels. And uses the single source permute cost of the
legalized type for all levels. This penalizes things like
lack of v16i8 pshufb on pre-sse3 targets and the splitting and
joining that needs to be done for integer types on AVX1. We never
need v16i8 shuffle for a reduction and we only need split AVX1 ops
when the type is wide and needs to be split. I think we're still
over costing splits and joins for AVX1, but we're closer now.

I've also removed all pairwise special casing because I don't
think we ever want to generate that on X86. I've also adjusted
the add handling to more accurately account for any type splitting
that occurs before we reach a legal type.

Differential Revision: https://reviews.llvm.org/D76478
2020-03-22 14:20:15 -07:00
Bjorn Pettersson d077d678d3 [ValueTracking] Avoid blind cast from Operator to Instruction
Summary:
Avoid blind cast from Operator to ExtractElementInst in
computeKnownBitsFromOperator. This resulted in some crashes
in downstream fuzzy testing. Instead we use getOperand directly
on the Operator when accessing the vector/index operands.

Haven't seen any problems with InsertElement and ShuffleVector,
but I believe those could be used in constant expressions as well.
So the same kind of fix as for ExtractElement was also applied for
InsertElement.

When it comes to ShuffleVector we now simply bail out if a dynamic
cast of the Operator to ShuffleVectorInst fails. I've got no
reproducer indicating problems for ShuffleVector, and a fix would be
slightly more complicated as getShuffleDemandedElts is involved.

Reviewers: RKSimon, nikic, spatel, efriedma

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76564
2020-03-22 14:45:31 +01:00
Nikita Popov ce6c95aaca [InstCombine] Move test to instcombine; NFC
This test uses -instcombine, so move it into the appropriate
directory. Also fork it for expensive checks enabled/disabled.
2020-03-20 12:41:19 +01:00
Nikita Popov a09ff56b5b [Tests] Regenerate some test checks; NFC 2020-03-20 12:06:53 +01:00
Craig Topper c13aa36bb7 [X86] Attempt to more accurately model the cost of a bool reduction of wide vector type.
Previously we multiplied the cost for the table entries by the number of splits needed. But that implies that each split goes through a reduction to scalar independently. I think what really happens is that we AND/OR the split pieces until we're down to a single value with a legal type and then do a special reduction sequence on that.

So to model that this patch takes the number of splits minus one multiplied by the cost of a AND/OR at the legal element count and adds that on top of the table lookup.

Differential Revision: https://reviews.llvm.org/D76400
2020-03-19 09:31:05 -07:00
Eli Friedman ebec984e14 [AliasAnalysis] Misc fixes for checking aliasing with scalable types.
This is fixing up various places that use the implicit
TypeSize->uint64_t conversion.

The new overloads in MemoryLocation.h are already used in various places
that construct a MemoryLocation from a TypeSize, including MemorySSA.
(They were using the implicit conversion before.)

Differential Revision: https://reviews.llvm.org/D76249
2020-03-18 12:28:47 -07:00
Sam Parker ef56b55e12 [NFC][ARM] Add thumb triple to test
Test the costs of selects for thumbv8m.base too.
2020-03-18 09:15:19 +00:00
Evgenii Stepanov 2a3723ef11 [memtag] Plug in stack safety analysis.
Summary:
Run StackSafetyAnalysis at the end of the IR pipeline and annotate
proven safe allocas with !stack-safe metadata. Do not instrument such
allocas in the AArch64StackTagging pass.

Reviewers: pcc, vitalybuka, ostannard

Reviewed By: vitalybuka

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, gilang, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73513
2020-03-16 16:35:25 -07:00
Nico Weber 623cb95eb3 Revert "[InstSimplify] Simplify calls with "returned" attribute"
This reverts commit 45555c3819.
Causes clang crashes in some cases; see comments on
https://reviews.llvm.org/D75815 for details (including
repro steps).
2020-03-16 15:21:30 -04:00
Craig Topper b2da1ddaef [X86] Add a non-zero cost for truncating v32i16->v32i8 on avx512bw. 2020-03-15 17:18:46 -07:00
Florian Hahn b8b8f04c0d [ValueLattice] Go to overdefined in getRange() for full ranges.
This was split off from 4878aa36d4,
as it can go in separately.
2020-03-14 19:50:15 +00:00
Eli Friedman 65fc706ddf [SCEV] Add support for GEPs over scalable vectors.
Because we have to use a ConstantExpr at some point, the canonical form
isn't set in stone, but this seems reasonable.

The pretty sizeof(<vscale x 4 x i32>) dumping is a relic of ancient
LLVM; I didn't have to touch that code. :)

Differential Revision: https://reviews.llvm.org/D75887
2020-03-13 16:12:45 -07:00
Simon Pilgrim a2db388dce [CostModel][X86] Improve ISD::CTTZ costs accounting for BSF/TZCNT implementations 2020-03-13 16:51:13 +00:00
Huihui Zhang f4f2706572 [ConstantFold][SVE] Fix constant folding for scalable vector compare instruction.
Summary:
Do not iterate on scalable vectors. Also do not return a constant scalable vector
from ConstantInt::get().
Fix result type by using getElementCount() instead of getNumElements().

Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73753
2020-03-12 16:15:38 -07:00
Anna Welker a6d3bec83f [TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.

Differential Revision: https://reviews.llvm.org/D75525
2020-03-11 10:23:41 +00:00
Simon Pilgrim 5cbddf7cbc [X86][SSE] Add more accurate costs for fmaxnum/fminnum codegen
Based off llvm-mca reports on codegen in llvm\test\CodeGen\X86\fmaxnum.ll + llvm\test\CodeGen\X86\fminnum.ll
2020-03-10 11:59:40 +00:00
Simon Pilgrim 0b1dc6016f [CostModel][X86] Add fmaxnum/fminnum costs tests 2020-03-10 11:18:27 +00:00
Nikita Popov 45555c3819 [InstSimplify] Simplify calls with "returned" attribute
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. The "-inst-simplify" pass
already handled this for the constant integer argument case via
known bits, which is invoked in SimplifyInstruction. However,
non-constant (or non-int) arguments are not handled at all right now.

This addresses one of the regressions from D75801.

Differential Revision: https://reviews.llvm.org/D75815
2020-03-09 18:53:47 +01:00
Jay Foad 596446623b [AMDGPU][ConstantFolding] Fold llvm.amdgcn.cube* intrinsics
Summary:
This folds the following family of intrinsics:
llvm.amdgcn.cubeid (face id)
llvm.amdgcn.cubema (major axis)
llvm.amdgcn.cubesc (S coordinate)
llvm.amdgcn.cubetc (T coordinate)

Reviewers: nhaehnle, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75187
2020-03-06 16:42:53 +00:00
David Green 587feec07e [ARM] Change all tests from "thumbv8.1-m.main" to "thumbv8.1m.main". NFC 2020-03-04 13:47:35 +00:00
Evgeniy Brevnov e60c28746b Lost regression test from commit 5a63813dc7. 2020-03-04 19:52:42 +07:00
Whitney Tsang c84532a70a [LoopNest]: Analysis to discover properties of a loop nest.
Summary: This patch adds an analysis pass to collect loop nests and
summarize properties of the nest (e.g. the nest depth, whether the nest
is perfect, what's the innermost loop, etc...).

The motivation for this patch was discussed at the latest meeting of the
LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we
discussed
the unimodular loop transformation framework ( “A Loop Transformation
Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and
Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework
provides a convenient way to unify legality checking and code generation
for several loop nest transformations (e.g. loop reversal, loop
interchange, loop skewing) and their compositions. Given that the
unimodular framework is applicable to perfect loop nests this is one
property of interest we expose in this analysis. Several other utility
functions are also provided. In the future other properties of interest
can be added in a centralized place.
Authored By: etiotto
Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn,
reames, hfinkel, jdoerfert, ppc-slack
Reviewed By: Meinersbur
Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D68789
2020-03-03 18:25:19 +00:00
Whitney Tsang 613f791131 Revert "[LoopNest]: Analysis to discover properties of a loop nest."
This reverts commit 3a063d68e3.

Broke the build with modules enabled:
http://green.lab.llvm.org/green/job/lldb-cmake/10655/console .
2020-03-03 14:07:49 +00:00
Whitney Tsang 3a063d68e3 [LoopNest]: Analysis to discover properties of a loop nest.
Summary: This patch adds an analysis pass to collect loop nests and
summarize properties of the nest (e.g. the nest depth, whether the nest
is perfect, what's the innermost loop, etc...).

The motivation for this patch was discussed at the latest meeting of the
LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we
discussed
the unimodular loop transformation framework ( “A Loop Transformation
Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and
Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework
provides a convenient way to unify legality checking and code generation
for several loop nest transformations (e.g. loop reversal, loop
interchange, loop skewing) and their compositions. Given that the
unimodular framework is applicable to perfect loop nests this is one
property of interest we expose in this analysis. Several other utility
functions are also provided. In the future other properties of interest
can be added in a centralized place.
Authored By: etiotto
Reviewer: Meinersbur, bmahjour, kbarton, Whitney, dmgreen, fhahn,
reames, hfinkel, jdoerfert, ppc-slack
Reviewed By: Meinersbur
Subscribers: bryanpkc, ppc-slack, mgorny, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D68789
2020-03-03 13:25:28 +00:00
Simon Pilgrim 174cb7c695 [CostModel][X86] Add vXi1 extract/insert cost tests 2020-03-02 11:41:20 +00:00
Simon Pilgrim 168a44a70e [CostModel][X86] Improve extract/insert element costs (PR43605)
This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index.

It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true).

Differential Revision: https://reviews.llvm.org/D74976
2020-02-27 15:54:13 +00:00
Bardia Mahjour 1b811ff8a9 [DA] Delinearization of fixed-size multi-dimensional arrays
Summary:
Currently the dependence analysis in LLVM is unable to compute accurate
dependence vectors for multi-dimensional fixed size arrays.
This is mainly because the delinearization algorithm in scalar evolution
relies on parametric terms to be present in the access functions. In the
case of fixed size arrays such parametric terms are not present, but we
can use the indexes from GEP instructions to recover the subscripts for
each dimension of the arrays. This patch adds this ability under the
existing option `-da-disable-delinearization-checks`.

Authored By: bmahjour

Reviewer: Meinersbur, sebpop, fhahn, dmgreen, grosser, etiotto, bollu

Reviewed By: Meinersbur

Subscribers: hiraditya, arphaman, Whitney, ppc-slack, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72178
2020-02-27 10:29:01 -05:00
Jay Foad 5900d3f2e9 [AMDGPU][ConstantFolding] Fold llvm.amdgcn.fract intrinsic
Reviewers: nhaehnle, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75179
2020-02-27 14:37:53 +00:00
Simon Pilgrim b82438872b [CostModel][X86] We don't need a scale factor for SLM extract costs
D74976 will handle larger vector types, but since SLM doesn't support AVX+ we will always be extracting from 128-bit vectors, so we don't need to scale the cost.
2020-02-24 14:23:04 +00:00
Simon Pilgrim eaa41e103c [CostModel][X86] Try to check against common prefixes before using target-specific cpu checks
SLM/GLM is still a mess so not all of them have been updated yet.
2020-02-24 11:59:07 +00:00
Jonas Paulsson 41bd9ead35 [SystemZ] Return scalarized costs for vector instructions on older archs.
A cost query for a vector instruction should return a cost even without
target vector support, and not trigger an assert.

VectorCombine does this with an input containing source code vectors.

Review: Ulrich Weigand
2020-02-21 09:17:37 -08:00
Evgeniy Brevnov b0761bbc76 [DependenceAnalysis] Memory dependence analysis internal caching mechanism is broken in presence of TBAA (PR42733).
Summary:
There is a flaw in the memory dependence analysis caching mechanism when memory accesses with TBAA are involved. Assume we first analysed and cached results for an access with TBAA. Later we request dependence for the same memory but without TBAA (or with different TBAA). By design these two queries should share one entry in the internal cache which corresponds to a general access (without TBAA). Thus upon the second request the internal cache is cleared and we continue analysis for the access as if there were no TBAA.

The problem is that even though the internal cache is cleared, the set of visited nodes is not. That means we won't traverse visited nodes again and populate the internal cache with the corresponding dependence results. So we end up with the internal cache in an incomplete state. The current implementation tries to signal that situation by resetting CacheInfo->Pair at line 1104. But that doesn't actually help since later code ignores this invalidation and relies on the 'Cache->empty()' property to decide on cache completeness.

Reviewers: reames, hfinkel, chandlerc, fedor.sergeev, asbirlea, fhahn, john.brawn, Prazek, sunfish

Reviewed By: john.brawn

Subscribers: DaniilSuchkov, kosarev, jfb, dantrushin, hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73032
2020-02-21 20:20:36 +07:00
Sanjay Patel d799190851 [ConstantFold] fold fsub -0.0, undef to undef rather than NaN
A question about this behavior came up on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139003.html
...and as part of backend improvements in D73978, but this is an IR
change first because we already have fairly thorough tests in place
here.
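The specific fold in question (illustrative):

    %r = fsub float -0.0, undef
    ; now folds to undef instead of NaN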

We decided not to implement a more general change that would have
folded any FP binop with nearly arbitrary constant + undef operand
to undef because that is not theoretically correct (even if it is
practically correct).

Differential Revision: https://reviews.llvm.org/D74713
2020-02-21 08:03:19 -05:00
Sanjay Patel 7ddbf802cf [ConstantFold] add/move tests for FP with undef operand; NFC 2020-02-20 15:07:11 -05:00
Hideto Ueno e253cdda35 [MustExecute] Add backward exploration for must-be-executed-context
Summary:
As mentioned in D71974, it is useful for must-be-executed-context to explore the CFG backwards.
This patch is ported from parts of D64975. We use a dominator tree to find the previous context if
a dominator tree is available.

Reviewers: jdoerfert, hfinkel, baziotis, sstefan1

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74817
2020-02-20 14:49:30 +09:00
Bardia Mahjour 0a2626d0cd [DDG] Data Dependence Graph - Graph Simplification
Summary:
This is the last functional patch affecting the representation of DDG.
Here we try to simplify the DDG to reduce the number of nodes and edges by
iteratively merging pairs of nodes that satisfy the following conditions,
until no such pair can be identified. A pair of nodes consisting of a and b
can be merged if:

    1. the only edge from a is a def-use edge to b and
    2. the only edge to b is a def-use edge from a and
    3. there is no cyclic edge from b to a and
    4. all instructions in a and b belong to the same basic block and
    5. both a and b are simple (single or multi instruction) nodes.

These criteria allow us to fold many uninteresting def-use edges that
commonly exist in the graph while avoiding the risk of introducing
dependencies that didn't exist before.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72350
2020-02-19 13:41:51 -05:00
Jay Foad b329d1b06e [AMDGPU][ConstantFolding] Fold llvm.amdgcn.fmul.legacy intrinsic
Reviewers: arsenm, rampitec, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74835
2020-02-19 16:01:30 +00:00
Evgeniy Brevnov cae643d596 Reverting D73027 [DependenceAnalysis] Dependencies for loads marked with "invariant.load" should not be shared with general accesses (PR42151). 2020-02-14 22:57:23 +07:00
Evgeniy Brevnov 5573abceab [DependenceAnalysis] Dependencies for loads marked with "invariant.load" should not be shared with general accesses (PR42151).
Summary:
This is the second attempt to fix the problem with incorrect dependencies reported in the presence of an invariant load. The initial fix (https://reviews.llvm.org/D64405) was reverted due to a regression reported in https://reviews.llvm.org/D70516.

The original fix changed caching behavior for invariant loads. Namely, such loads are not put into the second level cache (NonLocalDepInfo). The problem with that fix is that the first level cache (CachedNonLocalPointerInfo) still works as if invariant loads were in the second level cache. The solution is, in addition to not putting dependence results into the second level cache, to avoid putting info about invariant loads into the first level cache as well.

Reviewers: jdoerfert, reames, hfinkel, efriedma

Reviewed By: jdoerfert

Subscribers: DaniilSuchkov, hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73027
2020-02-14 12:18:31 +07:00
Huihui Zhang 5350a48931 [ConstantFold][SVE] Fix constant fold for FoldReinterpretLoadFromConstPtr.
Summary:
Bail out early for scalable vectors, as global variables are not expected
to be scalable.

Use explicit calls to getFixedSize() to assert in places where a scalable size
doesn't make sense.

Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74424
2020-02-12 10:24:50 -08:00
Ehud Katz 2470d2988a [ConstantFolding] Fold calls to FP remainder function
With the fixed implementation of the "remainder" operation in
rG9d0956ebd471, we can now add support for folding calls to it.
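
For reference, a small made-up example of the operation such a fold evaluates at compile time (IEEE-754 remainder, as computed by the C library's remainder()); the constants below are chosen purely for illustration:

  #include <cmath>
  #include <cstdio>

  int main() {
    // remainder(x, y) = x - n*y, where n is the integer nearest to x/y.
    std::printf("%g\n", std::remainder(10.0, 3.0));  // prints 1    (n = 3)
    std::printf("%g\n", std::remainder(5.5, 2.0));   // prints -0.5 (n = 3)
    return 0;
  }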

Differential Revision: https://reviews.llvm.org/D69777
2020-02-12 13:21:18 +02:00
Nicolai Hähnle ab2f610f38 AMDGPU: llvm.amdgcn.writelane is a source of divergence
Summary:
Consider:

  %r = call i32 @llvm.amdgcn.writelane(i32 0, i32 1, i32 2)

This produces a value that is 0 on lane 1, and 2 everywhere else; i.e.,
it is divergent.

Reported-by: Marek Olsak <Marek.Olsak@amd.com>

Reviewers: arsenm, foad, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74400
2020-02-12 09:12:56 +01:00
Huihui Zhang 88de9338f2 [ConstantFold][SVE] Fix constant folding for vector calls.
Summary:
Do not iterate on scalable vectors.

Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74419
2020-02-11 14:06:15 -08:00
Alina Sbirlea 0cecafd647 [BasicAA] Make BasicAA a cfg pass.
Summary:
Part of the changes in D44564 made BasicAA not CFG-only due to its use of
PhiValues analysis results, which may be invalidated.
Subsequent patches (rL340613) appear to have addressed this limitation.

BasicAA should not be invalidated by non-CFG-altering passes.
A concrete example is MemCpyOpt, which preserves the CFG, yet we currently
test that it invalidates BasicAA.

llvm-dev RFC: https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM

Reviewers: john.brawn, sebpop, hfinkel, brzycki

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74353
2020-02-11 11:30:08 -08:00
Rachel Craik 1f55420065 [LoopCacheAnalysis]: Add support for negative stride
LoopCacheAnalysis currently assumes the loop will be iterated over in
a forward direction. This patch addresses the issue by using the
absolute value of the stride when iterating backwards.

Note: this patch treats negative and positive array accesses the
same, resulting in the same cost being calculated for single- and
bi-directional access patterns. This should be improved in a
subsequent patch.
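
As a made-up source-level illustration (these functions are not from the patch's tests), the forward and backward walks below now receive the same cache cost, since the analysis uses the absolute value of the stride:

  #include <cstdio>

  // Forward walk: stride of +1 element per iteration.
  void forward(int *a, int n)  { for (int i = 0; i < n; ++i)      a[i] += 1; }
  // Backward walk: stride of -1 element per iteration; |stride| is identical.
  void backward(int *a, int n) { for (int i = n - 1; i >= 0; --i) a[i] += 1; }

  int main() {
    int buf[4] = {0, 0, 0, 0};
    forward(buf, 4);
    backward(buf, 4);
    std::printf("%d\n", buf[0]);  // prints 2
    return 0;
  }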

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D73064
2020-02-10 13:22:35 -05:00
Huihui Zhang 5389ca7a1f [ConstantFold][NFC] Move scalable vector unit tests under vscale.ll 2020-02-05 16:03:51 -08:00
Huihui Zhang 801857c59e [ConstantFold][SVE] Fix constant folding for bitcast.
Do not iterate on scalable vector types in BitCastConstantVector.
This is continuation work from D70985 and D71147.

Support for folding bitcast into splat value is kept in D74095, as
it depends on D71637.

Differential Revision: https://reviews.llvm.org/D71389
2020-02-05 15:39:57 -08:00
Christopher Tetreault b03f3fbd6a Reapply: [SVE] Fix bug in simplification of scalable vector instructions
This reverts commit a05441038a, reapplying
commit 31574d38ac
2020-02-05 10:00:09 -08:00
Matt Arsenault 096cd991ee AMDGPU: Fix divergence analysis of control flow intrinsics
The mask results of these should be uniform. The trickier part is the
dummy booleans used as IR glue need to be treated as divergent. This
should make the divergence analysis results correct for the IR the DAG
is constructed from.

This should allow us to eliminate requiresUniformRegister, which has
an expensive, recursive scan over all users looking for control flow
intrinsics. This should avoid recent compile time regressions.
2020-02-05 09:30:54 -08:00
Matt Arsenault 4f9f5d09de AMDGPU: Fix isAlwaysUniform for simple asm SGPR results
We were handling the case where the result was a struct with an
extracted SGPR component, but not for the simple case.
2020-02-04 13:34:14 -08:00
Matt Arsenault cb7b661d3d AMDGPU: Analyze divergence of inline asm 2020-02-03 12:42:16 -08:00
Reid Kleckner a05441038a Revert "[SVE] Fix bug in simplification of scalable vector instructions"
This reverts commit 31574d38ac.

The newly added shufflevector test does not pass locally on either of my
workstations.
2020-02-03 11:12:09 -08:00
Christopher Tetreault 31574d38ac [SVE] Fix bug in simplification of scalable vector instructions
Summary:
* Most of the simplifications in SimplifyShuffleVectorInst depend on the
concrete value of the mask vector, or on its length. For scalable
vectors, neither can be known at compile time.
** For these cases, detect whether the vector is scalable before attempting
the transformation.
* The functions ShuffleVectorInst::getMaskValue and
ShuffleVectorInst::getShuffleMask access the value of the constant mask.
However, since the length of the mask is unknown at compile time, these
functions do not work for scalable vectors. Add asserts to ensure that
the input mask is not scalable.

Reviewers: efriedma, sdesmalen, apazos, chrisj, huihuiz

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73555
2020-02-03 10:15:56 -08:00
Huihui Zhang b0d25fff9b [ConstantFold][SVE][NFC] Add test for select instruction in scalable vector.
Side note from D73669: there is no need to guard the iteration over vectors, as
the code explicitly looks for a ConstantVector/ConstantDataVector, which
is not expected to be scalable at the moment. So, add the test only.
2020-01-30 10:56:12 -08:00
Huihui Zhang 34e6552dcb [ConstantFold][SVE] Fix constant folding for scalable vector unary operations.
Summary:
Similar to D71445: scalable vectors should not be evaluated element by element.
Add support for handling scalable vector UndefValue.

Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73678
2020-01-30 10:45:15 -08:00
Craig Topper 35625464c6 [X86] Fix the cost model for v16i16->v16i32 zero_extend/sign_extend with AVX2
We seem to be inheriting the cost from SSE4.1. But if we have 256-bit registers, we should be able to do this with just one extract to split the v16i16 and two v8i16->v8i32 operations, so our cost should be 3, not 4.

Differential Revision: https://reviews.llvm.org/D73646
2020-01-29 15:52:10 -08:00
Huihui Zhang d2e2fc450e [ConstantFold][SVE] Fix constant folding for scalable vector binary operations.
Summary:
Scalable vectors should not be evaluated element by element.
Add support for handling scalable vector UndefValue.
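
A minimal sketch of the guard this family of fixes adds, using an invented type (ToyVecTy is not an LLVM class): elementwise folding is only attempted when the element count is a fixed compile-time quantity.

  #include <cstdio>
  #include <vector>

  struct ToyVecTy { unsigned MinElts; bool Scalable; };

  static bool foldElementwise(const ToyVecTy &Ty, std::vector<int> &Out) {
    if (Ty.Scalable)
      return false;              // <vscale x N> elements: count unknown, bail out
    Out.assign(Ty.MinElts, 0);   // e.g. fold each lane of "add x, zeroinitializer"
    return true;
  }

  int main() {
    std::vector<int> R;
    std::printf("fixed:    %s\n", foldElementwise({4, false}, R) ? "folded" : "skipped");
    std::printf("scalable: %s\n", foldElementwise({4, true}, R) ? "folded" : "skipped");
    return 0;
  }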

Reviewers: sdesmalen, huntergr, spatel, lebedev.ri, apazos, efriedma, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71445
2020-01-29 10:49:08 -08:00
Evgenii Stepanov 34ab56904e Support zero size types in StackSafetyAnalysis.
Reviewers: vitalybuka

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73395
2020-01-27 15:22:59 -08:00
Evgenii Stepanov c3b80adcee Fix StackSafetyAnalysis crash with scalable vector types.
Summary:
Treat scalable allocas as if they have a storage size of 0, and
scalable-typed memory accesses as if their range is unlimited.

This is not a proper support of scalable vector types in the analysis -
we can do better, but not today.

Reviewers: vitalybuka

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73394
2020-01-27 15:22:59 -08:00
Roman Lebedev 9c801c48ee [NFC][IndVarSimplify] Autogenerate tests affected by isHighCostExpansionHelper() cost modelling (PR44668) 2020-01-27 23:34:29 +03:00
Craig Topper b1f3a0f972 Revert a107f86 "[GlobalsAA] Add back a check to intrinsic_addresstaken.ll to see if the AVX and AVX512 bots still fail for it."
It still fails on some buildbots, which is what I was trying to test.
2020-01-24 13:15:23 -08:00
Craig Topper a107f86417 [GlobalsAA] Add back a check to intrinsic_addresstaken.ll to see if the AVX and AVX512 bots still fail for it.
These bots failed for this several months ago and as a result, this
check was removed. If they still fail I'm going to try to see if I
can figure out why.
2020-01-24 11:54:23 -08:00
Austin Kerbow c226646337 Resubmit: [DA][TTI][AMDGPU] Add option to select GPUDA with TTI
Summary:
Enable the new divergence analysis by default for AMDGPU.

Resubmit with test updates since GPUDA was causing failures on Windows.

Reviewers: rampitec, nhaehnle, arsenm, thakis

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73315
2020-01-24 10:39:40 -08:00
Austin Kerbow 37aa16ebb7 [DA] Don't propagate from unreachable blocks
Summary: Fixes a crash that could occur when a divergent terminator has an unreachable parent.

Reviewers: rampitec, nhaehnle, arsenm

Subscribers: jvesely, wdng, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73323
2020-01-24 10:28:11 -08:00
David Green e9c198278e [ARM] Basic gather scatter cost model
This is a very basic MVE gather/scatter cost model, based roughly on the
code that we will currently produce. It does not handle truncating
scatters or extending gathers correctly yet, as it is difficult to tell
that they are going to be correctly extended/truncated from the limited
information in the cost function.

This can be improved as we extend support for these in the future.

Based on code originally written by David Sherwood.

Differential Revision: https://reviews.llvm.org/D73021
2020-01-22 14:41:38 +00:00
David Green 0b83e14804 [ARM] MVE Gather Scatter cost model tests. NFC 2020-01-22 14:41:38 +00:00
Florian Hahn 0b21d55262 [IR] Mark memset.* intrinsics as IntrWriteMem.
llvm.memset intrinsics only write memory, but they are missing
IntrWriteMem, so doesNotReadMemory() returns false for them.

The test change is due to the test checking the function attribute IDs at the
call sites, which got bumped up because a new combination with writeonly
appears in the test file.

Reviewers: jdoerfert, reames, efriedma, nlopes, lebedev.ri

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D72789
2020-01-16 10:35:46 +00:00
Zheng Chen a6342c247a [SCEV] accurate range for addrecexpr with nuw flag
If an addrec expression has the nuw flag, its value can never be less than its
start value, and the start value is not required to be a SCEVConstant.
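
A small made-up example of why the flag matters: without nuw, an add recurrence can wrap below its start, so the start would not be a safe unsigned lower bound.

  #include <cstdio>

  int main() {
    // {250,+,10} over i8: without nuw the second value wraps around to 4,
    // which is below the start value of 250.
    unsigned char v = 250;
    v = (unsigned char)(v + 10);
    std::printf("wrapped value: %u\n", (unsigned)v);  // prints 4
    // The nuw flag asserts this wrap never happens, so every value of the
    // addrec is >= its (possibly non-constant) start value.
    return 0;
  }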

Reviewed By: nikic, sanjoy

Differential Revision: https://reviews.llvm.org/D71690
2020-01-12 20:22:37 -05:00
Zheng Chen 569ccfc384 [SCEV] more accurate range for addrecexpr with nsw flag.
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D72436
2020-01-11 23:26:35 -05:00
Zheng Chen a701be8f03 [SCEV] [NFC] add more test cases for range of addrecexpr with nsw flag 2020-01-10 22:44:47 -05:00
Zheng Chen 4ebb589629 [SCEV] [NFC] add testcase for constant range for addrecexpr with nsw flag 2020-01-09 01:26:57 -05:00
Simon Pilgrim 5d986a68a5 [CostModel][X86] Add missing scalar i64->f32 uitofp costs 2020-01-06 13:17:02 +00:00
Fangrui Song a36ddf0aa9 Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 2019-12-24 16:27:51 -08:00
Fangrui Song eb16435b5e Migrate function attribute "no-frame-pointer-elim-non-leaf" to "frame-pointer"="non-leaf" as cleanups after D56351 2019-12-24 16:05:15 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
czhengsz 7259f04dde [SCEV] add testcase for get accurate range for addrecexpr with nuw flag 2019-12-22 20:58:19 -05:00
Bardia Mahjour 86acaa9457 [DDG] Data Dependence Graph - Ordinals
Summary:
This patch associates ordinal numbers with the DDG nodes, allowing
the builder to order nodes within a pi-block in program order. The
algorithm works by simply relying on the order in which the BBList
is fed into the builder. The builder already relies on the blocks being
in program order so that it can compute the dependencies correctly.
Similarly, the order of instructions within their parent basic blocks
determines their program order.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70986
2019-12-19 10:57:33 -05:00
czhengsz d588a00206 [SCEV] NFC - add testcase for get accurate range for AddExpr 2019-12-19 04:11:45 -05:00
Stanislav Mekhanoshin 58578f7056 [AMDGPU] Implemented fma cost analysis
Differential Revision: https://reviews.llvm.org/D71676
2019-12-18 23:54:20 -08:00
Stanislav Mekhanoshin b8ac5894a1 [AMDGPU] Fixed cost model for packed 16 bit ops
Differential Revision: https://reviews.llvm.org/D71622
2019-12-17 15:14:17 -08:00
Florian Hahn bd12a322d7 [BasicAA] Use GEP as context for computeKnownBits in aliasGEP.
In order to use assumptions, computeKnownBits needs a context
instruction. We can use the GEP, if it is an instruction. We already
pass the assumption cache, but it cannot be used without a context
instruction.
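
A hypothetical source-level shape of the situation (a sketch, not a test from this patch), using Clang's __builtin_assume, which is lowered to llvm.assume:

  int f(int *p, unsigned long i) {
    __builtin_assume((i & 3) == 0);  // only visible to computeKnownBits when the
                                     // query has a context instruction; the GEP
                                     // feeding p[i] can conceptually serve as one
    p[0] = 1;
    p[i] = 2;                        // aliasGEP may now use the assumption when
                                     // comparing &p[i] against &p[0]
    return p[0];
  }

  int main() {
    int a[8] = {0};
    return f(a, 4) == 1 ? 0 : 1;
  }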

Reviewers: anemet, asbirlea, hfinkel, spatel

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D71264
2019-12-12 17:18:04 +00:00
Danila Kutenin 19e83a9b4c [ValueTracking] Pointer is known nonnull after load/store
If the pointer was loaded/stored before the null check, the check
is redundant and can be removed. For now the optimizers do not
remove the nullptr check; see https://gcc.godbolt.org/z/H2r5GG.
The patch allows more nonnull constraints to be used. It also found
one more optimization in a PowerPC test. This is my first LLVM
review; I am open to any comments.
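
A hypothetical C++ example of the pattern described above (not taken from the patch's tests): the store through p already implies p is nonnull, so the later null check is provably redundant.

  #include <cstdio>

  int g;

  int *save(int *p, int v) {
    *p = v;              // p is dereferenced (stored through) here...
    if (p == nullptr)    // ...so this check is redundant and can be folded away
      return &g;
    return p;
  }

  int main() {
    int x = 0;
    std::printf("%d\n", *save(&x, 7));  // prints 7
    return 0;
  }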

Differential Revision: https://reviews.llvm.org/D71177
2019-12-11 20:32:29 +01:00
Danila Kutenin fc765698e0 [ValueTracking] Add tests for non-null check after load/store; NFC
Tests for D71177.
2019-12-11 20:26:31 +01:00
Bardia Mahjour 916d37a2bc [DA] Improve dump to show source and sink of the dependence
Summary:
The current DA printer shows the dependence without indicating
which instructions are being considered as the source vs. the sink. It
also silently ignores call instructions, despite the fact that
they create confused dependence edges to other memory
instructions. This patch addresses these two issues and makes a
couple of minor non-functional improvements.

Authored By: bmahjour

Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc

Reviewed By: dmgreen, fhahn

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71088
2019-12-11 11:48:16 -05:00
Eli Friedman 7c69a03c56 [ConstantFold][SVE] Fix constant folding for shufflevector.
Don't try to fold away shuffles which can't be folded.  Fix creation of
shufflevector constant expressions.

Differential Revision: https://reviews.llvm.org/D71147
2019-12-09 15:31:50 -08:00
David Green be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %r = and i32 %s, %b
can, under Arm or Thumb2, become:
  and r0, r1, r2, lsl #3

So the shift can essentially be free. To do this without
artificially adjusting the cost of the "and" instruction, the cost model
needs to look at the users of the shl and check whether they are a kind of
instruction that the shift can be folded into. It therefore needs
access to the actual instruction in getArithmeticInstrCost, which, if
available, is added as an extra parameter, much like for getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
David Green f008b5b8ce [ARM] Additional tests and minor formatting. NFC
This adds some extra cost model tests for shifts and makes some minor
adjustments to some Neon code to make it clear what it applies to.
Both are NFC.
2019-12-09 10:24:33 +00:00
Sanjay Patel 7ff0fcb53f [x86] add cost model special-case for insert/extract from element 0
This is a follow-up to D70607 where we made any
extract element on SLM more costly than default. But that is
pessimistic for extract from element 0 because that corresponds
to x86 movd/movq instructions. These generally have >1 cycle
latency, but they are probably implemented as single uop
instructions.

Note that no vectorization tests are affected by this change.
Also, no targets besides SLM are affected because those are
falling through to the default cost of 1 anyway. But this will
become visible/important if we add more specializations via cost
tables.

Differential Revision: https://reviews.llvm.org/D71023
2019-12-06 13:50:25 -05:00
Huihui Zhang 381d3c5c45 [ConstantFold][SVE] Skip scalable vectors in ConstantFoldInsertElementInstruction.
Summary:
Do not constant fold the insertelement instruction for scalable vector types.

Reviewers: huntergr, sdesmalen, spatel, lebedev.ri, apazos, efriedma, willlovett

Reviewed By: efriedma, spatel

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70985
2019-12-05 19:43:19 -08:00
Bardia Mahjour 2dd82a1c04 [DDG] Data Dependence Graph - Topological Sort (Memory Leak Fix)
Summary:
This fixes the memory leak in bec37c3fc7
and re-delivers the reverted patch.
In this patch the DDG DAG is sorted topologically to put the
nodes in the graph in the order that would satisfy all
dependencies. This helps transformations that would like to
generate code based on the DDG. Since the DDG is a DAG a
reverse-post-order traversal would give us the topological
ordering. This patch also sorts the basic blocks passed to
the builder based on program order to ensure that the
dependencies are computed in the correct direction.
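
A minimal standalone sketch of that observation (this is not the DDG builder code): reversing a DFS post-order of a DAG yields a topological order.

  #include <algorithm>
  #include <cstdio>
  #include <vector>

  static void postOrder(int N, const std::vector<std::vector<int>> &Succ,
                        std::vector<char> &Seen, std::vector<int> &Out) {
    if (Seen[N]) return;
    Seen[N] = 1;
    for (int S : Succ[N]) postOrder(S, Succ, Seen, Out);
    Out.push_back(N);   // a node is emitted only after all of its successors
  }

  int main() {
    // Diamond DAG: 0 -> 1 -> 3 and 0 -> 2 -> 3.
    std::vector<std::vector<int>> Succ = {{1, 2}, {3}, {3}, {}};
    std::vector<char> Seen(4, 0);
    std::vector<int> PO;
    postOrder(0, Succ, Seen, PO);
    std::reverse(PO.begin(), PO.end());       // reverse post-order
    for (int N : PO) std::printf("%d ", N);   // prints: 0 2 1 3, i.e. every node
    std::printf("\n");                        // appears before its successors
    return 0;
  }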

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70609
2019-12-03 10:08:25 -05:00
Taewook Oh 2da205d43e Reland "b19ec1eb3d0c [BPI] Improve unreachable/ColdCall heuristics to handle loops."
Summary: b19ec1eb3d has been reverted because of the test failures
with PowerPC targets. This patch addresses the issues from the previous
commit.

Test Plan: ninja check-all. Confirmed that CodeGen/PowerPC/pr36292.ll
and CodeGen/PowerPC/sms-cpy-1.ll pass

Subscribers: llvm-commits
2019-12-02 10:28:40 -08:00
Roman Lebedev ec7436f299 Autogenerate test/Analysis/ValueTracking/non-negative-phi-bits.ll test
Forgot to stage this change into 0f22e783a0 commit.
2019-12-02 18:28:41 +03:00
Stefan Pintilie 8e84c9ae99 [PowerPC] Separate Features that are known to be Power9 specific from Future CPU
The Power 9 CPU has some features that are unlikely to be passed on to future
versions of the CPU. This patch separates these out so that the future CPU does not
inherit them.

Differential Revision: https://reviews.llvm.org/D70466
2019-11-27 15:40:13 -06:00
taewookoh 5d21f75b57 Revert b19ec1eb3d
Summary: This reverts commit b19ec1eb3d as it fails PowerPC tests.

Subscribers: llvm-commits
2019-11-27 11:17:10 -08:00
Sanjay Patel 5c166f1d19 [x86] make SLM extract vector element more expensive than default
I'm not sure what the effect of this change will be on all of the affected
tests or a larger benchmark, but it fixes the horizontal add/sub problems
noted here:
https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc

The costs are based on reciprocal throughput numbers in Agner's tables for
PEXTR*; these appear to be very slow ops on Silvermont.

This is a small step towards the larger motivation discussed in PR43605:
https://bugs.llvm.org/show_bug.cgi?id=43605

Also, it seems likely that insert/extract is the source of perf regressions on
other CPUs (up to 30%) that were cited as part of the reason to revert D59710,
so maybe we'll extend the table-based approach to other subtargets.

Differential Revision: https://reviews.llvm.org/D70607
2019-11-27 14:08:56 -05:00
Taewook Oh b19ec1eb3d [BPI] Improve unreachable/ColdCall heuristics to handle loops.
Summary:
While updatePostDominatedByUnreachable attempts to find basic blocks that are post-dominated by unreachable blocks, it currently cannot handle loops precisely, because it doesn't use the actual post-dominator tree analysis but relies on the heuristic of visiting basic blocks in post-order. More precisely, when an entire loop is post-dominated by an unreachable block, the current algorithm fails to detect this, because when it reaches the loop latch it cannot tell that all of the latch's successors (including the loop header) will "eventually" be post-dominated by the unreachable block, since it has not visited the loop header yet. This makes BPI assume, for the loop latch, that the loop backedges are taken with 100% probability. Because of this, block frequency info sometimes marks virtually dead loops (which are post-dominated by unreachable blocks) as super hot, because a 100% backedge-taken probability makes the loop iteration count the maximum value. updatePostDominatedByColdCall has the exact same problem.

To address this problem, this patch computes PostDominatedByUnreachable and PostDominatedByColdCall using the actual post-dominator tree.
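
A made-up example of the kind of "virtually dead" loop described above (not from the patch's tests): every path through the loop ends in a noreturn call, so the whole loop is post-dominated by an unreachable block and should not be treated as hot.

  #include <cstdlib>

  [[noreturn]] static void die() { std::abort(); }

  static int sum_then_die(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)   // the entire loop is post-dominated by the
      s += a[i];                  // unreachable code following die()
    die();                        // noreturn: control never leaves normally
  }

  int main() {
    (void)&sum_then_die;          // referenced but intentionally never called
    return 0;
  }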

Reviewers: skatkov, chandlerc, manmanren

Reviewed By: skatkov

Subscribers: manmanren, vsk, apilipenko, Carrot, qcolombet, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70104
2019-11-27 10:36:06 -08:00
Sanjay Patel 8d20dd0b06 [ConstFolding] move tests for copysign; NFC
InstCombine doesn't have any transforms for copysign currently.
2019-11-26 16:54:46 -05:00
Bardia Mahjour 67f0685b4d Revert "[DDG] Data Dependence Graph - Topological Sort"
Revert for now to look into the failures on x86.

This reverts commit bec37c3fc7.
2019-11-25 16:17:41 -05:00
bmahjour bec37c3fc7 [DDG] Data Dependence Graph - Topological Sort
Summary:
In this patch the DDG DAG is sorted topologically to put the
nodes in the graph in the order that would satisfy all
dependencies. This helps transformations that would like to
generate code based on the DDG. Since the DDG is a DAG a
reverse-post-order traversal would give us the topological
ordering. This patch also sorts the basic blocks passed to
the builder based on program order to ensure that the
dependencies are computed in the correct direction.

Authored By: bmahjour

Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert

Reviewed By: Meinersbur

Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70609
2019-11-25 11:28:58 -05:00
Philip Reames d9426c3360 [Tests] Autogenerate a bunch of SCEV trip count tests for readability. Will likely merge some of these files soon. 2019-11-21 10:46:16 -08:00