Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it.
For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs.
llvm-svn: 374786
Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have).
For targets supporting POPCNT, we provide overrides that assume 1cy costs.
llvm-svn: 374775
I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2.
I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674.
llvm-svn: 374655
When simplifying a Phi to the unique value found incoming, check that
there wasn't a Phi already created to break a cycle. If so, remove it.
Resolves PR43541.
Some additional nits included.
llvm-svn: 374471
Summary:
Whenever we get the previous definition, the assumption is that the
recursion starts ina reachable block.
If the recursion starts in an unreachable block, we may recurse
indefinitely. Handle this case by returning LoE if the block is
unreachable.
Resolves PR43426.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68809
llvm-svn: 374447
Summary:
The rule for the moveAllAfterMergeBlocks API si for all instructions
from `From` to have been moved to `To`, while keeping the CFG edges (and
block terminators) unchanged.
Update all the callsites for moveAllAfterMergeBlocks to follow this.
Pending follow-up: since the same behavior is needed everytime, merge
all callsites into one. The common denominator may be the call to
`MergeBlockIntoPredecessor`.
Resolves PR43569.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68659
llvm-svn: 374177
MemoryPhis should be added in the IDF of the blocks newly gaining Defs.
This includes the blocks that gained a Phi and the block gaining a Def,
if the block did not have one before.
Resolves PR43427.
llvm-svn: 373505
Summary:
This patch adds Root Node to the DDG. The purpose of the root node is to create a single entry node that allows graph walk iterators to iterate through all nodes of the graph, making sure that no node is left unvisited during a graph walk (eg. SCC or DFS). Once the DDG is fully constructed it will have exactly one root node. Every node in the graph is reachable from the root. The algorithm for connecting the root node is based on depth-first-search that keeps track of visited nodes to try to avoid creating unnecessary edges.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D67970
llvm-svn: 373386
If a single predecessor is found, still check if the block is
unreachable. The test that found this had a self loop unreachable block.
Resolves PR43493.
llvm-svn: 373383
The check for "was there an access in this block" should be: is the last
access in this block and is it not a newly inserted phi.
Resolves new test in PR43438.
Also fix a typo when simplifying trivial Phis to match the comment.
llvm-svn: 373380
This reverts r366419 because the analysis performed is within the context of
the loop and it's only valid to add wrapping flags to "global" expressions if
they're always correct.
llvm-svn: 373184
SLM is 2 x slower for <2 x i64> comparison ops than other vector types, we should account for this like we do for SLM <2 x i64> add/sub/mul costs.
This should remove some of the SLM codegen diffs in D43582
llvm-svn: 372954
Summary:
If a block has all incoming values with the same MemoryAccess (ignoring
incoming values from unreachable blocks), then use that incoming
MemoryAccess and do not create a Phi in the first place.
Revert IDF work-around added in rL372673; it should not be required unless
the Def inserted is the first in its block.
The patch also cleans up a series of tests, added during the many
iterations on insertDef.
The patch also fixes PR43438.
The same issue that occurs in insertDef with "adding phis, hence the IDF of
Phis is needed", can also occur in fixupDefs: the `getPreviousRecursive`
call only adds Phis walking on the predecessor edges, which means there
may be the case of a Phi added walking the CFG "backwards" which
triggers the needs for an additional Phi in successor blocks.
Such Phis are added during fixupDefs only in the presence of unreachable
blocks.
Hence this highlights the need to avoid adding Phis in blocks with
unreachable predecessors in the first place.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67995
llvm-svn: 372932
Summary:
MemoryPhis may be needed following a Def insertion inthe IDF of all the
new accesses added (phis + potentially a def). Ensure this also occurs when
only the new MemoryPhis are the defining accesses.
Note: The need for computing IDF here is because of new Phis added with
edges incoming from unreachable code, Phis that had previously been
simplified. The preferred solution is to not reintroduce such Phis.
This patch is the needed fix while working on the preferred solution.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67927
llvm-svn: 372673
We are missing costs for a lot of truncation cases, I'm hoping to address all the 'zero cost' cases in trunc.ll
I thought this was a vector widening side effect, but even before this we had some interesting LV decisions (notably over indvars) being made due to these zero costs.
llvm-svn: 372498
The recently announced IBM z15 processor implements the architecture
already supported as "arch13" in LLVM. This patch adds support for
"z15" as an alternate architecture name for arch13.
The patch also uses z15 in a number of places where we used arch13
as long as the official name was not yet announced.
llvm-svn: 372435
At present, `-scalar-evolution-max-iterations` is a `cl::Optional`
option, which means it demands to be passed exactly zero or one times.
Our build system makes it pretty tricky to guarantee this. We often
accidentally pass the flag more than once (but always with the same
value) which results in an error, after which compilation fails:
```
clang (LLVM option parsing): for the -scalar-evolution-max-iterations option: may only occur zero or one times!
```
It seems reasonable to allow -scalar-evolution-max-iterations to be
passed more than once. Quoting the [[ http://llvm.org/docs/CommandLine.html#controlling-the-number-of-occurrences-required-and-allowed | documentation ]]:
> The cl::ZeroOrMore modifier ... indicates that your program will allow the option to be specified zero or more times.
> ...
> If an option is specified multiple times for an option of the cl::opt class, only the last value will be retained.
Original patch by: Enrico Bern Hardy Tanuwidjaja <etanuwid@fb.com>
Differential Revision: https://reviews.llvm.org/D67512
llvm-svn: 372346
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.
The algorithm for building the graph involves the following steps in order:
1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
2. For each node in the graph establish def-use edges to/from other nodes in the graph.
3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur, fhahn, myhsu
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D65350
llvm-svn: 372238
Summary:
This fixes B42473 and B42706.
This patch makes the SDA propagate branch divergence until the end of the RPO traversal. Before, the SyncDependenceAnalysis propagated divergence only until the IPD in rpo order. RPO is incompatible with post dominance in the presence of loops. This made the SDA crash because blocks were missed in the propagation.
Reviewers: foad, nhaehnle
Reviewed By: foad
Subscribers: jvesely, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65274
llvm-svn: 372223
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.
The algorithm for building the graph involves the following steps in order:
1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
2. For each node in the graph establish def-use edges to/from other nodes in the graph.
3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur, fhahn, myhsu
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D65350
llvm-svn: 372162
Summary:
When inserting a Def, the current algorithm is walking edges backward
and inserting new Phis where needed. There may be additional Phis needed
in the IDF of the newly inserted Def and Phis.
Adding Phis in the IDF of the Def was added ina previous patch, but we
may also need other Phis in the IDF of the newly added Phis.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67637
llvm-svn: 372138
Summary:
Regularly when moving an instruction that may not read or write memory,
the instruction is not modelled in MSSA, so not action is necessary.
For a non-conventional AA pipeline, MSSA needs to explicitly check when
creating accesses, so as to not model instructions that may not read and
write memory.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67562
llvm-svn: 372137
We were failing to compute trip counts (both exact and maximum) for any loop which involved a comparison against either an umin or smin. It looks like this simply got missed when we added smin/umin to SCEV. (Note: umin was submitted separately earlier today. Turned out two folks hit this at the same time.)
Differential Revision: https://reviews.llvm.org/D67514
llvm-svn: 371776
Expanding the folding of `nearbyint()`, `rint()` and `trunc()` to library
functions, in addition to the current support for intrinsics.
Differential revision: https://reviews.llvm.org/D67468
llvm-svn: 371774
This patch adds support for SCEVUMinExpr to getRangeRef,
similar to the support for SCEVUMaxExpr.
Reviewers: sanjoy.google, efriedma, reames, nikic
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D67177
llvm-svn: 371768
Summary:
Do not model debuginfo intrinsics in MemorySSA.
Regularly these are non-memory modifying instructions. With -disable-basicaa, they were being modelled as Defs.
Reviewers: george.burgess.iv
Subscribers: aprantl, Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67307
llvm-svn: 371565
Since NaN is very rare in normal programs, so the probability for floating point unordered comparison should be extremely small. Current probability is 3/8, it is too large, this patch changes it to a tiny number.
Differential Revision: https://reviews.llvm.org/D65303
llvm-svn: 371541
Summary:
We already use the fact that an object with known size X does not alias
another objection of size Y > X before. With this commit, we use
dereferenceability information to determine a lower bound for Y and not
only rely on the user provided query size.
The result for @global_and_deref_arg_2() and @local_and_deref_ret_2()
in test/Analysis/BasicAA/dereferenceable.ll improved with this patch.
Reviewers: asbirlea, chandlerc, hfinkel, sanjoy
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66157
llvm-svn: 369786
Given an instruction I, the MustBeExecutedContextExplorer allows to
easily traverse instructions that are guaranteed to be executed whenever
I is. For now, these instruction have to be statically "after" I, in
the same or different basic blocks.
This patch also adds a pass which prints the must-be-executed-context
for each instruction in a module. It is used to test the
MustBeExecutedContextExplorer, for now on the examples given in the
class comment of the MustBeExecutedIterator.
Differential Revision: https://reviews.llvm.org/D65186
llvm-svn: 369765
I don't really understand the costs we're using for fp_to_sint,
but prior to widening legalization we used 20 as the cost for this
via the v2i64->v2f64 entry. That number seems better than the 40
we got with widening legalization. So now we need either a
v2i32->v2f64 entry or a v4i32->v2f64 entry depending on whether
AVX is enabled or not since we skip the first SSE2 table look up
under AVX.
llvm-svn: 369628
Summary:
Add a flag to the FunctionToLoopAdaptor that allows enabling MemorySSA only for the loop pass managers that are known to preserve it.
If an LPM is known to have only loop transforms that *all* preserve MemorySSA, then use MemorySSA if `EnableMSSALoopDependency` is set.
If an LPM has loop passes that do not preserve MemorySSA, then the flag passed is `false`, regardless of the value of `EnableMSSALoopDependency`.
When using a custom loop pass pipeline via `passes=...`, use keyword `loop` vs `loop-mssa` to use MemorySSA in that LPM. If a loop that does not preserve MemorySSA is added while using the `loop-mssa` keyword, that's an error.
Add the new `loop-mssa` keyword to a few tests where a difference occurs when enabling MemorySSA.
Reviewers: chandlerc
Subscribers: mehdi_amini, Prazek, george.burgess.iv, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66376
llvm-svn: 369548
Summary:
When inserting a new Def, and inserting Phis in the IDF when needed,
also mark the already existing Phis in the IDF as non-optimized, since
these may need fixing as well.
In the test attached, there is a Phi in the IDF that happens to be
trivial, and is wrongfully removed by the call to getLastDef that
follows. This is a valid situation and the existing IDF Phis need to
marked as "may need fixing" as well.
Resolves PR43044.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66495
llvm-svn: 369464