[SLP]Do not count perfect diamond matches for gathers several times.

Remove the old code that avoided double counting of gather nodes with
perfect diamond matches within the tree. Since the previous patch,
D100495, perfect/shuffled matches are detected directly, so keeping the
old check means the cost of such nodes may be skipped completely.

Differential Revision: https://reviews.llvm.org/D102023
Alexey Bataev 2021-05-06 13:44:03 -07:00
parent 4677d795b2
commit 30463bc3f1
2 changed files with 1 additions and 22 deletions

@@ -4233,27 +4233,6 @@ InstructionCost BoUpSLP::getTreeCost() {
   for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
     TreeEntry &TE = *VectorizableTree[I].get();
-    // We create duplicate tree entries for gather sequences that have multiple
-    // uses. However, we should not compute the cost of duplicate sequences.
-    // For example, if we have a build vector (i.e., insertelement sequence)
-    // that is used by more than one vector instruction, we only need to
-    // compute the cost of the insertelement instructions once. The redundant
-    // instructions will be eliminated by CSE.
-    //
-    // We should consider not creating duplicate tree entries for gather
-    // sequences, and instead add additional edges to the tree representing
-    // their uses. Since such an approach results in fewer total entries,
-    // existing heuristics based on tree size may yield different results.
-    //
-    if (TE.State == TreeEntry::NeedToGather &&
-        std::any_of(std::next(VectorizableTree.begin(), I + 1),
-                    VectorizableTree.end(),
-                    [TE](const std::unique_ptr<TreeEntry> &EntryPtr) {
-                      return EntryPtr->State == TreeEntry::NeedToGather &&
-                             EntryPtr->isSame(TE.Scalars);
-                    }))
-      continue;
     InstructionCost C = getEntryCost(&TE);
     Cost += C;
     LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C

@@ -10,7 +10,7 @@ target triple = "aarch64--linux-gnu"
 ; REMARK-LABEL: Function: gather_multiple_use
 ; REMARK: Args:
 ; REMARK-NEXT: - String: 'Vectorized horizontal reduction with cost '
-; REMARK-NEXT: - Cost: '-16'
+; REMARK-NEXT: - Cost: '-7'
 ;
 ; REMARK-NOT: Function: gather_load
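
The interaction described in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the LLVM implementation: `ToyEntry`, `ToyTree`, `hasLaterDuplicateGather`, `matchedEntryCost`, `treeCost` and `makeDemoTree` are hypothetical names, the per-entry costs are made-up numbers, and the "perfect match" rule is simplified to "an earlier entry has the same scalars". It shows that each mechanism alone charges a duplicated gather once, while both together never charge it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the LLVM types.
enum class ToyState { Vectorize, NeedToGather };

struct ToyEntry {
  ToyState State;
  std::vector<int> Scalars; // ids of the scalars bundled in this entry
  int Cost;                 // made-up cost if the entry is charged in full

  bool isSame(const std::vector<int> &VL) const { return Scalars == VL; }
};

using ToyTree = std::vector<std::unique_ptr<ToyEntry>>;

// Models the removed check: entry I is a gather and an identical gather
// appears later in the tree.
bool hasLaterDuplicateGather(const ToyTree &T, std::size_t I) {
  const ToyEntry &TE = *T[I];
  return TE.State == ToyState::NeedToGather &&
         std::any_of(std::next(T.begin(), I + 1), T.end(),
                     [&TE](const std::unique_ptr<ToyEntry> &P) {
                       return P->State == ToyState::NeedToGather &&
                              P->isSame(TE.Scalars);
                     });
}

// Simplified model of per-entry matching (an assumption): a gather whose
// scalars match an earlier entry is free.
int matchedEntryCost(const ToyTree &T, std::size_t I) {
  const ToyEntry &TE = *T[I];
  if (TE.State == ToyState::NeedToGather)
    for (std::size_t J = 0; J < I; ++J)
      if (T[J]->isSame(TE.Scalars))
        return 0;
  return TE.Cost;
}

// Sums entry costs with either mechanism switched on or off.
int treeCost(const ToyTree &T, bool SkipLaterDuplicates, bool MatchEarlier) {
  int Cost = 0;
  for (std::size_t I = 0; I < T.size(); ++I) {
    if (SkipLaterDuplicates && hasLaterDuplicateGather(T, I))
      continue;
    Cost += MatchEarlier ? matchedEntryCost(T, I) : T[I]->Cost;
  }
  return Cost;
}

// One vectorized entry plus the same gather sequence used twice.
ToyTree makeDemoTree() {
  ToyTree T;
  T.push_back(std::make_unique<ToyEntry>(
      ToyEntry{ToyState::Vectorize, {1, 2, 3, 4}, -4}));
  T.push_back(std::make_unique<ToyEntry>(
      ToyEntry{ToyState::NeedToGather, {5, 6, 7, 8}, 3}));
  T.push_back(std::make_unique<ToyEntry>(
      ToyEntry{ToyState::NeedToGather, {5, 6, 7, 8}, 3}));
  return T;
}
```

In this model, calling `treeCost(makeDemoTree(), true, true)` skips the first gather and prices the second at zero, so the gather is never charged. Dropping the skip, as this commit does, charges it exactly once.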