//===- unittests/Analysis/CloneDetectionTest.cpp - Clone detection tests --===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Analysis/CloneDetection.h"
#include "clang/Tooling/Tooling.h"
#include "gtest/gtest.h"

namespace clang {
namespace analysis {
namespace {

class CloneDetectionVisitor
    : public RecursiveASTVisitor<CloneDetectionVisitor> {

  CloneDetector &Detector;

public:
  explicit CloneDetectionVisitor(CloneDetector &D) : Detector(D) {}

  bool VisitFunctionDecl(FunctionDecl *D) {
    Detector.analyzeCodeBody(D);
    return true;
  }
};

/// Example constraint for testing purposes.
/// Filters out all statements that are in a function whose name starts with
/// "bar".
class NoBarFunctionConstraint {
public:
  void constrain(std::vector<CloneDetector::CloneGroup> &CloneGroups) {
    CloneConstraint::splitCloneGroups(
        CloneGroups, [](const StmtSequence &A, const StmtSequence &B) {
          // Check if one of the sequences is in a function whose name starts
          // with "bar".
          for (const StmtSequence &Arg : {A, B}) {
            if (const auto *D =
                    dyn_cast<const FunctionDecl>(Arg.getContainingDecl())) {
              if (D->getName().startswith("bar"))
                return false;
            }
          }
          return true;
        });
  }
};

TEST(CloneDetector, FilterFunctionsByName) {
  auto ASTUnit =
      clang::tooling::buildASTFromCode("void foo1(int &a1) { a1++; }\n"
                                       "void foo2(int &a2) { a2++; }\n"
                                       "void bar1(int &a3) { a3++; }\n"
                                       "void bar2(int &a4) { a4++; }\n");
  auto TU = ASTUnit->getASTContext().getTranslationUnitDecl();

  CloneDetector Detector;
  // Push all the function bodies into the detector.
  CloneDetectionVisitor Visitor(Detector);
  Visitor.TraverseTranslationUnitDecl(TU);

  // Find clones with the usual settings, but filter out all statements
  // from functions whose names start with "bar".
  std::vector<CloneDetector::CloneGroup> CloneGroups;
  Detector.findClones(CloneGroups, NoBarFunctionConstraint(),
                      RecursiveCloneTypeIIHashConstraint(),
                      MinComplexityConstraint(2), MinGroupSizeConstraint(2),
                      RecursiveCloneTypeIIVerifyConstraint(),
                      OnlyLargestCloneConstraint());

  ASSERT_EQ(CloneGroups.size(), 1u);
  ASSERT_EQ(CloneGroups.front().size(), 2u);

  for (auto &Clone : CloneGroups.front()) {
    const auto ND = dyn_cast<const FunctionDecl>(Clone.getContainingDecl());
    ASSERT_TRUE(ND != nullptr);
    // Check that no function name starting with "bar" is in the results.
    ASSERT_TRUE(ND->getNameAsString().find("bar") != 0);
  }

  // Retry the example above without the filter.
  CloneGroups.clear();
  Detector.findClones(CloneGroups, RecursiveCloneTypeIIHashConstraint(),
                      MinComplexityConstraint(2), MinGroupSizeConstraint(2),
                      RecursiveCloneTypeIIVerifyConstraint(),
                      OnlyLargestCloneConstraint());
  ASSERT_EQ(CloneGroups.size(), 1u);
  ASSERT_EQ(CloneGroups.front().size(), 4u);

  // Count how many functions with the bar prefix we have in the results.
  int FoundFunctionsWithBarPrefix = 0;
  for (auto &Clone : CloneGroups.front()) {
    const auto ND = dyn_cast<const FunctionDecl>(Clone.getContainingDecl());
    ASSERT_TRUE(ND != nullptr);
    // This time, check that we picked up the bar functions from above.
    if (ND->getNameAsString().find("bar") == 0) {
      FoundFunctionsWithBarPrefix++;
    }
  }
  // We should have found the two functions bar1 and bar2.
  ASSERT_EQ(FoundFunctionsWithBarPrefix, 2);
}

} // namespace
} // namespace analysis
} // namespace clang