2014-11-12 06:14:37 +08:00
|
|
|
//===-- SanitizerCoverage.cpp - coverage instrumentation for sanitizers ---===//
|
|
|
|
//
|
2019-01-19 16:50:56 +08:00
|
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
2014-11-12 06:14:37 +08:00
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
//
|
2017-06-01 02:27:33 +08:00
|
|
|
// Coverage instrumentation done on LLVM IR level, works with Sanitizers.
|
2014-11-12 06:14:37 +08:00
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
2019-07-26 04:53:15 +08:00
|
|
|
#include "llvm/Transforms/Instrumentation/SanitizerCoverage.h"
|
2014-11-12 06:14:37 +08:00
|
|
|
#include "llvm/ADT/ArrayRef.h"
|
|
|
|
#include "llvm/ADT/SmallVector.h"
|
2022-02-02 20:53:56 +08:00
|
|
|
#include "llvm/ADT/Triple.h"
|
2015-12-03 07:06:39 +08:00
|
|
|
#include "llvm/Analysis/EHPersonalities.h"
|
2017-05-24 08:29:12 +08:00
|
|
|
#include "llvm/Analysis/PostDominators.h"
|
2017-08-19 02:43:30 +08:00
|
|
|
#include "llvm/IR/Constant.h"
|
2014-11-12 06:14:37 +08:00
|
|
|
#include "llvm/IR/DataLayout.h"
|
2016-02-26 09:17:22 +08:00
|
|
|
#include "llvm/IR/Dominators.h"
|
2014-11-12 06:14:37 +08:00
|
|
|
#include "llvm/IR/Function.h"
|
2017-08-19 02:43:30 +08:00
|
|
|
#include "llvm/IR/GlobalVariable.h"
|
2014-11-12 06:14:37 +08:00
|
|
|
#include "llvm/IR/IRBuilder.h"
|
2017-08-31 06:49:31 +08:00
|
|
|
#include "llvm/IR/IntrinsicInst.h"
|
2017-08-19 02:43:30 +08:00
|
|
|
#include "llvm/IR/Intrinsics.h"
|
2014-11-12 06:14:37 +08:00
|
|
|
#include "llvm/IR/LLVMContext.h"
|
|
|
|
#include "llvm/IR/Module.h"
|
|
|
|
#include "llvm/IR/Type.h"
|
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
2019-11-14 05:15:01 +08:00
|
|
|
#include "llvm/InitializePasses.h"
|
2014-11-12 06:14:37 +08:00
|
|
|
#include "llvm/Support/CommandLine.h"
|
Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang
Summary:
This commit adds two command-line options to clang.
These options let the user decide which functions will receive SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.
Patch by Yannis Juglaret of DGA-MI, Rennes, France
libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.
Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.
The two new command line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target. We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.
Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:
```
# (a)
src:*
fun:*
# (b)
src:SRC/*
fun:*
# (c)
src:SRC/src/woff2_dec.cc
fun:*
```
Running the built fuzzers shows how many instrumentation points the compiler adds, the fuzzer will output //XXX PCs//. Whitelist (a) is the instrument-everything whitelist, it produces 11912 instrumentation points. Whitelist (b) focuses coverage to instrument woff2 source code only, ignoring the dependency code for brotli (de)compression; it produces 3984 instrumented instrumentation points. Whitelist (c) focuses coverage to only instrument functions in the main file that deals with WOFF2 to TTF conversion, resulting in 1056 instrumentation points.
For experimentation purposes, we ran each fuzzer approximately 100 times, single process, with the initial corpus provided in the tutorial. We let the fuzzer run until it either found the heap buffer overflow or went out of memory. On this simple example, whitelists (b) and (c) found the heap buffer overflow more reliably and 5x faster than whitelist (a). The average execution times when finding the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, and (c) 176 s.
We explain these results by the fact that WOFF2 to TTF conversion calls the brotli decompression algorithm's functions, which are mostly irrelevant for finding bugs in WOFF2 font reconstruction but nevertheless instrumented and used by whitelist (a) to guide fuzzing. This results in longer execution time for these functions and a partially irrelevant corpus. Contrary to whitelist (a), whitelists (b) and (c) will execute brotli-related functions without instrumentation overhead, and ignore new code paths found in them. This results in faster bug finding for WOFF2 font reconstruction.
The results for whitelist (b) are similar to the ones for whitelist (c). Indeed, WOFF2 to TTF conversion calls functions that are mostly located in SRC/src/woff2_dec.cc. The 2892 extra instrumentation points allowed by whitelist (b) do not tamper with bug finding, even though they are mostly irrelevant, simply because most of these functions do not get called. We get a slightly faster average time for bug finding with whitelist (b), which might indicate that some of the extra instrumentation points are actually relevant, or might just be random noise.
Reviewers: kcc, morehouse, vitalybuka
Reviewed By: morehouse, vitalybuka
Subscribers: pratyai, vitalybuka, eternalsakura, xwlin222, dende, srhines, kubamracek, #sanitizers, lebedev.ri, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D63616
2020-04-11 01:42:41 +08:00
|
|
|
#include "llvm/Support/SpecialCaseList.h"
|
|
|
|
#include "llvm/Support/VirtualFileSystem.h"
|
2016-02-26 09:17:22 +08:00
|
|
|
#include "llvm/Transforms/Instrumentation.h"
|
2014-11-12 06:14:37 +08:00
|
|
|
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
|
|
|
|
#include "llvm/Transforms/Utils/ModuleUtils.h"
|
|
|
|
|
|
|
|
using namespace llvm;
|
|
|
|
|
|
|
|
#define DEBUG_TYPE "sancov"
|
|
|
|
|
2020-12-02 02:33:18 +08:00
|
|
|
const char SanCovTracePCIndirName[] = "__sanitizer_cov_trace_pc_indir";
|
|
|
|
const char SanCovTracePCName[] = "__sanitizer_cov_trace_pc";
|
|
|
|
const char SanCovTraceCmp1[] = "__sanitizer_cov_trace_cmp1";
|
|
|
|
const char SanCovTraceCmp2[] = "__sanitizer_cov_trace_cmp2";
|
|
|
|
const char SanCovTraceCmp4[] = "__sanitizer_cov_trace_cmp4";
|
|
|
|
const char SanCovTraceCmp8[] = "__sanitizer_cov_trace_cmp8";
|
|
|
|
const char SanCovTraceConstCmp1[] = "__sanitizer_cov_trace_const_cmp1";
|
|
|
|
const char SanCovTraceConstCmp2[] = "__sanitizer_cov_trace_const_cmp2";
|
|
|
|
const char SanCovTraceConstCmp4[] = "__sanitizer_cov_trace_const_cmp4";
|
|
|
|
const char SanCovTraceConstCmp8[] = "__sanitizer_cov_trace_const_cmp8";
|
2021-11-09 09:52:36 +08:00
|
|
|
const char SanCovLoad1[] = "__sanitizer_cov_load1";
|
|
|
|
const char SanCovLoad2[] = "__sanitizer_cov_load2";
|
|
|
|
const char SanCovLoad4[] = "__sanitizer_cov_load4";
|
|
|
|
const char SanCovLoad8[] = "__sanitizer_cov_load8";
|
|
|
|
const char SanCovLoad16[] = "__sanitizer_cov_load16";
|
|
|
|
const char SanCovStore1[] = "__sanitizer_cov_store1";
|
|
|
|
const char SanCovStore2[] = "__sanitizer_cov_store2";
|
|
|
|
const char SanCovStore4[] = "__sanitizer_cov_store4";
|
|
|
|
const char SanCovStore8[] = "__sanitizer_cov_store8";
|
|
|
|
const char SanCovStore16[] = "__sanitizer_cov_store16";
|
2020-12-02 02:33:18 +08:00
|
|
|
const char SanCovTraceDiv4[] = "__sanitizer_cov_trace_div4";
|
|
|
|
const char SanCovTraceDiv8[] = "__sanitizer_cov_trace_div8";
|
|
|
|
const char SanCovTraceGep[] = "__sanitizer_cov_trace_gep";
|
|
|
|
const char SanCovTraceSwitchName[] = "__sanitizer_cov_trace_switch";
|
|
|
|
const char SanCovModuleCtorTracePcGuardName[] =
|
2019-05-07 09:39:37 +08:00
|
|
|
"sancov.module_ctor_trace_pc_guard";
|
2020-12-02 02:33:18 +08:00
|
|
|
const char SanCovModuleCtor8bitCountersName[] =
|
2019-05-07 09:39:37 +08:00
|
|
|
"sancov.module_ctor_8bit_counters";
|
2020-12-02 02:33:18 +08:00
|
|
|
const char SanCovModuleCtorBoolFlagName[] = "sancov.module_ctor_bool_flag";
|
2016-03-19 07:29:29 +08:00
|
|
|
static const uint64_t SanCtorAndDtorPriority = 2;
|
|
|
|
|
2020-12-02 02:33:18 +08:00
|
|
|
const char SanCovTracePCGuardName[] = "__sanitizer_cov_trace_pc_guard";
|
|
|
|
const char SanCovTracePCGuardInitName[] = "__sanitizer_cov_trace_pc_guard_init";
|
|
|
|
const char SanCov8bitCountersInitName[] = "__sanitizer_cov_8bit_counters_init";
|
|
|
|
const char SanCovBoolFlagInitName[] = "__sanitizer_cov_bool_flag_init";
|
|
|
|
const char SanCovPCsInitName[] = "__sanitizer_cov_pcs_init";
|
2016-09-14 09:39:35 +08:00
|
|
|
|
2020-12-02 02:33:18 +08:00
|
|
|
const char SanCovGuardsSectionName[] = "sancov_guards";
|
|
|
|
const char SanCovCountersSectionName[] = "sancov_cntrs";
|
|
|
|
const char SanCovBoolFlagSectionName[] = "sancov_bools";
|
|
|
|
const char SanCovPCsSectionName[] = "sancov_pcs";
|
2017-06-03 07:13:44 +08:00
|
|
|
|
2020-12-02 02:33:18 +08:00
|
|
|
const char SanCovLowestStackName[] = "__sancov_lowest_stack";
|
2017-08-19 02:43:30 +08:00
|
|
|
|
2016-03-19 07:29:29 +08:00
|
|
|
static cl::opt<int> ClCoverageLevel(
|
|
|
|
"sanitizer-coverage-level",
|
|
|
|
cl::desc("Sanitizer Coverage. 0: none, 1: entry block, 2: all blocks, "
|
2017-04-20 06:42:11 +08:00
|
|
|
"3: all blocks and critical edges"),
|
2016-03-19 07:29:29 +08:00
|
|
|
cl::Hidden, cl::init(0));
|
2014-11-12 06:14:37 +08:00
|
|
|
|
2017-06-09 06:58:19 +08:00
|
|
|
static cl::opt<bool> ClTracePC("sanitizer-coverage-trace-pc",
|
|
|
|
cl::desc("Experimental pc tracing"), cl::Hidden,
|
|
|
|
cl::init(false));
|
2016-02-18 05:34:43 +08:00
|
|
|
|
2016-09-14 09:39:35 +08:00
|
|
|
static cl::opt<bool> ClTracePCGuard("sanitizer-coverage-trace-pc-guard",
|
|
|
|
cl::desc("pc tracing with a guard"),
|
|
|
|
cl::Hidden, cl::init(false));
|
|
|
|
|
2017-07-28 07:36:49 +08:00
|
|
|
// If true, we create a global variable that contains PCs of all instrumented
|
|
|
|
// BBs, put this global into a named section, and pass this section's bounds
|
|
|
|
// to __sanitizer_cov_pcs_init.
|
|
|
|
// This way the coverage instrumentation does not need to acquire the PCs
|
2020-04-09 13:02:41 +08:00
|
|
|
// at run-time. Works with trace-pc-guard, inline-8bit-counters, and
|
|
|
|
// inline-bool-flag.
|
2017-07-28 08:09:29 +08:00
|
|
|
static cl::opt<bool> ClCreatePCTable("sanitizer-coverage-pc-table",
|
2017-07-28 07:36:49 +08:00
|
|
|
cl::desc("create a static PC table"),
|
|
|
|
cl::Hidden, cl::init(false));
|
|
|
|
|
|
|
|
static cl::opt<bool>
|
|
|
|
ClInline8bitCounters("sanitizer-coverage-inline-8bit-counters",
|
|
|
|
cl::desc("increments 8-bit counter for every edge"),
|
|
|
|
cl::Hidden, cl::init(false));
|
2017-06-09 06:58:19 +08:00
|
|
|
|
2020-04-09 13:02:41 +08:00
|
|
|
static cl::opt<bool>
|
|
|
|
ClInlineBoolFlag("sanitizer-coverage-inline-bool-flag",
|
|
|
|
cl::desc("sets a boolean flag for every edge"), cl::Hidden,
|
|
|
|
cl::init(false));
|
|
|
|
|
2015-03-21 09:29:36 +08:00
|
|
|
static cl::opt<bool>
|
2016-08-30 09:12:10 +08:00
|
|
|
ClCMPTracing("sanitizer-coverage-trace-compares",
|
|
|
|
cl::desc("Tracing of CMP and similar instructions"),
|
|
|
|
cl::Hidden, cl::init(false));
|
|
|
|
|
|
|
|
static cl::opt<bool> ClDIVTracing("sanitizer-coverage-trace-divs",
|
|
|
|
cl::desc("Tracing of DIV instructions"),
|
|
|
|
cl::Hidden, cl::init(false));
|
|
|
|
|
2021-11-09 09:52:36 +08:00
|
|
|
static cl::opt<bool> ClLoadTracing("sanitizer-coverage-trace-loads",
|
|
|
|
cl::desc("Tracing of load instructions"),
|
|
|
|
cl::Hidden, cl::init(false));
|
|
|
|
|
|
|
|
static cl::opt<bool> ClStoreTracing("sanitizer-coverage-trace-stores",
|
|
|
|
cl::desc("Tracing of store instructions"),
|
|
|
|
cl::Hidden, cl::init(false));
|
|
|
|
|
2016-08-30 09:12:10 +08:00
|
|
|
static cl::opt<bool> ClGEPTracing("sanitizer-coverage-trace-geps",
|
|
|
|
cl::desc("Tracing of GEP instructions"),
|
|
|
|
cl::Hidden, cl::init(false));
|
2015-03-21 09:29:36 +08:00
|
|
|
|
2016-04-07 07:24:37 +08:00
|
|
|
static cl::opt<bool>
|
|
|
|
ClPruneBlocks("sanitizer-coverage-prune-blocks",
|
|
|
|
cl::desc("Reduce the number of instrumented blocks"),
|
|
|
|
cl::Hidden, cl::init(true));
|
2016-02-26 09:17:22 +08:00
|
|
|
|
2017-08-19 02:43:30 +08:00
|
|
|
static cl::opt<bool> ClStackDepth("sanitizer-coverage-stack-depth",
|
|
|
|
cl::desc("max stack depth tracing"),
|
|
|
|
cl::Hidden, cl::init(false));
|
|
|
|
|
2014-11-12 06:14:37 +08:00
|
|
|
namespace {
|
|
|
|
|
2015-05-07 09:00:31 +08:00
|
|
|
SanitizerCoverageOptions getOptions(int LegacyCoverageLevel) {
|
|
|
|
SanitizerCoverageOptions Res;
|
|
|
|
switch (LegacyCoverageLevel) {
|
|
|
|
case 0:
|
|
|
|
Res.CoverageType = SanitizerCoverageOptions::SCK_None;
|
|
|
|
break;
|
|
|
|
case 1:
|
|
|
|
Res.CoverageType = SanitizerCoverageOptions::SCK_Function;
|
|
|
|
break;
|
|
|
|
case 2:
|
|
|
|
Res.CoverageType = SanitizerCoverageOptions::SCK_BB;
|
|
|
|
break;
|
|
|
|
case 3:
|
|
|
|
Res.CoverageType = SanitizerCoverageOptions::SCK_Edge;
|
|
|
|
break;
|
|
|
|
case 4:
|
|
|
|
Res.CoverageType = SanitizerCoverageOptions::SCK_Edge;
|
|
|
|
Res.IndirectCalls = true;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return Res;
|
|
|
|
}
|
|
|
|
|
|
|
|
SanitizerCoverageOptions OverrideFromCL(SanitizerCoverageOptions Options) {
|
|
|
|
// Sets CoverageType and IndirectCalls.
|
|
|
|
SanitizerCoverageOptions CLOpts = getOptions(ClCoverageLevel);
|
|
|
|
Options.CoverageType = std::max(Options.CoverageType, CLOpts.CoverageType);
|
|
|
|
Options.IndirectCalls |= CLOpts.IndirectCalls;
|
2016-08-30 09:12:10 +08:00
|
|
|
Options.TraceCmp |= ClCMPTracing;
|
|
|
|
Options.TraceDiv |= ClDIVTracing;
|
|
|
|
Options.TraceGep |= ClGEPTracing;
|
2017-06-09 06:58:19 +08:00
|
|
|
Options.TracePC |= ClTracePC;
|
2016-09-14 09:39:35 +08:00
|
|
|
Options.TracePCGuard |= ClTracePCGuard;
|
2017-06-09 06:58:19 +08:00
|
|
|
Options.Inline8bitCounters |= ClInline8bitCounters;
|
2020-04-09 13:02:41 +08:00
|
|
|
Options.InlineBoolFlag |= ClInlineBoolFlag;
|
2017-07-28 08:09:29 +08:00
|
|
|
Options.PCTable |= ClCreatePCTable;
|
2017-05-06 07:14:40 +08:00
|
|
|
Options.NoPrune |= !ClPruneBlocks;
|
2017-08-19 02:43:30 +08:00
|
|
|
Options.StackDepth |= ClStackDepth;
|
2021-11-09 09:52:36 +08:00
|
|
|
Options.TraceLoads |= ClLoadTracing;
|
|
|
|
Options.TraceStores |= ClStoreTracing;
|
2017-08-19 02:43:30 +08:00
|
|
|
if (!Options.TracePCGuard && !Options.TracePC &&
|
2020-04-09 13:02:41 +08:00
|
|
|
!Options.Inline8bitCounters && !Options.StackDepth &&
|
2021-11-09 09:52:36 +08:00
|
|
|
!Options.InlineBoolFlag && !Options.TraceLoads && !Options.TraceStores)
|
2017-08-19 02:43:30 +08:00
|
|
|
Options.TracePCGuard = true; // TracePCGuard is default.
|
2015-05-07 09:00:31 +08:00
|
|
|
return Options;
|
|
|
|
}
|
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
using DomTreeCallback = function_ref<const DominatorTree *(Function &F)>;
|
|
|
|
using PostDomTreeCallback =
|
|
|
|
function_ref<const PostDominatorTree *(Function &F)>;
|
2019-07-26 04:53:15 +08:00
|
|
|
|
|
|
|
class ModuleSanitizerCoverage {
|
|
|
|
public:
|
2019-09-05 04:30:29 +08:00
|
|
|
ModuleSanitizerCoverage(
|
Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang
Summary:
This commit adds two command-line options to clang.
These options let the user decide which functions will receive SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.
Patch by Yannis Juglaret of DGA-MI, Rennes, France
libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.
Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.
The two new command line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target. We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.
Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:
```
# (a)
src:*
fun:*
# (b)
src:SRC/*
fun:*
# (c)
src:SRC/src/woff2_dec.cc
fun:*
```
Running the built fuzzers shows how many instrumentation points the compiler adds, the fuzzer will output //XXX PCs//. Whitelist (a) is the instrument-everything whitelist, it produces 11912 instrumentation points. Whitelist (b) focuses coverage to instrument woff2 source code only, ignoring the dependency code for brotli (de)compression; it produces 3984 instrumented instrumentation points. Whitelist (c) focuses coverage to only instrument functions in the main file that deals with WOFF2 to TTF conversion, resulting in 1056 instrumentation points.
For experimentation purposes, we ran each fuzzer approximately 100 times, single process, with the initial corpus provided in the tutorial. We let the fuzzer run until it either found the heap buffer overflow or went out of memory. On this simple example, whitelists (b) and (c) found the heap buffer overflow more reliably and 5x faster than whitelist (a). The average execution times when finding the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, and (c) 176 s.
We explain these results by the fact that WOFF2 to TTF conversion calls the brotli decompression algorithm's functions, which are mostly irrelevant for finding bugs in WOFF2 font reconstruction but nevertheless instrumented and used by whitelist (a) to guide fuzzing. This results in longer execution time for these functions and a partially irrelevant corpus. Contrary to whitelist (a), whitelists (b) and (c) will execute brotli-related functions without instrumentation overhead, and ignore new code paths found in them. This results in faster bug finding for WOFF2 font reconstruction.
The results for whitelist (b) are similar to the ones for whitelist (c). Indeed, WOFF2 to TTF conversion calls functions that are mostly located in SRC/src/woff2_dec.cc. The 2892 extra instrumentation points allowed by whitelist (b) do not tamper with bug finding, even though they are mostly irrelevant, simply because most of these functions do not get called. We get a slightly faster average time for bug finding with whitelist (b), which might indicate that some of the extra instrumentation points are actually relevant, or might just be random noise.
Reviewers: kcc, morehouse, vitalybuka
Reviewed By: morehouse, vitalybuka
Subscribers: pratyai, vitalybuka, eternalsakura, xwlin222, dende, srhines, kubamracek, #sanitizers, lebedev.ri, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D63616
2020-04-11 01:42:41 +08:00
|
|
|
const SanitizerCoverageOptions &Options = SanitizerCoverageOptions(),
|
2020-06-20 13:22:47 +08:00
|
|
|
const SpecialCaseList *Allowlist = nullptr,
|
|
|
|
const SpecialCaseList *Blocklist = nullptr)
|
|
|
|
: Options(OverrideFromCL(Options)), Allowlist(Allowlist),
|
|
|
|
Blocklist(Blocklist) {}
|
2019-09-05 04:30:29 +08:00
|
|
|
bool instrumentModule(Module &M, DomTreeCallback DTCallback,
|
|
|
|
PostDomTreeCallback PDTCallback);
|
2019-07-26 04:53:15 +08:00
|
|
|
|
|
|
|
private:
|
2019-09-05 04:30:29 +08:00
|
|
|
void instrumentFunction(Function &F, DomTreeCallback DTCallback,
|
|
|
|
PostDomTreeCallback PDTCallback);
|
2014-11-12 06:14:37 +08:00
|
|
|
void InjectCoverageForIndirectCalls(Function &F,
|
|
|
|
ArrayRef<Instruction *> IndirCalls);
|
2015-03-21 09:29:36 +08:00
|
|
|
void InjectTraceForCmp(Function &F, ArrayRef<Instruction *> CmpTraceTargets);
|
2016-08-30 09:12:10 +08:00
|
|
|
void InjectTraceForDiv(Function &F,
|
|
|
|
ArrayRef<BinaryOperator *> DivTraceTargets);
|
|
|
|
void InjectTraceForGep(Function &F,
|
|
|
|
ArrayRef<GetElementPtrInst *> GepTraceTargets);
|
2021-11-09 09:52:36 +08:00
|
|
|
void InjectTraceForLoadsAndStores(Function &F, ArrayRef<LoadInst *> Loads,
|
|
|
|
ArrayRef<StoreInst *> Stores);
|
2015-07-31 09:33:06 +08:00
|
|
|
void InjectTraceForSwitch(Function &F,
|
|
|
|
ArrayRef<Instruction *> SwitchTraceTargets);
|
2017-08-31 06:49:31 +08:00
|
|
|
bool InjectCoverage(Function &F, ArrayRef<BasicBlock *> AllBlocks,
|
|
|
|
bool IsLeafFunc = true);
|
2017-06-03 07:13:44 +08:00
|
|
|
GlobalVariable *CreateFunctionLocalArrayInSection(size_t NumElements,
|
|
|
|
Function &F, Type *Ty,
|
|
|
|
const char *Section);
|
2017-08-29 07:46:11 +08:00
|
|
|
GlobalVariable *CreatePCArray(Function &F, ArrayRef<BasicBlock *> AllBlocks);
|
2017-07-28 07:36:49 +08:00
|
|
|
void CreateFunctionLocalArrays(Function &F, ArrayRef<BasicBlock *> AllBlocks);
|
2017-08-31 06:49:31 +08:00
|
|
|
void InjectCoverageAtBlock(Function &F, BasicBlock &BB, size_t Idx,
|
|
|
|
bool IsLeafFunc = true);
|
2019-09-05 04:30:29 +08:00
|
|
|
Function *CreateInitCallsForSections(Module &M, const char *CtorName,
|
|
|
|
const char *InitFunctionName, Type *Ty,
|
|
|
|
const char *Section);
|
|
|
|
std::pair<Value *, Value *> CreateSecStartEnd(Module &M, const char *Section,
|
|
|
|
Type *Ty);
|
2017-06-03 07:13:44 +08:00
|
|
|
|
2017-06-09 06:58:19 +08:00
|
|
|
void SetNoSanitizeMetadata(Instruction *I) {
|
|
|
|
I->setMetadata(I->getModule()->getMDKindID("nosanitize"),
|
|
|
|
MDNode::get(*C, None));
|
|
|
|
}
|
|
|
|
|
2017-06-03 07:13:44 +08:00
|
|
|
std::string getSectionName(const std::string &Section) const;
|
2019-07-16 07:18:31 +08:00
|
|
|
std::string getSectionStart(const std::string &Section) const;
|
|
|
|
std::string getSectionEnd(const std::string &Section) const;
|
[opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.
Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352827
2019-02-01 10:28:03 +08:00
|
|
|
FunctionCallee SanCovTracePCIndir;
|
|
|
|
FunctionCallee SanCovTracePC, SanCovTracePCGuard;
|
2021-11-09 09:52:36 +08:00
|
|
|
std::array<FunctionCallee, 4> SanCovTraceCmpFunction;
|
|
|
|
std::array<FunctionCallee, 4> SanCovTraceConstCmpFunction;
|
|
|
|
std::array<FunctionCallee, 5> SanCovLoadFunction;
|
|
|
|
std::array<FunctionCallee, 5> SanCovStoreFunction;
|
|
|
|
std::array<FunctionCallee, 2> SanCovTraceDivFunction;
|
[opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.
Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352827
2019-02-01 10:28:03 +08:00
|
|
|
FunctionCallee SanCovTraceGepFunction;
|
|
|
|
FunctionCallee SanCovTraceSwitchFunction;
|
2017-08-19 02:43:30 +08:00
|
|
|
GlobalVariable *SanCovLowestStack;
|
2021-11-09 09:52:36 +08:00
|
|
|
Type *Int128PtrTy, *IntptrTy, *IntptrPtrTy, *Int64Ty, *Int64PtrTy, *Int32Ty,
|
|
|
|
*Int32PtrTy, *Int16PtrTy, *Int16Ty, *Int8Ty, *Int8PtrTy, *Int1Ty,
|
|
|
|
*Int1PtrTy;
|
2015-07-31 09:33:06 +08:00
|
|
|
Module *CurModule;
|
2018-09-14 05:45:55 +08:00
|
|
|
std::string CurModuleUniqueId;
|
2017-02-03 09:08:06 +08:00
|
|
|
Triple TargetTriple;
|
2014-11-12 06:14:37 +08:00
|
|
|
LLVMContext *C;
|
2015-03-21 09:29:36 +08:00
|
|
|
const DataLayout *DL;
|
2014-11-12 06:14:37 +08:00
|
|
|
|
2016-09-30 01:43:24 +08:00
|
|
|
GlobalVariable *FunctionGuardArray; // for trace-pc-guard.
|
2017-06-09 06:58:19 +08:00
|
|
|
GlobalVariable *Function8bitCounterArray; // for inline-8bit-counters.
|
2020-04-09 13:02:41 +08:00
|
|
|
GlobalVariable *FunctionBoolArray; // for inline-bool-flag.
|
2017-07-28 08:09:29 +08:00
|
|
|
GlobalVariable *FunctionPCsArray; // for pc-table.
|
2017-09-09 13:30:13 +08:00
|
|
|
SmallVector<GlobalValue *, 20> GlobalsToAppendToUsed;
|
2018-06-16 04:12:58 +08:00
|
|
|
SmallVector<GlobalValue *, 20> GlobalsToAppendToCompilerUsed;
|
2014-12-24 06:32:17 +08:00
|
|
|
|
2015-05-07 09:00:31 +08:00
|
|
|
SanitizerCoverageOptions Options;
|
Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang
Summary:
This commit adds two command-line options to clang.
These options let the user decide which functions will receive SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.
Patch by Yannis Juglaret of DGA-MI, Rennes, France
libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.
Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.
The two new command line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target. We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.
Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:
```
# (a)
src:*
fun:*
# (b)
src:SRC/*
fun:*
# (c)
src:SRC/src/woff2_dec.cc
fun:*
```
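To make the setup concrete, whitelist (c) could be written out and passed to clang roughly as follows (the file names, the fuzzer source name, and the `SRC` placeholder are illustrative, not taken from the patch):

```shell
# Write whitelist (c): restrict instrumentation to woff2_dec.cc.
cat > coverage_whitelist.txt <<'EOF'
src:SRC/src/woff2_dec.cc
fun:*
EOF

# Hypothetical build line (requires a clang with libFuzzer support);
# shown as a comment so the snippet stays self-contained:
#   clang++ -g -O1 -fsanitize=fuzzer,address \
#     -fsanitize-coverage-whitelist=coverage_whitelist.txt \
#     convert_woff2ttf_fuzzer.cc -o woff2_fuzzer

cat coverage_whitelist.txt
```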
Running the built fuzzers shows how many instrumentation points the compiler adds; the fuzzer outputs //XXX PCs//. Whitelist (a) is the instrument-everything whitelist; it produces 11912 instrumentation points. Whitelist (b) focuses coverage on the woff2 source code only, ignoring the dependency code for brotli (de)compression; it produces 3984 instrumentation points. Whitelist (c) focuses coverage on the functions in the main file that deals with WOFF2 to TTF conversion, resulting in 1056 instrumentation points.
For experimentation purposes, we ran each fuzzer approximately 100 times, single process, with the initial corpus provided in the tutorial. We let the fuzzer run until it either found the heap buffer overflow or went out of memory. On this simple example, whitelists (b) and (c) found the heap buffer overflow more reliably and 5x faster than whitelist (a). The average execution times when finding the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, and (c) 176 s.
We explain these results by the fact that WOFF2 to TTF conversion calls the brotli decompression algorithm's functions, which are mostly irrelevant for finding bugs in WOFF2 font reconstruction but nevertheless instrumented and used by whitelist (a) to guide fuzzing. This results in longer execution time for these functions and a partially irrelevant corpus. Contrary to whitelist (a), whitelists (b) and (c) will execute brotli-related functions without instrumentation overhead, and ignore new code paths found in them. This results in faster bug finding for WOFF2 font reconstruction.
The results for whitelist (b) are similar to the ones for whitelist (c). Indeed, WOFF2 to TTF conversion calls functions that are mostly located in SRC/src/woff2_dec.cc. The 2892 extra instrumentation points allowed by whitelist (b) do not hamper bug finding, even though they are mostly irrelevant, simply because most of these functions do not get called. We get a slightly faster average time for bug finding with whitelist (b), which might indicate that some of the extra instrumentation points are actually relevant, or might just be random noise.
Reviewers: kcc, morehouse, vitalybuka
Reviewed By: morehouse, vitalybuka
Subscribers: pratyai, vitalybuka, eternalsakura, xwlin222, dende, srhines, kubamracek, #sanitizers, lebedev.ri, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D63616
  const SpecialCaseList *Allowlist;
  const SpecialCaseList *Blocklist;
};

class ModuleSanitizerCoverageLegacyPass : public ModulePass {
public:
  ModuleSanitizerCoverageLegacyPass(
      const SanitizerCoverageOptions &Options = SanitizerCoverageOptions(),
      const std::vector<std::string> &AllowlistFiles =
          std::vector<std::string>(),
      const std::vector<std::string> &BlocklistFiles =
          std::vector<std::string>())
      : ModulePass(ID), Options(Options) {
    if (AllowlistFiles.size() > 0)
      Allowlist = SpecialCaseList::createOrDie(AllowlistFiles,
                                               *vfs::getRealFileSystem());
    if (BlocklistFiles.size() > 0)
      Blocklist = SpecialCaseList::createOrDie(BlocklistFiles,
                                               *vfs::getRealFileSystem());
    initializeModuleSanitizerCoverageLegacyPassPass(
        *PassRegistry::getPassRegistry());
  }

  bool runOnModule(Module &M) override {
    ModuleSanitizerCoverage ModuleSancov(Options, Allowlist.get(),
                                         Blocklist.get());
    auto DTCallback = [this](Function &F) -> const DominatorTree * {
      return &this->getAnalysis<DominatorTreeWrapperPass>(F).getDomTree();
    };
    auto PDTCallback = [this](Function &F) -> const PostDominatorTree * {
      return &this->getAnalysis<PostDominatorTreeWrapperPass>(F)
                  .getPostDomTree();
    };
    return ModuleSancov.instrumentModule(M, DTCallback, PDTCallback);
  }

  static char ID; // Pass identification, replacement for typeid
  StringRef getPassName() const override { return "ModuleSanitizerCoverage"; }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<DominatorTreeWrapperPass>();
    AU.addRequired<PostDominatorTreeWrapperPass>();
  }

private:
  SanitizerCoverageOptions Options;
Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang
Summary:
This commit adds two command-line options to clang.
These options let the user decide which functions will receive SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.
Patch by Yannis Juglaret of DGA-MI, Rennes, France
libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage-guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.
Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.
The two new command line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target. We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.
Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:
```
# (a)
src:*
fun:*
# (b)
src:SRC/*
fun:*
# (c)
src:SRC/src/woff2_dec.cc
fun:*
```
Running the built fuzzers shows how many instrumentation points the compiler adds; the fuzzer will output //XXX PCs//. Whitelist (a) is the instrument-everything whitelist; it produces 11912 instrumentation points. Whitelist (b) focuses coverage on woff2 source code only, ignoring the dependency code for brotli (de)compression; it produces 3984 instrumentation points. Whitelist (c) focuses coverage on functions in the main file that deals with WOFF2 to TTF conversion, resulting in 1056 instrumentation points.
For experimentation purposes, we ran each fuzzer approximately 100 times, single process, with the initial corpus provided in the tutorial. We let the fuzzer run until it either found the heap buffer overflow or went out of memory. On this simple example, whitelists (b) and (c) found the heap buffer overflow more reliably and 5x faster than whitelist (a). The average execution times when finding the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, and (c) 176 s.
We explain these results by the fact that WOFF2 to TTF conversion calls the brotli decompression algorithm's functions, which are mostly irrelevant for finding bugs in WOFF2 font reconstruction but nevertheless instrumented and used by whitelist (a) to guide fuzzing. This results in longer execution time for these functions and a partially irrelevant corpus. Contrary to whitelist (a), whitelists (b) and (c) will execute brotli-related functions without instrumentation overhead, and ignore new code paths found in them. This results in faster bug finding for WOFF2 font reconstruction.
The results for whitelist (b) are similar to the ones for whitelist (c). Indeed, WOFF2 to TTF conversion calls functions that are mostly located in SRC/src/woff2_dec.cc. The 2892 extra instrumentation points allowed by whitelist (b) do not hamper bug finding, even though they are mostly irrelevant, simply because most of these functions do not get called. We get a slightly faster average time for bug finding with whitelist (b), which might indicate that some of the extra instrumentation points are actually relevant, or might just be random noise.
Reviewers: kcc, morehouse, vitalybuka
Reviewed By: morehouse, vitalybuka
Subscribers: pratyai, vitalybuka, eternalsakura, xwlin222, dende, srhines, kubamracek, #sanitizers, lebedev.ri, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D63616
2020-04-11 01:42:41 +08:00
|
|
|
|
2020-06-20 13:22:47 +08:00
|
|
|
std::unique_ptr<SpecialCaseList> Allowlist;
|
|
|
|
std::unique_ptr<SpecialCaseList> Blocklist;
|
2019-07-26 04:53:15 +08:00
|
|
|
};
|
|
|
|
|
2016-03-19 07:29:29 +08:00
|
|
|
} // namespace
|
2014-11-12 06:14:37 +08:00
|
|
|
|
2019-07-26 04:53:15 +08:00
|
|
|
PreservedAnalyses ModuleSanitizerCoveragePass::run(Module &M,
|
2019-09-05 04:30:29 +08:00
|
|
|
ModuleAnalysisManager &MAM) {
|
2020-06-20 13:22:47 +08:00
|
|
|
ModuleSanitizerCoverage ModuleSancov(Options, Allowlist.get(),
|
|
|
|
Blocklist.get());
|
2019-09-05 04:30:29 +08:00
|
|
|
auto &FAM = MAM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
|
|
|
|
auto DTCallback = [&FAM](Function &F) -> const DominatorTree * {
|
|
|
|
return &FAM.getResult<DominatorTreeAnalysis>(F);
|
|
|
|
};
|
|
|
|
auto PDTCallback = [&FAM](Function &F) -> const PostDominatorTree * {
|
|
|
|
return &FAM.getResult<PostDominatorTreeAnalysis>(F);
|
|
|
|
};
|
|
|
|
if (ModuleSancov.instrumentModule(M, DTCallback, PDTCallback))
|
2019-07-26 04:53:15 +08:00
|
|
|
return PreservedAnalyses::none();
|
|
|
|
return PreservedAnalyses::all();
|
|
|
|
}
|
|
|
|
|
2018-10-17 07:43:57 +08:00
|
|
|
std::pair<Value *, Value *>
|
2019-07-26 04:53:15 +08:00
|
|
|
ModuleSanitizerCoverage::CreateSecStartEnd(Module &M, const char *Section,
|
2017-07-28 07:36:49 +08:00
|
|
|
Type *Ty) {
|
2021-03-19 07:46:04 +08:00
|
|
|
// Use ExternalWeak so that if all sections are discarded due to section
|
|
|
|
// garbage collection, the linker will not report undefined symbol errors.
|
2021-03-23 14:05:36 +08:00
|
|
|
// Windows defines the start/stop symbols in compiler-rt so no need for
|
|
|
|
// ExternalWeak.
|
|
|
|
GlobalValue::LinkageTypes Linkage = TargetTriple.isOSBinFormatCOFF()
|
|
|
|
? GlobalVariable::ExternalLinkage
|
|
|
|
: GlobalVariable::ExternalWeakLinkage;
|
|
|
|
GlobalVariable *SecStart =
|
2021-06-29 03:12:12 +08:00
|
|
|
new GlobalVariable(M, Ty, false, Linkage, nullptr,
|
|
|
|
getSectionStart(Section));
|
2017-06-03 07:13:44 +08:00
|
|
|
SecStart->setVisibility(GlobalValue::HiddenVisibility);
|
2021-03-23 14:05:36 +08:00
|
|
|
GlobalVariable *SecEnd =
|
2021-06-29 03:12:12 +08:00
|
|
|
new GlobalVariable(M, Ty, false, Linkage, nullptr,
|
|
|
|
getSectionEnd(Section));
|
2017-06-03 07:13:44 +08:00
|
|
|
SecEnd->setVisibility(GlobalValue::HiddenVisibility);
|
2018-10-17 07:43:57 +08:00
|
|
|
IRBuilder<> IRB(M.getContext());
|
2019-01-15 05:02:02 +08:00
|
|
|
if (!TargetTriple.isOSBinFormatCOFF())
|
2020-07-31 02:08:08 +08:00
|
|
|
return std::make_pair(SecStart, SecEnd);
|
2017-06-03 07:13:44 +08:00
|
|
|
|
2018-10-17 07:43:57 +08:00
|
|
|
// Account for the fact that on windows-msvc __start_* symbols actually
|
|
|
|
// point to a uint64_t before the start of the array.
|
|
|
|
auto SecStartI8Ptr = IRB.CreatePointerCast(SecStart, Int8PtrTy);
|
2019-02-02 04:44:47 +08:00
|
|
|
auto GEP = IRB.CreateGEP(Int8Ty, SecStartI8Ptr,
|
2018-10-17 07:43:57 +08:00
|
|
|
ConstantInt::get(IntptrTy, sizeof(uint64_t)));
|
2021-06-29 03:12:12 +08:00
|
|
|
return std::make_pair(IRB.CreatePointerCast(GEP, PointerType::getUnqual(Ty)),
|
|
|
|
SecEnd);
|
2017-07-28 07:36:49 +08:00
|
|
|
}
|
|
|
|
|
2019-07-26 04:53:15 +08:00
|
|
|
Function *ModuleSanitizerCoverage::CreateInitCallsForSections(
|
2019-05-07 09:39:37 +08:00
|
|
|
Module &M, const char *CtorName, const char *InitFunctionName, Type *Ty,
|
2017-07-28 07:36:49 +08:00
|
|
|
const char *Section) {
|
|
|
|
auto SecStartEnd = CreateSecStartEnd(M, Section, Ty);
|
|
|
|
auto SecStart = SecStartEnd.first;
|
|
|
|
auto SecEnd = SecStartEnd.second;
|
|
|
|
Function *CtorFunc;
|
2021-06-29 03:12:12 +08:00
|
|
|
Type *PtrTy = PointerType::getUnqual(Ty);
|
2017-06-03 07:13:44 +08:00
|
|
|
std::tie(CtorFunc, std::ignore) = createSanitizerCtorAndInitFunctions(
|
2021-06-29 03:12:12 +08:00
|
|
|
M, CtorName, InitFunctionName, {PtrTy, PtrTy}, {SecStart, SecEnd});
|
2019-05-07 09:39:37 +08:00
|
|
|
assert(CtorFunc->getName() == CtorName);
|
2017-06-03 07:13:44 +08:00
|
|
|
|
|
|
|
if (TargetTriple.supportsCOMDAT()) {
|
|
|
|
// Use comdat to dedup CtorFunc.
|
2019-05-07 09:39:37 +08:00
|
|
|
CtorFunc->setComdat(M.getOrInsertComdat(CtorName));
|
2017-06-03 07:13:44 +08:00
|
|
|
appendToGlobalCtors(M, CtorFunc, SanCtorAndDtorPriority, CtorFunc);
|
|
|
|
} else {
|
|
|
|
appendToGlobalCtors(M, CtorFunc, SanCtorAndDtorPriority);
|
|
|
|
}
|
2018-10-13 02:11:47 +08:00
|
|
|
|
2019-01-15 05:02:02 +08:00
|
|
|
if (TargetTriple.isOSBinFormatCOFF()) {
|
2018-10-13 02:11:47 +08:00
|
|
|
// In COFF files, if the constructors are set as COMDAT (they are because
|
|
|
|
// COFF supports COMDAT) and the linker flag /OPT:REF (strip unreferenced
|
|
|
|
// functions and data) is used, the constructors get stripped. To prevent
|
2019-01-15 05:02:02 +08:00
|
|
|
// this, give the constructors weak ODR linkage and ensure the linker knows
|
|
|
|
// to include the sancov constructor. This way the linker can deduplicate
|
|
|
|
// the constructors but always leave one copy.
|
2018-10-13 02:11:47 +08:00
|
|
|
CtorFunc->setLinkage(GlobalValue::WeakODRLinkage);
|
|
|
|
}
|
2017-07-28 07:36:49 +08:00
|
|
|
return CtorFunc;
|
2017-06-03 07:13:44 +08:00
|
|
|
}
|
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
bool ModuleSanitizerCoverage::instrumentModule(
|
|
|
|
Module &M, DomTreeCallback DTCallback, PostDomTreeCallback PDTCallback) {
|
2015-05-07 09:00:31 +08:00
|
|
|
if (Options.CoverageType == SanitizerCoverageOptions::SCK_None)
|
2019-09-05 04:30:29 +08:00
|
|
|
return false;
|
2020-06-20 13:22:47 +08:00
|
|
|
if (Allowlist &&
|
|
|
|
!Allowlist->inSection("coverage", "src", M.getSourceFileName()))
|
2020-04-11 01:42:41 +08:00
|
|
|
return false;
|
2020-06-20 13:22:47 +08:00
|
|
|
if (Blocklist &&
|
|
|
|
Blocklist->inSection("coverage", "src", M.getSourceFileName()))
|
2020-04-11 01:42:41 +08:00
|
|
|
return false;
|
2014-11-12 06:14:37 +08:00
|
|
|
C = &(M.getContext());
|
2015-03-21 09:29:36 +08:00
|
|
|
DL = &M.getDataLayout();
|
2019-09-05 04:30:29 +08:00
|
|
|
CurModule = &M;
|
2018-09-14 05:45:55 +08:00
|
|
|
CurModuleUniqueId = getUniqueModuleId(CurModule);
|
2017-02-03 09:08:06 +08:00
|
|
|
TargetTriple = Triple(M.getTargetTriple());
|
2017-06-03 07:13:44 +08:00
|
|
|
FunctionGuardArray = nullptr;
|
2017-06-09 06:58:19 +08:00
|
|
|
Function8bitCounterArray = nullptr;
|
2020-04-09 13:02:41 +08:00
|
|
|
FunctionBoolArray = nullptr;
|
2017-07-28 07:36:49 +08:00
|
|
|
FunctionPCsArray = nullptr;
|
2015-03-21 09:29:36 +08:00
|
|
|
IntptrTy = Type::getIntNTy(*C, DL->getPointerSizeInBits());
|
2016-09-18 12:52:23 +08:00
|
|
|
IntptrPtrTy = PointerType::getUnqual(IntptrTy);
|
2014-11-12 06:14:37 +08:00
|
|
|
Type *VoidTy = Type::getVoidTy(*C);
|
2014-11-25 02:49:53 +08:00
|
|
|
IRBuilder<> IRB(*C);
|
2021-11-09 09:52:36 +08:00
|
|
|
Int128PtrTy = PointerType::getUnqual(IRB.getInt128Ty());
|
2015-07-31 09:33:06 +08:00
|
|
|
Int64PtrTy = PointerType::getUnqual(IRB.getInt64Ty());
|
2021-11-09 09:52:36 +08:00
|
|
|
Int16PtrTy = PointerType::getUnqual(IRB.getInt16Ty());
|
2016-09-30 01:43:24 +08:00
|
|
|
Int32PtrTy = PointerType::getUnqual(IRB.getInt32Ty());
|
2017-06-09 06:58:19 +08:00
|
|
|
Int8PtrTy = PointerType::getUnqual(IRB.getInt8Ty());
|
2020-04-09 13:02:41 +08:00
|
|
|
Int1PtrTy = PointerType::getUnqual(IRB.getInt1Ty());
|
2015-03-21 09:29:36 +08:00
|
|
|
Int64Ty = IRB.getInt64Ty();
|
2016-09-30 01:43:24 +08:00
|
|
|
Int32Ty = IRB.getInt32Ty();
|
2017-08-10 23:00:13 +08:00
|
|
|
Int16Ty = IRB.getInt16Ty();
|
2017-06-09 06:58:19 +08:00
|
|
|
Int8Ty = IRB.getInt8Ty();
|
2020-04-09 13:02:41 +08:00
|
|
|
Int1Ty = IRB.getInt1Ty();
|
2014-11-12 06:14:37 +08:00
|
|
|
|
[opaque pointer types] Add a FunctionCallee wrapper type, and use it.
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.
Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaration with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352827
2019-02-01 10:28:03 +08:00
|
|
|
SanCovTracePCIndir =
|
|
|
|
M.getOrInsertFunction(SanCovTracePCIndirName, VoidTy, IntptrTy);
|
2020-08-13 00:37:28 +08:00
|
|
|
// Make sure smaller parameters are zero-extended to i64 if required by the
|
|
|
|
// target ABI.
|
2019-02-01 10:28:03 +08:00
|
|
|
AttributeList SanCovTraceCmpZeroExtAL;
|
2020-08-13 00:37:28 +08:00
|
|
|
SanCovTraceCmpZeroExtAL =
|
|
|
|
SanCovTraceCmpZeroExtAL.addParamAttribute(*C, 0, Attribute::ZExt);
|
|
|
|
SanCovTraceCmpZeroExtAL =
|
|
|
|
SanCovTraceCmpZeroExtAL.addParamAttribute(*C, 1, Attribute::ZExt);
|
2019-02-01 10:28:03 +08:00
|
|
|
|
2017-04-07 04:23:57 +08:00
|
|
|
SanCovTraceCmpFunction[0] =
|
2019-02-01 10:28:03 +08:00
|
|
|
M.getOrInsertFunction(SanCovTraceCmp1, SanCovTraceCmpZeroExtAL, VoidTy,
|
|
|
|
IRB.getInt8Ty(), IRB.getInt8Ty());
|
|
|
|
SanCovTraceCmpFunction[1] =
|
|
|
|
M.getOrInsertFunction(SanCovTraceCmp2, SanCovTraceCmpZeroExtAL, VoidTy,
|
|
|
|
IRB.getInt16Ty(), IRB.getInt16Ty());
|
|
|
|
SanCovTraceCmpFunction[2] =
|
|
|
|
M.getOrInsertFunction(SanCovTraceCmp4, SanCovTraceCmpZeroExtAL, VoidTy,
|
|
|
|
IRB.getInt32Ty(), IRB.getInt32Ty());
|
2017-04-07 04:23:57 +08:00
|
|
|
SanCovTraceCmpFunction[3] =
|
2019-02-01 10:28:03 +08:00
|
|
|
M.getOrInsertFunction(SanCovTraceCmp8, VoidTy, Int64Ty, Int64Ty);

  SanCovTraceConstCmpFunction[0] = M.getOrInsertFunction(
      SanCovTraceConstCmp1, SanCovTraceCmpZeroExtAL, VoidTy, Int8Ty, Int8Ty);
  SanCovTraceConstCmpFunction[1] = M.getOrInsertFunction(
      SanCovTraceConstCmp2, SanCovTraceCmpZeroExtAL, VoidTy, Int16Ty, Int16Ty);
  SanCovTraceConstCmpFunction[2] = M.getOrInsertFunction(
      SanCovTraceConstCmp4, SanCovTraceCmpZeroExtAL, VoidTy, Int32Ty, Int32Ty);
  SanCovTraceConstCmpFunction[3] =
      M.getOrInsertFunction(SanCovTraceConstCmp8, VoidTy, Int64Ty, Int64Ty);

  // Loads.
  SanCovLoadFunction[0] = M.getOrInsertFunction(SanCovLoad1, VoidTy, Int8PtrTy);
  SanCovLoadFunction[1] =
      M.getOrInsertFunction(SanCovLoad2, VoidTy, Int16PtrTy);
  SanCovLoadFunction[2] =
      M.getOrInsertFunction(SanCovLoad4, VoidTy, Int32PtrTy);
  SanCovLoadFunction[3] =
      M.getOrInsertFunction(SanCovLoad8, VoidTy, Int64PtrTy);
  SanCovLoadFunction[4] =
      M.getOrInsertFunction(SanCovLoad16, VoidTy, Int128PtrTy);
  // Stores.
  SanCovStoreFunction[0] =
      M.getOrInsertFunction(SanCovStore1, VoidTy, Int8PtrTy);
  SanCovStoreFunction[1] =
      M.getOrInsertFunction(SanCovStore2, VoidTy, Int16PtrTy);
  SanCovStoreFunction[2] =
      M.getOrInsertFunction(SanCovStore4, VoidTy, Int32PtrTy);
  SanCovStoreFunction[3] =
      M.getOrInsertFunction(SanCovStore8, VoidTy, Int64PtrTy);
  SanCovStoreFunction[4] =
      M.getOrInsertFunction(SanCovStore16, VoidTy, Int128PtrTy);

  {
    AttributeList AL;
    AL = AL.addParamAttribute(*C, 0, Attribute::ZExt);
    SanCovTraceDivFunction[0] =
        M.getOrInsertFunction(SanCovTraceDiv4, AL, VoidTy, IRB.getInt32Ty());
  }
  SanCovTraceDivFunction[1] =
      M.getOrInsertFunction(SanCovTraceDiv8, VoidTy, Int64Ty);
  SanCovTraceGepFunction =
      M.getOrInsertFunction(SanCovTraceGep, VoidTy, IntptrTy);
  SanCovTraceSwitchFunction =
      M.getOrInsertFunction(SanCovTraceSwitchName, VoidTy, Int64Ty, Int64PtrTy);

  Constant *SanCovLowestStackConstant =
      M.getOrInsertGlobal(SanCovLowestStackName, IntptrTy);
  SanCovLowestStack = dyn_cast<GlobalVariable>(SanCovLowestStackConstant);
  if (!SanCovLowestStack || SanCovLowestStack->getValueType() != IntptrTy) {
    C->emitError(StringRef("'") + SanCovLowestStackName +
                 "' should not be declared by the user");
    return true;
  }
  SanCovLowestStack->setThreadLocalMode(
      GlobalValue::ThreadLocalMode::InitialExecTLSModel);
  if (Options.StackDepth && !SanCovLowestStack->isDeclaration())
    SanCovLowestStack->setInitializer(Constant::getAllOnesValue(IntptrTy));

  SanCovTracePC = M.getOrInsertFunction(SanCovTracePCName, VoidTy);
  SanCovTracePCGuard =
      M.getOrInsertFunction(SanCovTracePCGuardName, VoidTy, Int32PtrTy);

  for (auto &F : M)
    instrumentFunction(F, DTCallback, PDTCallback);

  Function *Ctor = nullptr;

  if (FunctionGuardArray)
    Ctor = CreateInitCallsForSections(M, SanCovModuleCtorTracePcGuardName,
                                      SanCovTracePCGuardInitName, Int32Ty,
                                      SanCovGuardsSectionName);
  if (Function8bitCounterArray)
    Ctor = CreateInitCallsForSections(M, SanCovModuleCtor8bitCountersName,
                                      SanCov8bitCountersInitName, Int8Ty,
                                      SanCovCountersSectionName);
  if (FunctionBoolArray) {
    Ctor = CreateInitCallsForSections(M, SanCovModuleCtorBoolFlagName,
                                      SanCovBoolFlagInitName, Int1Ty,
                                      SanCovBoolFlagSectionName);
  }
  if (Ctor && Options.PCTable) {
    auto SecStartEnd = CreateSecStartEnd(M, SanCovPCsSectionName, IntptrTy);
    FunctionCallee InitFunction = declareSanitizerInitFunction(
        M, SanCovPCsInitName, {IntptrPtrTy, IntptrPtrTy});
    IRBuilder<> IRBCtor(Ctor->getEntryBlock().getTerminator());
    IRBCtor.CreateCall(InitFunction, {SecStartEnd.first, SecStartEnd.second});
  }
  appendToUsed(M, GlobalsToAppendToUsed);
  appendToCompilerUsed(M, GlobalsToAppendToCompilerUsed);
  return true;
}

// True if block has successors and it dominates all of them.
static bool isFullDominator(const BasicBlock *BB, const DominatorTree *DT) {
  if (succ_empty(BB))
    return false;

  return llvm::all_of(successors(BB), [&](const BasicBlock *SUCC) {
    return DT->dominates(BB, SUCC);
  });
}

// True if block has predecessors and it postdominates all of them.
static bool isFullPostDominator(const BasicBlock *BB,
                                const PostDominatorTree *PDT) {
  if (pred_empty(BB))
    return false;

  return llvm::all_of(predecessors(BB), [&](const BasicBlock *PRED) {
    return PDT->dominates(BB, PRED);
  });
}

static bool shouldInstrumentBlock(const Function &F, const BasicBlock *BB,
                                  const DominatorTree *DT,
                                  const PostDominatorTree *PDT,
                                  const SanitizerCoverageOptions &Options) {
  // Don't insert coverage for blocks containing nothing but unreachable: we
  // will never call __sanitizer_cov() for them, so counting them in
  // NumberOfInstrumentedBlocks() might complicate calculation of code coverage
  // percentage. Also, unreachable instructions frequently have no debug
  // locations.
  if (isa<UnreachableInst>(BB->getFirstNonPHIOrDbgOrLifetime()))
    return false;

  // Don't insert coverage into blocks without a valid insertion point
  // (catchswitch blocks).
  if (BB->getFirstInsertionPt() == BB->end())
    return false;

  if (Options.NoPrune || &F.getEntryBlock() == BB)
    return true;

  if (Options.CoverageType == SanitizerCoverageOptions::SCK_Function &&
      &F.getEntryBlock() != BB)
    return false;

  // Do not instrument full dominators, or full post-dominators with multiple
  // predecessors.
  return !isFullDominator(BB, DT) &&
         !(isFullPostDominator(BB, PDT) && !BB->getSinglePredecessor());
}

// Returns true iff From->To is a backedge.
// A twist here is that we treat From->To as a backedge if
//   * To dominates From or
//   * To->UniqueSuccessor dominates From
static bool IsBackEdge(BasicBlock *From, BasicBlock *To,
                       const DominatorTree *DT) {
  if (DT->dominates(To, From))
    return true;
  if (auto Next = To->getUniqueSuccessor())
    if (DT->dominates(Next, From))
      return true;
  return false;
}

// Prunes uninteresting Cmp instrumentation:
//   * CMP instructions that feed into loop backedge branch.
//
// Note that Cmp pruning is controlled by the same flag as the
// BB pruning.
static bool IsInterestingCmp(ICmpInst *CMP, const DominatorTree *DT,
                             const SanitizerCoverageOptions &Options) {
  if (!Options.NoPrune)
    if (CMP->hasOneUse())
      if (auto BR = dyn_cast<BranchInst>(CMP->user_back()))
        for (BasicBlock *B : BR->successors())
          if (IsBackEdge(BR->getParent(), B, DT))
            return false;
  return true;
}

void ModuleSanitizerCoverage::instrumentFunction(
    Function &F, DomTreeCallback DTCallback, PostDomTreeCallback PDTCallback) {
  if (F.empty())
    return;
  if (F.getName().find(".module_ctor") != std::string::npos)
    return; // Should not instrument sanitizer init functions.
  if (F.getName().startswith("__sanitizer_"))
    return; // Don't instrument __sanitizer_* callbacks.
  // Don't touch available_externally functions, their actual body is
  // elsewhere.
  if (F.getLinkage() == GlobalValue::AvailableExternallyLinkage)
    return;
  // Don't instrument MSVC CRT configuration helpers. They may run before normal
  // initialization.
  if (F.getName() == "__local_stdio_printf_options" ||
      F.getName() == "__local_stdio_scanf_options")
    return;
  if (isa<UnreachableInst>(F.getEntryBlock().getTerminator()))
    return;
  // Don't instrument functions using SEH for now. Splitting basic blocks like
  // we do for coverage breaks WinEHPrepare.
  // FIXME: Remove this when SEH no longer uses landingpad pattern matching.
  if (F.hasPersonalityFn() &&
      isAsynchronousEHPersonality(classifyEHPersonality(F.getPersonalityFn())))
    return;
  if (Allowlist && !Allowlist->inSection("coverage", "fun", F.getName()))
    return;
  if (Blocklist && Blocklist->inSection("coverage", "fun", F.getName()))
    return;
  if (F.hasFnAttribute(Attribute::NoSanitizeCoverage))
    return;
|
2015-05-07 09:00:31 +08:00
|
|
|
if (Options.CoverageType >= SanitizerCoverageOptions::SCK_Edge)
|
2019-03-13 02:20:25 +08:00
|
|
|
SplitAllCriticalEdges(F, CriticalEdgeSplittingOptions().setIgnoreUnreachableDests());
|
2016-03-19 07:29:29 +08:00
|
|
|
SmallVector<Instruction *, 8> IndirCalls;
|
2016-02-26 09:17:22 +08:00
|
|
|
SmallVector<BasicBlock *, 16> BlocksToInstrument;
|
2016-03-19 07:29:29 +08:00
|
|
|
SmallVector<Instruction *, 8> CmpTraceTargets;
|
|
|
|
SmallVector<Instruction *, 8> SwitchTraceTargets;
|
2016-08-30 09:12:10 +08:00
|
|
|
SmallVector<BinaryOperator *, 8> DivTraceTargets;
|
|
|
|
SmallVector<GetElementPtrInst *, 8> GepTraceTargets;
|
2021-11-09 09:52:36 +08:00
|
|
|
SmallVector<LoadInst *, 8> Loads;
|
|
|
|
SmallVector<StoreInst *, 8> Stores;
|
2016-02-26 09:17:22 +08:00
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
const DominatorTree *DT = DTCallback(F);
|
|
|
|
const PostDominatorTree *PDT = PDTCallback(F);
|
2017-08-31 06:49:31 +08:00
|
|
|
bool IsLeafFunc = true;
|
2016-03-22 07:08:16 +08:00
|
|
|
|
2014-11-12 06:14:37 +08:00
|
|
|
for (auto &BB : F) {
|
2017-05-24 08:29:12 +08:00
|
|
|
if (shouldInstrumentBlock(F, &BB, DT, PDT, Options))
|
2016-02-26 09:17:22 +08:00
|
|
|
BlocksToInstrument.push_back(&BB);
|
2015-03-21 09:29:36 +08:00
|
|
|
for (auto &Inst : BB) {
|
2015-05-07 09:00:31 +08:00
|
|
|
if (Options.IndirectCalls) {
|
2020-04-22 12:56:04 +08:00
|
|
|
CallBase *CB = dyn_cast<CallBase>(&Inst);
|
2022-03-08 04:43:37 +08:00
|
|
|
if (CB && CB->isIndirectCall())
|
2014-11-12 06:14:37 +08:00
|
|
|
IndirCalls.push_back(&Inst);
|
|
|
|
}
|
2015-07-31 09:33:06 +08:00
|
|
|
if (Options.TraceCmp) {
|
2019-02-01 07:43:00 +08:00
|
|
|
if (ICmpInst *CMP = dyn_cast<ICmpInst>(&Inst))
|
|
|
|
if (IsInterestingCmp(CMP, DT, Options))
|
|
|
|
CmpTraceTargets.push_back(&Inst);
|
2015-07-31 09:33:06 +08:00
|
|
|
if (isa<SwitchInst>(&Inst))
|
|
|
|
SwitchTraceTargets.push_back(&Inst);
|
|
|
|
}
|
2016-08-30 09:12:10 +08:00
|
|
|
if (Options.TraceDiv)
|
|
|
|
if (BinaryOperator *BO = dyn_cast<BinaryOperator>(&Inst))
|
|
|
|
if (BO->getOpcode() == Instruction::SDiv ||
|
|
|
|
BO->getOpcode() == Instruction::UDiv)
|
|
|
|
DivTraceTargets.push_back(BO);
|
|
|
|
if (Options.TraceGep)
|
|
|
|
if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(&Inst))
|
|
|
|
GepTraceTargets.push_back(GEP);
|
2021-11-09 09:52:36 +08:00
|
|
|
if (Options.TraceLoads)
|
|
|
|
if (LoadInst *LI = dyn_cast<LoadInst>(&Inst))
|
|
|
|
Loads.push_back(LI);
|
|
|
|
if (Options.TraceStores)
|
|
|
|
if (StoreInst *SI = dyn_cast<StoreInst>(&Inst))
|
|
|
|
Stores.push_back(SI);
|
2017-08-31 06:49:31 +08:00
|
|
|
if (Options.StackDepth)
|
|
|
|
if (isa<InvokeInst>(Inst) ||
|
|
|
|
(isa<CallInst>(Inst) && !isa<IntrinsicInst>(Inst)))
|
|
|
|
IsLeafFunc = false;
|
|
|
|
}
|
2014-11-12 06:14:37 +08:00
|
|
|
}
|
2016-02-26 09:17:22 +08:00
|
|
|
|
2017-08-31 06:49:31 +08:00
|
|
|
InjectCoverage(F, BlocksToInstrument, IsLeafFunc);
|
2015-03-21 09:29:36 +08:00
|
|
|
InjectCoverageForIndirectCalls(F, IndirCalls);
|
|
|
|
InjectTraceForCmp(F, CmpTraceTargets);
|
2015-07-31 09:33:06 +08:00
|
|
|
InjectTraceForSwitch(F, SwitchTraceTargets);
|
2016-08-30 09:12:10 +08:00
|
|
|
InjectTraceForDiv(F, DivTraceTargets);
|
|
|
|
InjectTraceForGep(F, GepTraceTargets);
|
2021-11-09 09:52:36 +08:00
|
|
|
InjectTraceForLoadsAndStores(F, Loads, Stores);
|
2014-11-12 06:14:37 +08:00
|
|
|
}
|
2017-06-03 07:13:44 +08:00
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
GlobalVariable *ModuleSanitizerCoverage::CreateFunctionLocalArrayInSection(
|
2017-06-03 07:13:44 +08:00
|
|
|
size_t NumElements, Function &F, Type *Ty, const char *Section) {
|
|
|
|
ArrayType *ArrayTy = ArrayType::get(Ty, NumElements);
|
|
|
|
auto Array = new GlobalVariable(
|
|
|
|
*CurModule, ArrayTy, false, GlobalVariable::PrivateLinkage,
|
|
|
|
Constant::getNullValue(ArrayTy), "__sancov_gen_");
|
2018-10-13 07:21:48 +08:00
|
|
|
|
[SanitizerCoverage] Drop !associated on metadata sections
In SanitizerCoverage, the metadata sections (`__sancov_guards`,
`__sancov_cntrs`, `__sancov_bools`) are referenced by functions. After
inlining, such a `__sancov_*` section can be referenced by more than one
function, but its sh_link still refers to the original function's section.
(Note: a SHF_LINK_ORDER section referenced by a section other than its linked-to
section violates the invariant.)
If the original function's section is discarded (e.g. LTO internalization +
`ld.lld --gc-sections`), ld.lld may report a `sh_link points to discarded section` error.
The above reasoning means that `!associated` is not appropriate for an
inlinable function. Non-interposable functions are inline candidates, so we
have to drop `!associated`. A `__sancov_pcs` is not referenced by other sections
but is expected to parallel a metadata section, so we have to make sure the two
sections are retained or discarded at the same time. A section group does the
trick. (Note: we have a module ctor, so `getUniqueModuleId` guarantees to
return a non-empty string, and `GetOrCreateFunctionComdat` guarantees to return
non-null.)
For interposable functions, we could keep using `!associated`, but
LTO can change the linkage to `internal` and allow such functions to be inlinable,
so we have to drop `!associated`, too. To not interfere with section
group resolution, we need to use the `noduplicates` variant (section group flag 0).
(This allows us to get rid of the ModuleID parameter.)
In -fno-pie and -fpie code (mostly dso_local), instrumented interposable
functions have WeakAny/LinkOnceAny linkages, which are rare. So the
section group header overhead should be low.
This patch does not change the object file output for COFF (where `!associated` is ignored).
Reviewed By: morehouse, rnk, vitalybuka
Differential Revision: https://reviews.llvm.org/D97430
2021-02-26 03:59:23 +08:00
|
|
|
if (TargetTriple.supportsCOMDAT() &&
|
|
|
|
(TargetTriple.isOSBinFormatELF() || !F.isInterposable()))
|
|
|
|
if (auto Comdat = getOrCreateFunctionComdat(F, TargetTriple))
|
2018-10-13 07:21:48 +08:00
|
|
|
Array->setComdat(Comdat);
|
2017-06-03 07:13:44 +08:00
|
|
|
Array->setSection(getSectionName(Section));
|
2020-04-09 13:13:19 +08:00
|
|
|
Array->setAlignment(Align(DL->getTypeStoreSize(Ty).getFixedSize()));
|
2021-02-27 03:10:02 +08:00
|
|
|
|
|
|
|
// sancov_pcs parallels the other metadata section(s). Optimizers (e.g.
|
|
|
|
// GlobalOpt/ConstantMerge) may not discard sancov_pcs and the other
|
|
|
|
// section(s) as a unit, so we conservatively retain all unconditionally in
|
|
|
|
// the compiler.
|
|
|
|
//
|
|
|
|
// With comdat (COFF/ELF), the linker can guarantee the associated sections
|
|
|
|
// will be retained or discarded as a unit, so llvm.compiler.used is
|
|
|
|
// sufficient. Otherwise, conservatively make all of them retained by the
|
|
|
|
// linker.
|
|
|
|
if (Array->hasComdat())
|
|
|
|
GlobalsToAppendToCompilerUsed.push_back(Array);
|
|
|
|
else
|
|
|
|
GlobalsToAppendToUsed.push_back(Array);
|
2018-09-14 05:45:55 +08:00
|
|
|
|
2017-06-03 07:13:44 +08:00
|
|
|
return Array;
|
|
|
|
}
|
2017-07-28 07:36:49 +08:00
|
|
|
|
2017-08-29 07:46:11 +08:00
|
|
|
GlobalVariable *
|
2019-09-05 04:30:29 +08:00
|
|
|
ModuleSanitizerCoverage::CreatePCArray(Function &F,
|
|
|
|
ArrayRef<BasicBlock *> AllBlocks) {
|
2017-07-28 07:36:49 +08:00
|
|
|
size_t N = AllBlocks.size();
|
|
|
|
assert(N);
|
2017-08-26 03:29:47 +08:00
|
|
|
SmallVector<Constant *, 32> PCs;
|
2017-07-28 07:36:49 +08:00
|
|
|
IRBuilder<> IRB(&*F.getEntryBlock().getFirstInsertionPt());
|
2017-08-26 03:29:47 +08:00
|
|
|
for (size_t i = 0; i < N; i++) {
|
|
|
|
if (&F.getEntryBlock() == AllBlocks[i]) {
|
|
|
|
PCs.push_back((Constant *)IRB.CreatePointerCast(&F, IntptrPtrTy));
|
|
|
|
PCs.push_back((Constant *)IRB.CreateIntToPtr(
|
|
|
|
ConstantInt::get(IntptrTy, 1), IntptrPtrTy));
|
|
|
|
} else {
|
|
|
|
PCs.push_back((Constant *)IRB.CreatePointerCast(
|
|
|
|
BlockAddress::get(AllBlocks[i]), IntptrPtrTy));
|
|
|
|
PCs.push_back((Constant *)IRB.CreateIntToPtr(
|
|
|
|
ConstantInt::get(IntptrTy, 0), IntptrPtrTy));
|
|
|
|
}
|
|
|
|
}
|
2017-08-29 07:46:11 +08:00
|
|
|
auto *PCArray = CreateFunctionLocalArrayInSection(N * 2, F, IntptrPtrTy,
|
|
|
|
SanCovPCsSectionName);
|
|
|
|
PCArray->setInitializer(
|
2017-08-26 03:29:47 +08:00
|
|
|
ConstantArray::get(ArrayType::get(IntptrPtrTy, N * 2), PCs));
|
2017-08-29 07:46:11 +08:00
|
|
|
PCArray->setConstant(true);
|
2017-08-25 09:24:54 +08:00
|
|
|
|
2017-08-29 07:46:11 +08:00
|
|
|
return PCArray;
|
2017-07-28 07:36:49 +08:00
|
|
|
}
|
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
void ModuleSanitizerCoverage::CreateFunctionLocalArrays(
|
2017-07-28 07:36:49 +08:00
|
|
|
Function &F, ArrayRef<BasicBlock *> AllBlocks) {
|
2018-10-12 21:59:31 +08:00
|
|
|
if (Options.TracePCGuard)
|
2017-06-03 07:13:44 +08:00
|
|
|
FunctionGuardArray = CreateFunctionLocalArrayInSection(
|
2017-07-28 07:36:49 +08:00
|
|
|
AllBlocks.size(), F, Int32Ty, SanCovGuardsSectionName);
|
2018-10-12 21:59:31 +08:00
|
|
|
|
2018-09-14 05:45:55 +08:00
|
|
|
if (Options.Inline8bitCounters)
|
2017-06-09 06:58:19 +08:00
|
|
|
Function8bitCounterArray = CreateFunctionLocalArrayInSection(
|
2017-07-28 07:36:49 +08:00
|
|
|
AllBlocks.size(), F, Int8Ty, SanCovCountersSectionName);
|
2020-04-09 13:02:41 +08:00
|
|
|
if (Options.InlineBoolFlag)
|
|
|
|
FunctionBoolArray = CreateFunctionLocalArrayInSection(
|
|
|
|
AllBlocks.size(), F, Int1Ty, SanCovBoolFlagSectionName);
|
2018-10-12 21:59:31 +08:00
|
|
|
|
2018-09-14 05:45:55 +08:00
|
|
|
if (Options.PCTable)
|
2017-08-29 07:46:11 +08:00
|
|
|
FunctionPCsArray = CreatePCArray(F, AllBlocks);
|
2016-09-30 01:43:24 +08:00
|
|
|
}
|
2014-11-12 06:14:37 +08:00
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
bool ModuleSanitizerCoverage::InjectCoverage(Function &F,
|
|
|
|
ArrayRef<BasicBlock *> AllBlocks,
|
|
|
|
bool IsLeafFunc) {
|
2016-09-30 01:43:24 +08:00
|
|
|
if (AllBlocks.empty()) return false;
|
2017-07-28 07:36:49 +08:00
|
|
|
CreateFunctionLocalArrays(F, AllBlocks);
|
2017-07-25 10:07:38 +08:00
|
|
|
for (size_t i = 0, N = AllBlocks.size(); i < N; i++)
|
2017-08-31 06:49:31 +08:00
|
|
|
InjectCoverageAtBlock(F, *AllBlocks[i], i, IsLeafFunc);
|
2017-07-25 10:07:38 +08:00
|
|
|
return true;
|
2014-11-12 06:14:37 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
// On every indirect call we call a run-time function
|
|
|
|
// __sanitizer_cov_indir_call* with two parameters:
|
|
|
|
// - callee address,
|
2016-03-19 07:29:29 +08:00
|
|
|
// - global cache array that contains CacheSize pointers (zero-initialized).
|
2014-11-12 06:14:37 +08:00
|
|
|
// The cache is used to speed up recording the caller-callee pairs.
|
|
|
|
// The address of the caller is passed implicitly via caller PC.
|
2016-03-19 07:29:29 +08:00
|
|
|
// CacheSize is encoded in the name of the run-time function.
|
2019-09-05 04:30:29 +08:00
|
|
|
void ModuleSanitizerCoverage::InjectCoverageForIndirectCalls(
|
2014-11-12 06:14:37 +08:00
|
|
|
Function &F, ArrayRef<Instruction *> IndirCalls) {
|
2016-03-19 07:29:29 +08:00
|
|
|
if (IndirCalls.empty())
|
|
|
|
return;
|
2020-04-09 13:02:41 +08:00
|
|
|
assert(Options.TracePC || Options.TracePCGuard ||
|
|
|
|
Options.Inline8bitCounters || Options.InlineBoolFlag);
|
2014-11-12 06:14:37 +08:00
|
|
|
for (auto I : IndirCalls) {
|
|
|
|
IRBuilder<> IRB(I);
|
2020-04-22 12:56:04 +08:00
|
|
|
CallBase &CB = cast<CallBase>(*I);
|
2020-04-28 11:15:59 +08:00
|
|
|
Value *Callee = CB.getCalledOperand();
|
2016-03-19 07:29:29 +08:00
|
|
|
if (isa<InlineAsm>(Callee))
|
|
|
|
continue;
|
2017-04-20 06:42:11 +08:00
|
|
|
IRB.CreateCall(SanCovTracePCIndir, IRB.CreatePointerCast(Callee, IntptrTy));
|
2014-11-12 06:14:37 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-07-31 09:33:06 +08:00
|
|
|
// For every switch statement we insert a call:
|
|
|
|
// __sanitizer_cov_trace_switch(CondValue,
|
|
|
|
// {NumCases, ValueSizeInBits, Case0Value, Case1Value, Case2Value, ... })
|
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
void ModuleSanitizerCoverage::InjectTraceForSwitch(
|
2016-03-19 07:29:29 +08:00
|
|
|
Function &, ArrayRef<Instruction *> SwitchTraceTargets) {
|
2015-07-31 09:33:06 +08:00
|
|
|
for (auto I : SwitchTraceTargets) {
|
|
|
|
if (SwitchInst *SI = dyn_cast<SwitchInst>(I)) {
|
|
|
|
IRBuilder<> IRB(I);
|
|
|
|
SmallVector<Constant *, 16> Initializers;
|
|
|
|
Value *Cond = SI->getCondition();
|
2015-08-11 08:24:39 +08:00
|
|
|
if (Cond->getType()->getScalarSizeInBits() >
|
|
|
|
Int64Ty->getScalarSizeInBits())
|
|
|
|
continue;
|
2015-07-31 09:33:06 +08:00
|
|
|
Initializers.push_back(ConstantInt::get(Int64Ty, SI->getNumCases()));
|
|
|
|
Initializers.push_back(
|
|
|
|
ConstantInt::get(Int64Ty, Cond->getType()->getScalarSizeInBits()));
|
|
|
|
if (Cond->getType()->getScalarSizeInBits() <
|
|
|
|
Int64Ty->getScalarSizeInBits())
|
|
|
|
Cond = IRB.CreateIntCast(Cond, Int64Ty, false);
|
2016-03-19 07:29:29 +08:00
|
|
|
for (auto It : SI->cases()) {
|
2015-07-31 09:33:06 +08:00
|
|
|
Constant *C = It.getCaseValue();
|
|
|
|
if (C->getType()->getScalarSizeInBits() <
|
|
|
|
Int64Ty->getScalarSizeInBits())
|
|
|
|
C = ConstantExpr::getCast(CastInst::ZExt, It.getCaseValue(), Int64Ty);
|
|
|
|
Initializers.push_back(C);
|
|
|
|
}
|
2021-01-15 12:30:31 +08:00
|
|
|
llvm::sort(drop_begin(Initializers, 2),
|
2018-04-14 03:47:57 +08:00
|
|
|
[](const Constant *A, const Constant *B) {
|
|
|
|
return cast<ConstantInt>(A)->getLimitedValue() <
|
|
|
|
cast<ConstantInt>(B)->getLimitedValue();
|
|
|
|
});
|
2015-07-31 09:33:06 +08:00
|
|
|
ArrayType *ArrayOfInt64Ty = ArrayType::get(Int64Ty, Initializers.size());
|
|
|
|
GlobalVariable *GV = new GlobalVariable(
|
|
|
|
*CurModule, ArrayOfInt64Ty, false, GlobalVariable::InternalLinkage,
|
|
|
|
ConstantArray::get(ArrayOfInt64Ty, Initializers),
|
|
|
|
"__sancov_gen_cov_switch_values");
|
|
|
|
IRB.CreateCall(SanCovTraceSwitchFunction,
|
|
|
|
{Cond, IRB.CreatePointerCast(GV, Int64PtrTy)});
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
void ModuleSanitizerCoverage::InjectTraceForDiv(
|
2016-08-30 09:12:10 +08:00
|
|
|
Function &, ArrayRef<BinaryOperator *> DivTraceTargets) {
|
|
|
|
for (auto BO : DivTraceTargets) {
|
|
|
|
IRBuilder<> IRB(BO);
|
|
|
|
Value *A1 = BO->getOperand(1);
|
|
|
|
if (isa<ConstantInt>(A1)) continue;
|
|
|
|
if (!A1->getType()->isIntegerTy())
|
|
|
|
continue;
|
|
|
|
uint64_t TypeSize = DL->getTypeStoreSizeInBits(A1->getType());
|
|
|
|
int CallbackIdx = TypeSize == 32 ? 0 :
|
|
|
|
TypeSize == 64 ? 1 : -1;
|
|
|
|
if (CallbackIdx < 0) continue;
|
|
|
|
auto Ty = Type::getIntNTy(*C, TypeSize);
|
2021-06-08 06:54:35 +08:00
|
|
|
IRB.CreateCall(SanCovTraceDivFunction[CallbackIdx],
|
|
|
|
{IRB.CreateIntCast(A1, Ty, true)});
|
2016-08-30 09:12:10 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
void ModuleSanitizerCoverage::InjectTraceForGep(
|
2016-08-30 09:12:10 +08:00
|
|
|
Function &, ArrayRef<GetElementPtrInst *> GepTraceTargets) {
|
|
|
|
for (auto GEP : GepTraceTargets) {
|
|
|
|
IRBuilder<> IRB(GEP);
|
2021-02-06 13:02:07 +08:00
|
|
|
for (Use &Idx : GEP->indices())
|
|
|
|
if (!isa<ConstantInt>(Idx) && Idx->getType()->isIntegerTy())
|
2016-08-30 09:12:10 +08:00
|
|
|
IRB.CreateCall(SanCovTraceGepFunction,
|
2021-02-06 13:02:07 +08:00
|
|
|
{IRB.CreateIntCast(Idx, IntptrTy, true)});
|
2016-08-30 09:12:10 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-11-09 09:52:36 +08:00
|
|
|
void ModuleSanitizerCoverage::InjectTraceForLoadsAndStores(
|
|
|
|
Function &, ArrayRef<LoadInst *> Loads, ArrayRef<StoreInst *> Stores) {
|
2022-01-26 00:20:26 +08:00
|
|
|
auto CallbackIdx = [&](Type *ElementTy) -> int {
|
2021-11-09 09:52:36 +08:00
|
|
|
uint64_t TypeSize = DL->getTypeStoreSizeInBits(ElementTy);
|
|
|
|
return TypeSize == 8 ? 0
|
|
|
|
: TypeSize == 16 ? 1
|
|
|
|
: TypeSize == 32 ? 2
|
|
|
|
: TypeSize == 64 ? 3
|
|
|
|
: TypeSize == 128 ? 4
|
|
|
|
: -1;
|
|
|
|
};
|
|
|
|
Type *PointerType[5] = {Int8PtrTy, Int16PtrTy, Int32PtrTy, Int64PtrTy,
|
|
|
|
Int128PtrTy};
|
|
|
|
for (auto LI : Loads) {
|
|
|
|
IRBuilder<> IRB(LI);
|
|
|
|
auto Ptr = LI->getPointerOperand();
|
2022-01-26 00:20:26 +08:00
|
|
|
int Idx = CallbackIdx(LI->getType());
|
2021-11-09 09:52:36 +08:00
|
|
|
if (Idx < 0)
|
|
|
|
continue;
|
|
|
|
IRB.CreateCall(SanCovLoadFunction[Idx],
|
|
|
|
IRB.CreatePointerCast(Ptr, PointerType[Idx]));
|
|
|
|
}
|
|
|
|
for (auto SI : Stores) {
|
|
|
|
IRBuilder<> IRB(SI);
|
|
|
|
auto Ptr = SI->getPointerOperand();
|
2022-01-26 00:20:26 +08:00
|
|
|
int Idx = CallbackIdx(SI->getValueOperand()->getType());
|
2021-11-09 09:52:36 +08:00
|
|
|
if (Idx < 0)
|
|
|
|
continue;
|
|
|
|
IRB.CreateCall(SanCovStoreFunction[Idx],
|
|
|
|
IRB.CreatePointerCast(Ptr, PointerType[Idx]));
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
void ModuleSanitizerCoverage::InjectTraceForCmp(
|
2016-03-19 07:29:29 +08:00
|
|
|
Function &, ArrayRef<Instruction *> CmpTraceTargets) {
|
2015-03-21 09:29:36 +08:00
|
|
|
for (auto I : CmpTraceTargets) {
|
|
|
|
if (ICmpInst *ICMP = dyn_cast<ICmpInst>(I)) {
|
|
|
|
IRBuilder<> IRB(ICMP);
|
|
|
|
Value *A0 = ICMP->getOperand(0);
|
|
|
|
Value *A1 = ICMP->getOperand(1);
|
2016-03-19 07:29:29 +08:00
|
|
|
if (!A0->getType()->isIntegerTy())
|
|
|
|
continue;
|
2015-03-21 09:29:36 +08:00
|
|
|
uint64_t TypeSize = DL->getTypeStoreSizeInBits(A0->getType());
|
2016-08-18 09:25:28 +08:00
|
|
|
int CallbackIdx = TypeSize == 8 ? 0 :
|
|
|
|
TypeSize == 16 ? 1 :
|
|
|
|
TypeSize == 32 ? 2 :
|
|
|
|
TypeSize == 64 ? 3 : -1;
|
|
|
|
if (CallbackIdx < 0) continue;
|
2015-05-07 05:35:25 +08:00
|
|
|
// __sanitizer_cov_trace_cmp((type_size << 32) | predicate, A0, A1);
|
2017-08-10 23:00:13 +08:00
|
|
|
auto CallbackFunc = SanCovTraceCmpFunction[CallbackIdx];
|
|
|
|
bool FirstIsConst = isa<ConstantInt>(A0);
|
|
|
|
bool SecondIsConst = isa<ConstantInt>(A1);
|
|
|
|
// If both are const, then we don't need such a comparison.
|
|
|
|
if (FirstIsConst && SecondIsConst) continue;
|
|
|
|
// If only one is const, then make it the first callback argument.
|
|
|
|
if (FirstIsConst || SecondIsConst) {
|
|
|
|
CallbackFunc = SanCovTraceConstCmpFunction[CallbackIdx];
|
2017-08-29 07:38:12 +08:00
|
|
|
if (SecondIsConst)
|
2017-08-10 23:00:13 +08:00
|
|
|
std::swap(A0, A1);
|
|
|
|
}
|
|
|
|
|
2016-08-18 09:25:28 +08:00
|
|
|
auto Ty = Type::getIntNTy(*C, TypeSize);
|
2021-06-08 06:54:35 +08:00
|
|
|
IRB.CreateCall(CallbackFunc, {IRB.CreateIntCast(A0, Ty, true),
|
|
|
|
IRB.CreateIntCast(A1, Ty, true)});
|
2015-03-21 09:29:36 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
void ModuleSanitizerCoverage::InjectCoverageAtBlock(Function &F, BasicBlock &BB,
|
|
|
|
size_t Idx,
|
|
|
|
bool IsLeafFunc) {
|
2015-08-15 01:03:45 +08:00
|
|
|
BasicBlock::iterator IP = BB.getFirstInsertionPt();
|
2015-01-03 08:54:43 +08:00
|
|
|
bool IsEntryBB = &BB == &F.getEntryBlock();
|
2015-06-12 09:48:47 +08:00
|
|
|
DebugLoc EntryLoc;
|
|
|
|
if (IsEntryBB) {
|
2016-03-11 10:14:16 +08:00
|
|
|
if (auto SP = F.getSubprogram())
|
2020-12-12 04:45:22 +08:00
|
|
|
EntryLoc = DILocation::get(SP->getContext(), SP->getScopeLine(), 0, SP);
|
2015-08-15 00:45:42 +08:00
|
|
|
// Keep static allocas and llvm.localescape calls in the entry block. Even
|
|
|
|
// if we aren't splitting the block, it's nice for allocas to be before
|
|
|
|
// calls.
|
|
|
|
IP = PrepareToSplitEntryBlock(BB, IP);
|
2015-06-12 09:48:47 +08:00
|
|
|
} else {
|
|
|
|
EntryLoc = IP->getDebugLoc();
|
2021-04-13 06:55:53 +08:00
|
|
|
if (!EntryLoc)
|
|
|
|
if (auto *SP = F.getSubprogram())
|
|
|
|
EntryLoc = DILocation::get(SP->getContext(), 0, 0, SP);
|
2015-06-12 09:48:47 +08:00
|
|
|
}
|
|
|
|
|
2015-10-14 01:39:10 +08:00
|
|
|
IRBuilder<> IRB(&*IP);
|
2014-11-12 06:14:37 +08:00
|
|
|
IRB.SetCurrentDebugLocation(EntryLoc);
|
2016-02-18 05:34:43 +08:00
|
|
|
if (Options.TracePC) {
|
2020-06-23 02:43:52 +08:00
|
|
|
IRB.CreateCall(SanCovTracePC)
|
|
|
|
->setCannotMerge(); // gets the PC using GET_CALLER_PC.
|
2017-06-09 06:58:19 +08:00
|
|
|
}
|
|
|
|
if (Options.TracePCGuard) {
|
2016-09-30 01:43:24 +08:00
|
|
|
auto GuardPtr = IRB.CreateIntToPtr(
|
|
|
|
IRB.CreateAdd(IRB.CreatePointerCast(FunctionGuardArray, IntptrTy),
|
|
|
|
ConstantInt::get(IntptrTy, Idx * 4)),
|
|
|
|
Int32PtrTy);
|
2020-06-23 02:43:52 +08:00
|
|
|
IRB.CreateCall(SanCovTracePCGuard, GuardPtr)->setCannotMerge();
|
2015-02-04 09:21:45 +08:00
|
|
|
}
|
2017-06-09 06:58:19 +08:00
|
|
|
if (Options.Inline8bitCounters) {
|
|
|
|
auto CounterPtr = IRB.CreateGEP(
|
2019-02-02 04:44:47 +08:00
|
|
|
Function8bitCounterArray->getValueType(), Function8bitCounterArray,
|
2017-06-09 06:58:19 +08:00
|
|
|
{ConstantInt::get(IntptrTy, 0), ConstantInt::get(IntptrTy, Idx)});
|
2019-02-02 04:44:24 +08:00
|
|
|
auto Load = IRB.CreateLoad(Int8Ty, CounterPtr);
|
2017-06-09 06:58:19 +08:00
|
|
|
auto Inc = IRB.CreateAdd(Load, ConstantInt::get(Int8Ty, 1));
|
|
|
|
auto Store = IRB.CreateStore(Inc, CounterPtr);
|
|
|
|
SetNoSanitizeMetadata(Load);
|
|
|
|
SetNoSanitizeMetadata(Store);
|
|
|
|
}
|
2020-04-09 13:02:41 +08:00
|
|
|
if (Options.InlineBoolFlag) {
|
|
|
|
auto FlagPtr = IRB.CreateGEP(
|
|
|
|
FunctionBoolArray->getValueType(), FunctionBoolArray,
|
|
|
|
{ConstantInt::get(IntptrTy, 0), ConstantInt::get(IntptrTy, Idx)});
|
2020-05-05 16:19:13 +08:00
|
|
|
auto Load = IRB.CreateLoad(Int1Ty, FlagPtr);
|
|
|
|
auto ThenTerm =
|
|
|
|
SplitBlockAndInsertIfThen(IRB.CreateIsNull(Load), &*IP, false);
|
|
|
|
IRBuilder<> ThenIRB(ThenTerm);
|
|
|
|
auto Store = ThenIRB.CreateStore(ConstantInt::getTrue(Int1Ty), FlagPtr);
|
|
|
|
SetNoSanitizeMetadata(Load);
|
2020-04-09 13:02:41 +08:00
|
|
|
SetNoSanitizeMetadata(Store);
|
|
|
|
}
|
2017-08-31 06:49:31 +08:00
|
|
|
if (Options.StackDepth && IsEntryBB && !IsLeafFunc) {
|
2017-08-19 02:43:30 +08:00
|
|
|
// Check stack depth. If it's the deepest so far, record it.
|
2019-07-22 20:42:48 +08:00
|
|
|
Module *M = F.getParent();
|
|
|
|
Function *GetFrameAddr = Intrinsic::getDeclaration(
|
|
|
|
M, Intrinsic::frameaddress,
|
|
|
|
IRB.getInt8PtrTy(M->getDataLayout().getAllocaAddrSpace()));
|
2017-08-19 02:43:30 +08:00
|
|
|
auto FrameAddrPtr =
|
|
|
|
IRB.CreateCall(GetFrameAddr, {Constant::getNullValue(Int32Ty)});
|
|
|
|
auto FrameAddrInt = IRB.CreatePtrToInt(FrameAddrPtr, IntptrTy);
|
2019-02-02 04:44:24 +08:00
|
|
|
auto LowestStack = IRB.CreateLoad(IntptrTy, SanCovLowestStack);
|
2017-08-19 02:43:30 +08:00
|
|
|
auto IsStackLower = IRB.CreateICmpULT(FrameAddrInt, LowestStack);
|
|
|
|
auto ThenTerm = SplitBlockAndInsertIfThen(IsStackLower, &*IP, false);
|
|
|
|
IRBuilder<> ThenIRB(ThenTerm);
|
2017-08-31 06:49:31 +08:00
|
|
|
auto Store = ThenIRB.CreateStore(FrameAddrInt, SanCovLowestStack);
|
|
|
|
SetNoSanitizeMetadata(LowestStack);
|
|
|
|
SetNoSanitizeMetadata(Store);
|
2017-08-19 02:43:30 +08:00
|
|
|
}
|
2014-11-12 06:14:37 +08:00
|
|
|
}
|
|
|
|
|
2017-06-03 07:13:44 +08:00
|
|
|
std::string
|
2019-09-05 04:30:29 +08:00
|
|
|
ModuleSanitizerCoverage::getSectionName(const std::string &Section) const {
|
2019-01-15 05:02:02 +08:00
|
|
|
if (TargetTriple.isOSBinFormatCOFF()) {
|
[libFuzzer] Port to Windows
Summary:
Port libFuzzer to windows-msvc.
This patch allows libFuzzer targets to be built and run on Windows, using -fsanitize=fuzzer and/or -fsanitize=fuzzer-no-link. It allows these forms of coverage instrumentation to work on Windows as well.
It does not fix all issues, such as those with -fsanitize-coverage=stack-depth, which is not usable on Windows as of this patch.
It also does not fix any libFuzzer integration tests. Nearly all of them fail to compile; fixing them will come in a later patch, so libFuzzer tests are disabled on Windows until then.
Patch By: metzman
Reviewers: morehouse, rnk
Reviewed By: morehouse, rnk
Subscribers: #sanitizers, delcypher, morehouse, kcc, eraman
Differential Revision: https://reviews.llvm.org/D51022
llvm-svn: 341082
2018-08-30 23:54:44 +08:00
|
|
|
if (Section == SanCovCountersSectionName)
|
|
|
|
return ".SCOV$CM";
|
2020-04-09 13:02:41 +08:00
|
|
|
if (Section == SanCovBoolFlagSectionName)
|
|
|
|
return ".SCOV$BM";
|
2018-08-30 23:54:44 +08:00
|
|
|
if (Section == SanCovPCsSectionName)
|
|
|
|
return ".SCOVP$M";
|
|
|
|
return ".SCOV$GM"; // For SanCovGuardsSectionName.
|
|
|
|
}
|
2017-02-03 09:08:06 +08:00
|
|
|
if (TargetTriple.isOSBinFormatMachO())
|
2017-06-03 07:13:44 +08:00
|
|
|
return "__DATA,__" + Section;
|
|
|
|
return "__" + Section;
|
2017-02-03 09:08:06 +08:00
|
|
|
}
|
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
std::string
|
|
|
|
ModuleSanitizerCoverage::getSectionStart(const std::string &Section) const {
|
|
|
|
if (TargetTriple.isOSBinFormatMachO())
|
|
|
|
return "\1section$start$__DATA$__" + Section;
|
|
|
|
return "__start___" + Section;
|
|
|
|
}
|
|
|
|
|
|
|
|
std::string
|
|
|
|
ModuleSanitizerCoverage::getSectionEnd(const std::string &Section) const {
|
|
|
|
if (TargetTriple.isOSBinFormatMachO())
|
|
|
|
return "\1section$end$__DATA$__" + Section;
|
|
|
|
return "__stop___" + Section;
|
|
|
|
}
|
2017-02-03 09:08:06 +08:00
|
|
|
|
2019-09-05 04:30:29 +08:00
|
|
|
char ModuleSanitizerCoverageLegacyPass::ID = 0;
|
|
|
|
INITIALIZE_PASS_BEGIN(ModuleSanitizerCoverageLegacyPass, "sancov",
|
2019-07-26 04:53:15 +08:00
|
|
|
"Pass for instrumenting coverage on functions", false,
|
|
|
|
false)
|
2016-02-27 13:50:40 +08:00
|
|
|
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
|
2017-05-24 08:29:12 +08:00
|
|
|
INITIALIZE_PASS_DEPENDENCY(PostDominatorTreeWrapperPass)
|
2019-09-05 04:30:29 +08:00
|
|
|
INITIALIZE_PASS_END(ModuleSanitizerCoverageLegacyPass, "sancov",
|
2019-07-26 04:53:15 +08:00
|
|
|
"Pass for instrumenting coverage on functions", false,
|
|
|
|
false)
|
|
|
|
ModulePass *llvm::createModuleSanitizerCoverageLegacyPassPass(
|
Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang
Summary:
This commit adds two command-line options to clang.
These options let the user decide which functions will receive SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.
Patch by Yannis Juglaret of DGA-MI, Rennes, France
libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.
Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.
The two new command line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target. We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.
Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:
```
# (a)
src:*
fun:*
# (b)
src:SRC/*
fun:*
# (c)
src:SRC/src/woff2_dec.cc
fun:*
```
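As an illustration, the two options are then passed to clang when building the fuzzer (the source and list file names here are hypothetical, not taken from the tutorial):
```
clang++ -g -O1 -fsanitize=fuzzer,address \
  -fsanitize-coverage-whitelist=whitelist.txt \
  -fsanitize-coverage-blacklist=blacklist.txt \
  convert_woff2ttf_fuzzer.cc -o convert_woff2ttf_fuzzer
```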
2020-04-11 01:42:41 +08:00
|
|
|
const SanitizerCoverageOptions &Options,
|
2020-06-20 13:22:47 +08:00
|
|
|
const std::vector<std::string> &AllowlistFiles,
|
|
|
|
const std::vector<std::string> &BlocklistFiles) {
|
|
|
|
return new ModuleSanitizerCoverageLegacyPass(Options, AllowlistFiles,
|
|
|
|
BlocklistFiles);
|
2014-11-12 06:14:37 +08:00
|
|
|
}
|