Go to file
Matt Morehouse bef187c750 Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang
Summary:
This commit adds two command-line options to clang.
These options let the user decide which functions will receive SanitizerCoverage instrumentation.
This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.

Patch by Yannis Juglaret of DGA-MI, Rennes, France

libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.

Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.

The two new command line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target. We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.

Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:

```
  # (a)
  src:*
  fun:*

  # (b)
  src:SRC/*
  fun:*

  # (c)
  src:SRC/src/woff2_dec.cc
  fun:*
```

Running the built fuzzers shows how many instrumentation points the compiler adds, the fuzzer will output //XXX PCs//. Whitelist (a) is the instrument-everything whitelist, it produces 11912 instrumentation points. Whitelist (b) focuses coverage to instrument woff2 source code only, ignoring the dependency code for brotli (de)compression; it produces 3984 instrumented instrumentation points. Whitelist (c) focuses coverage to only instrument functions in the main file that deals with WOFF2 to TTF conversion, resulting in 1056 instrumentation points.

For experimentation purposes, we ran each fuzzer approximately 100 times, single process, with the initial corpus provided in the tutorial. We let the fuzzer run until it either found the heap buffer overflow or went out of memory. On this simple example, whitelists (b) and (c) found the heap buffer overflow more reliably and 5x faster than whitelist (a). The average execution times when finding the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, and (c) 176 s.

We explain these results by the fact that WOFF2 to TTF conversion calls the brotli decompression algorithm's functions, which are mostly irrelevant for finding bugs in WOFF2 font reconstruction but nevertheless instrumented and used by whitelist (a) to guide fuzzing. This results in longer execution time for these functions and a partially irrelevant corpus. Contrary to whitelist (a), whitelists (b) and (c) will execute brotli-related functions without instrumentation overhead, and ignore new code paths found in them. This results in faster bug finding for WOFF2 font reconstruction.

The results for whitelist (b) are similar to the ones for whitelist (c). Indeed, WOFF2 to TTF conversion calls functions that are mostly located in SRC/src/woff2_dec.cc. The 2892 extra instrumentation points allowed by whitelist (b) do not tamper with bug finding, even though they are mostly irrelevant, simply because most of these functions do not get called. We get a slightly faster average time for bug finding with whitelist (b), which might indicate that some of the extra instrumentation points are actually relevant, or might just be random noise.

Reviewers: kcc, morehouse, vitalybuka

Reviewed By: morehouse, vitalybuka

Subscribers: pratyai, vitalybuka, eternalsakura, xwlin222, dende, srhines, kubamracek, #sanitizers, lebedev.ri, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D63616
2020-04-10 10:44:03 -07:00
clang Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang 2020-04-10 10:44:03 -07:00
clang-tools-extra [clang-tidy] Add check to find calls to NSInvocation methods under ARC that don't have proper object argument lifetimes. 2020-04-10 08:51:21 -07:00
compiler-rt Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang 2020-04-10 10:44:03 -07:00
debuginfo-tests [Dexter] Add support for Windows to regression test suite. 2020-03-31 10:18:12 +01:00
flang [flang] [NFC] Adjust README.md for upstreaming. 2020-04-09 21:15:48 +01:00
libc [libc] Change minimum cmake requirement. 2020-04-09 20:42:55 -07:00
libclc libclc: cmake configure should depend on file list 2020-02-25 04:43:14 -05:00
libcxx [libc++] Fix recursive instantiation in std::array. 2020-04-09 17:42:10 -04:00
libcxxabi [libc++abi] Enable the new libc++ testing format by default 2020-04-07 09:16:06 -04:00
libunwind [libunwind] add hexagon support 2020-04-10 04:24:10 -05:00
lld [AArch64InstPrinter] Change printAlignedLabel to print the target address in hexadecimal form 2020-04-10 09:21:09 -07:00
lldb Attempt to fix a compile error reported with older compilers and libstdc++ 2020-04-10 10:34:44 -07:00
llvm Implement `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist` for clang 2020-04-10 10:44:03 -07:00
mlir [MLIR][NFC] fix doc comment for isKnownIsolatedFromAbove 2020-04-10 19:15:21 +05:30
openmp [OpenMP] Put old APIs back and added new _async series for backward compatibility 2020-04-09 22:40:58 -04:00
parallel-libs [arcconfig] Delete subproject arcconfigs 2020-02-24 16:20:36 -08:00
polly [NFC] Modernize misc. uses of Align/MaybeAlign APIs. 2020-04-06 17:53:04 -07:00
pstl [pstl] A hot fix for exclusive_scan (+ lost enable_if in declaration) 2020-03-17 16:22:24 -04:00
utils/arcanist Use in-tree clang-format-diff.py as Arcanist linter 2020-04-06 12:02:20 -04:00
.arcconfig [arcconfig] Default base to previous revision 2020-02-24 16:20:25 -08:00
.arclint Setup clang-format as an Arcanist linter 2020-03-30 15:02:33 -04:00
.clang-format
.clang-tidy - Update .clang-tidy to ignore parameters of main like functions for naming violations in clang and llvm directory 2020-01-31 16:49:45 +00:00
.git-blame-ignore-revs Add some libc++ revisions to .git-blame-ignore-revs 2020-03-17 17:30:20 -04:00
.gitignore Add a newline at the end of the file 2019-09-04 06:33:46 +00:00
CONTRIBUTING.md Add contributing info to CONTRIBUTING.md and README.md 2019-12-02 15:47:15 +00:00
README.md Add missing hyphens 2020-04-08 08:21:53 +02:00

README.md

The LLVM Compiler Infrastructure

This directory and its sub-directories contain source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and run-time environments.

The README briefly describes how to get started with building LLVM. For more information on how to contribute to the LLVM project, please take a look at the Contributing to LLVM guide.

Getting Started with the LLVM System

Taken from https://llvm.org/docs/GettingStarted.html.

Overview

Welcome to the LLVM project!

The LLVM project has multiple components. The core of the project is itself called "LLVM". This contains all of the tools, libraries, and header files needed to process intermediate representations and converts it into object files. Tools include an assembler, disassembler, bitcode analyzer, and bitcode optimizer. It also contains basic regression tests.

C-like languages use the Clang front end. This component compiles C, C++, Objective-C, and Objective-C++ code into LLVM bitcode -- and from there into object files, using LLVM.

Other components include: the libc++ C++ standard library, the LLD linker, and more.

Getting the Source Code and Building LLVM

The LLVM Getting Started documentation may be out of date. The Clang Getting Started page might have more accurate information.

This is an example work-flow and configuration to get and build the LLVM source:

  1. Checkout LLVM (including related sub-projects like Clang):

    • git clone https://github.com/llvm/llvm-project.git

    • Or, on windows, git clone --config core.autocrlf=false https://github.com/llvm/llvm-project.git

  2. Configure and build LLVM and Clang:

    • cd llvm-project

    • mkdir build

    • cd build

    • cmake -G <generator> [options] ../llvm

      Some common build system generators are:

      • Ninja --- for generating Ninja build files. Most llvm developers use Ninja.
      • Unix Makefiles --- for generating make-compatible parallel makefiles.
      • Visual Studio --- for generating Visual Studio projects and solutions.
      • Xcode --- for generating Xcode projects.

      Some Common options:

      • -DLLVM_ENABLE_PROJECTS='...' --- semicolon-separated list of the LLVM sub-projects you'd like to additionally build. Can include any of: clang, clang-tools-extra, libcxx, libcxxabi, libunwind, lldb, compiler-rt, lld, polly, or debuginfo-tests.

        For example, to build LLVM, Clang, libcxx, and libcxxabi, use -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi".

      • -DCMAKE_INSTALL_PREFIX=directory --- Specify for directory the full path name of where you want the LLVM tools and libraries to be installed (default /usr/local).

      • -DCMAKE_BUILD_TYPE=type --- Valid options for type are Debug, Release, RelWithDebInfo, and MinSizeRel. Default is Debug.

      • -DLLVM_ENABLE_ASSERTIONS=On --- Compile with assertion checks enabled (default is Yes for Debug builds, No for all other build types).

    • cmake --build . [-- [options] <target>] or your build system specified above directly.

      • The default target (i.e. ninja or make) will build all of LLVM.

      • The check-all target (i.e. ninja check-all) will run the regression tests to ensure everything is in working order.

      • CMake will generate targets for each tool and library, and most LLVM sub-projects generate their own check-<project> target.

      • Running a serial build will be slow. To improve speed, try running a parallel build. That's done by default in Ninja; for make, use the option -j NNN, where NNN is the number of parallel jobs, e.g. the number of CPUs you have.

    • For more information see CMake

Consult the Getting Started with LLVM page for detailed information on configuring and compiling LLVM. You can visit Directory Layout to learn about the layout of the source code tree.