Add docs+a script for building clang/LLVM with PGO

Depending on who you ask, PGO grants a 15%-25% improvement in build
times when using clang. Sadly, hooking everything up properly to
generate a profile and apply it to clang isn't always straightforward.
This script (and the accompanying docs) aim to make this process easier;
ideally, a single invocation of the given script.

In terms of testing, I've got a cronjob on my Debian box that's meant to
run this a few times per week, and I tried manually running it on a puny
Gentoo box I have (four whole Atom cores!). Nothing obviously broke.
¯\_(ツ)_/¯

I don't know if we have a Python style guide, so I just shoved this
through yapf with all the defaults on.

Finally, though the focus is clang at the moment, the hope is that this
is easily applicable to other LLVM-y tools with minimal effort (e.g.
lld, opt, ...). Hence, this lives in llvm/utils and tries to be somewhat
ambiguous about naming.

Differential Revision: https://reviews.llvm.org/D53598

llvm-svn: 345427
George Burgess IV 2018-10-26 20:56:03 +00:00
parent 98d880fbd7
commit cf477f4e41
3 changed files with 654 additions and 0 deletions

@@ -0,0 +1,163 @@
=============================================================
How To Build Clang and LLVM with Profile-Guided Optimizations
=============================================================
Introduction
============
PGO (Profile-Guided Optimization) allows your compiler to better optimize code
for how it actually runs. Users report that applying this to Clang and LLVM can
decrease overall compile time by 20%.
This guide walks you through how to build Clang with PGO, though it also applies
to other subprojects, such as LLD.
Using the script
================
We have a script at ``utils/collect_and_build_with_pgo.py``. This script is
tested on a few Linux flavors, and requires a checkout of LLVM, Clang, and
compiler-rt. Despite the name, it performs four clean builds of Clang, so it
can take a while to run to completion. Please see the script's ``--help`` for
more information on how to run it, and the different options available to you.
If you want to get the most out of PGO for a particular use-case (e.g. compiling
a specific large piece of software), please do read the section below on
'benchmark' selection.
Please note that this script is only tested on a few Linux distros. Patches to
add support for other platforms, as always, are highly appreciated. :)
This script also supports a ``--dry-run`` option, which causes it to print
important commands instead of running them.
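As a rough illustration, the snippet below assembles a dry-run invocation of the
script from Python. The checkout and output paths are placeholders, the helper
name ``pgo_script_command`` is made up for this example, and only flags listed
in the script's ``--help`` (``--llvm-dir``, ``--out-dir``, ``--dry-run``) are
used.

```python
import os
import shlex


def pgo_script_command(llvm_dir, out_dir, dry_run=True):
    """Return the argv for utils/collect_and_build_with_pgo.py.

    llvm_dir and out_dir are placeholder paths chosen for this sketch.
    """
    cmd = [
        'python3',
        os.path.join(llvm_dir, 'utils', 'collect_and_build_with_pgo.py'),
        '--llvm-dir', llvm_dir,
        '--out-dir', out_dir,
    ]
    if dry_run:
        # Ask the script to print the important commands instead of
        # running them.
        cmd.append('--dry-run')
    return cmd


cmd = pgo_script_command('/path/to/llvm', '/path/to/llvm/out')
print(' '.join(shlex.quote(arg) for arg in cmd))
```

Reviewing the printed commands first is a cheap way to check paths before
committing to four full builds.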
Selecting 'benchmarks'
======================
PGO does best when the profiles gathered represent how the user plans to use the
compiler. Notably, highly accurate profiles of llc building x86_64 code aren't
incredibly helpful if you're going to be targeting ARM.
By default, the script above does two things to get solid coverage. It:
- runs all of Clang and LLVM's lit tests, and
- uses the instrumented Clang to build Clang, LLVM, and all of the other
LLVM subprojects available to it.
Together, these should give you:
- solid coverage of building C++,
- good coverage of building C,
- great coverage of running optimizations,
- great coverage of the backend for your host's architecture, and
- some coverage of other architectures (if backends for other architectures are
enabled in your build).
Altogether, this should cover a diverse set of uses for Clang and LLVM. If you
have very specific needs (e.g. your compiler is meant to compile a large browser
for four different platforms, or similar), you may want to do something else.
This is configurable in the script itself.
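For instance, a custom benchmark could compile a handful of representative
sources with the instrumented Clang instead. The sketch below only builds the
command lines. The source list, the ``-O2 -c`` flags, and the helper name
``benchmark_commands`` are assumptions for illustration and are not part of the
script.

```python
import os


def benchmark_commands(out_dir, sources, extra_flags=('-O2',)):
    """Build one compile command per source file.

    out_dir is the instrumented Clang's build directory; its binaries live
    under out_dir/bin. Everything else here is hypothetical.
    """
    commands = []
    for src in sources:
        # Pick the C++ driver for C++ sources, the C driver otherwise.
        is_cxx = src.endswith(('.cc', '.cpp', '.cxx'))
        driver = os.path.join(out_dir, 'bin', 'clang++' if is_cxx else 'clang')
        # Discard the object file; only the profile data matters here.
        commands.append([driver, *extra_flags, '-c', src, '-o', os.devnull])
    return commands


for cmd in benchmark_commands('/path/to/instrumented', ['a.c', 'b.cpp']):
    print(' '.join(cmd))
```

A real benchmark would run these commands (or your project's own build) so that
the instrumented binaries write raw profiles as they execute.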
Building Clang with PGO
=======================
If you prefer to not use the script, this section briefly goes over how to build
Clang/LLVM with PGO.
First, you should have at least LLVM, Clang, and compiler-rt checked out
locally.
Next, at a high level, you're going to need to do the following:
1. Build a standard Release Clang and the relevant libclang_rt.profile library
2. Build Clang using the Clang you built above, but with instrumentation
3. Use the instrumented Clang to generate profiles, which consists of two steps:
- Running the instrumented Clang/LLVM/lld/etc. on tasks that represent how
users will use said tools.
- Using a tool to convert the "raw" profiles generated above into a single,
final PGO profile.
4. Build a final release Clang (along with whatever other binaries you need)
using the profile collected from your benchmark
In more detailed steps:
1. Configure a Clang build as you normally would. It's highly recommended that
you use the Release configuration for this, since it will be used to build
another Clang. Because you need Clang and supporting libraries, you'll want
to build the ``all`` target (e.g. ``ninja all`` or ``make -j4 all``).
2. Configure a Clang build as above, but add the following CMake args:
- ``-DLLVM_BUILD_INSTRUMENTED=IR`` -- This causes us to build everything
with instrumentation.
- ``-DLLVM_BUILD_RUNTIME=No`` -- A few projects have bad interactions when
built with profiling, and aren't necessary to build. This flag turns them
off.
- ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
step 1.
- ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.
In this build directory, you simply need to build the ``clang`` target (and
whatever supporting tooling your benchmark requires).
3. As mentioned above, this has two steps: gathering profile data, and then
massaging it into a useful form:
a. Build your benchmark using the Clang generated in step 2. The 'standard'
benchmark recommended is to run ``check-clang`` and ``check-llvm`` in your
instrumented Clang's build directory, and to do a full build of Clang/LLVM
using your instrumented Clang. So, create yet another build directory,
with the following CMake arguments:
- ``-DCMAKE_C_COMPILER=/path/to/stage2/clang`` - Use the Clang we built in
step 2.
- ``-DCMAKE_CXX_COMPILER=/path/to/stage2/clang++`` - Same as above.
If your users are fans of debug info, you may want to consider using
``-DCMAKE_BUILD_TYPE=RelWithDebInfo`` instead of
``-DCMAKE_BUILD_TYPE=Release``. This will grant better coverage of
debug info pieces of clang, but will take longer to complete and will
result in a much larger build directory.
It's recommended to build the ``all`` target with your instrumented Clang,
since more coverage is often better.
b. You should now have a few ``*.profraw`` files in
``path/to/stage2/profiles/``. You need to merge these using
``llvm-profdata`` (even if you only have one! The profile merge transforms
profraw into actual profile data, as well). This can be done with
``/path/to/stage1/llvm-profdata merge
-output=/path/to/output/profdata.prof path/to/stage2/profiles/*.profraw``.
4. Now, build your final, PGO-optimized Clang. To do this, you'll want to pass
the following additional arguments to CMake.
- ``-DLLVM_PROFDATA_FILE=/path/to/output/profdata.prof`` - Use the PGO
profile from the previous step.
- ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
step 1.
- ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.
From here, you can build whatever targets you need.
.. note::
You may see warnings about a mismatched profile in the build output. These
are generally harmless. To silence them, you can add
``-DCMAKE_C_FLAGS='-Wno-backend-plugin'
-DCMAKE_CXX_FLAGS='-Wno-backend-plugin'`` to your CMake invocation.
Congrats! You now have a Clang built with profile-guided optimizations, and you
can delete all but the final build directory if you'd like.
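The per-stage CMake arguments from steps 2 through 4 can be summarized as data.
This is a sketch rather than the script's implementation: the directory names
are placeholders, the function names are made up for this example, and the
compiler binaries are assumed to live under ``bin/`` of each build directory.
The merge command mirrors the ``llvm-profdata merge`` invocation used by the
script.

```python
import os


def bootstrap_flags(compiler_dir):
    """Flags that point CMake at the clang/clang++ in compiler_dir/bin."""
    clang = os.path.join(compiler_dir, 'bin', 'clang')
    return {'CMAKE_C_COMPILER': clang, 'CMAKE_CXX_COMPILER': clang + '++'}


def stage_flags(stage1_dir, stage2_dir, profdata_file):
    """Map each build after stage 1 to its extra CMake flags."""
    # Step 2: built by the stage-1 Clang, with IR instrumentation.
    instrumented = bootstrap_flags(stage1_dir)
    instrumented['LLVM_BUILD_INSTRUMENTED'] = 'IR'
    instrumented['LLVM_BUILD_RUNTIME'] = 'No'
    # Step 3a: the benchmark build is compiled by the instrumented Clang.
    benchmark = bootstrap_flags(stage2_dir)
    # Step 4: back to the stage-1 compiler, now with the merged profile.
    optimized = bootstrap_flags(stage1_dir)
    optimized['LLVM_PROFDATA_FILE'] = profdata_file
    return {
        'instrumented': instrumented,
        'benchmark': benchmark,
        'optimized': optimized,
    }


def merge_command(stage1_dir, raw_profiles, output_file):
    """Step 3b: merge raw profiles into a single profile file."""
    llvm_profdata = os.path.join(stage1_dir, 'bin', 'llvm-profdata')
    return [llvm_profdata, 'merge', '-output=' + output_file] + raw_profiles


def to_cmake_args(flags):
    """Render a flag dictionary as -DKEY=VALUE arguments."""
    return ['-D{}={}'.format(k, v) for k, v in sorted(flags.items())]


stages = stage_flags('/path/to/stage1', '/path/to/stage2',
                     '/path/to/output/profdata.prof')
for name in ('instrumented', 'benchmark', 'optimized'):
    print(name, ' '.join(to_cmake_args(stages[name])))
```

Keeping the flags in dictionaries makes it easy to see that only the compiler
paths and one or two extra options change between stages.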
If this worked well for you and you plan on doing it often, there's a slight
optimization that can be made: LLVM and Clang have a tool called tblgen that's
built and run during the build process. While it's potentially nice to build
this for coverage as part of step 3, none of your other builds should benefit
from building it. You can pass the CMake options
``-DCLANG_TABLEGEN=/path/to/stage1/bin/clang-tblgen
-DLLVM_TABLEGEN=/path/to/stage1/bin/llvm-tblgen`` to steps 2 and onward to avoid
these useless rebuilds.
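A minimal sketch of assembling these two options is shown below. The helper name
``tablegen_flags`` and the injectable ``exists`` check are assumptions made for
this example; the check simply skips a tblgen binary that is not present, in
which case CMake builds it as usual.

```python
import os


def tablegen_flags(stage1_dir, exists=os.path.exists):
    """Return CMake flags that reuse the stage-1 tblgen binaries."""
    flags = {}
    candidates = (
        ('LLVM_TABLEGEN', 'llvm-tblgen'),
        ('CLANG_TABLEGEN', 'clang-tblgen'),
    )
    for key, binary in candidates:
        path = os.path.join(stage1_dir, 'bin', binary)
        # Only reuse a tblgen that is actually there; without the flag,
        # the build falls back to compiling its own copy.
        if exists(path):
            flags[key] = path
    return flags


# Pretend both binaries exist so the example prints both flags.
for key, path in sorted(tablegen_flags('/path/to/stage1',
                                       exists=lambda p: True).items()):
    print('-D{}={}'.format(key, path))
```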

@@ -68,6 +68,7 @@ representation.
CMakePrimer
AdvancedBuilds
HowToBuildOnARM
HowToBuildWithPGO
HowToCrossCompileBuiltinsOnArm
HowToCrossCompileLLVM
CommandGuide/index
@@ -107,6 +108,9 @@ representation.
:doc:`HowToBuildOnARM`
Notes on building and testing LLVM/Clang on ARM.
:doc:`HowToBuildWithPGO`
Notes on building LLVM/Clang with PGO.
:doc:`HowToCrossCompileBuiltinsOnArm`
Notes on cross-building and testing the compiler-rt builtins for Arm.

@@ -0,0 +1,487 @@
#!/usr/bin/env python3
"""
This script:
- Builds clang with user-defined flags
- Uses that clang to build an instrumented clang, which can be used to collect
PGO samples
- Builds a user-defined set of sources (default: clang) to act as a
"benchmark" to generate a PGO profile
- Builds clang once more with the PGO profile generated above
This is a total of four clean builds of clang (by default). This may take a
while. :)
"""
import argparse
import collections
import multiprocessing
import os
import shlex
import shutil
import subprocess
import sys
### User configuration
# If you want to use a different 'benchmark' than building clang, make this
# function do what you want. out_dir is the build directory for clang, so all
# of the clang binaries will live under "${out_dir}/bin/". Using clang in
# ${out_dir} will magically have the profiles go to the right place.
#
# You may assume that out_dir is a freshly-built directory that you can reach
# in to build more things, if you'd like.
def _run_benchmark(env, out_dir, include_debug_info):
"""The 'benchmark' we run to generate profile data."""
target_dir = env.output_subdir('instrumentation_run')
# `check-llvm` and `check-clang` are cheap ways to increase coverage. The
# former lets us touch on the non-x86 backends a bit if configured, and the
# latter gives us more C to chew on (and will send us through diagnostic
# paths a fair amount, though the `if (stuff_is_broken) { diag() ... }`
# branches should still heavily be weighted in the not-taken direction,
# since we built all of LLVM/etc).
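# Run the tests in the instrumented build directory (out_dir); target_dir has
# not been configured yet at this point.
_build_things_in(env, out_dir, what=['check-llvm', 'check-clang'])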
# Building tblgen gets us coverage; don't skip it. (out_dir may also not
# have them anyway, but that's less of an issue)
cmake = _get_cmake_invocation_for_bootstrap_from(
env, out_dir, skip_tablegens=False)
if include_debug_info:
cmake.add_flag('CMAKE_BUILD_TYPE', 'RelWithDebInfo')
_run_fresh_cmake(env, cmake, target_dir)
# Just build all the things. The more data we have, the better.
_build_things_in(env, target_dir, what=['all'])
### Script
class CmakeInvocation:
_cflags = ['CMAKE_C_FLAGS', 'CMAKE_CXX_FLAGS']
_ldflags = [
'CMAKE_EXE_LINKER_FLAGS',
'CMAKE_MODULE_LINKER_FLAGS',
'CMAKE_SHARED_LINKER_FLAGS',
]
def __init__(self, cmake, maker, cmake_dir):
self._prefix = [cmake, '-G', maker, cmake_dir]
# Map of str -> (list|str).
self._flags = {}
for flag in CmakeInvocation._cflags + CmakeInvocation._ldflags:
self._flags[flag] = []
def add_new_flag(self, key, value):
self.add_flag(key, value, allow_overwrites=False)
def add_flag(self, key, value, allow_overwrites=True):
if key not in self._flags:
self._flags[key] = value
return
existing_value = self._flags[key]
if isinstance(existing_value, list):
existing_value.append(value)
return
if not allow_overwrites:
raise ValueError('Invalid overwrite of %s requested' % key)
self._flags[key] = value
def add_cflags(self, flags):
# No, I didn't intend to append ['-', 'O', '2'] to my flags, thanks :)
assert not isinstance(flags, str)
for f in CmakeInvocation._cflags:
self._flags[f].extend(flags)
def add_ldflags(self, flags):
assert not isinstance(flags, str)
for f in CmakeInvocation._ldflags:
self._flags[f].extend(flags)
def to_args(self):
args = self._prefix.copy()
for key, value in sorted(self._flags.items()):
if isinstance(value, list):
# We preload all of the list-y values (cflags, ...). If we've
# nothing to add, don't.
if not value:
continue
value = ' '.join(value)
arg = '-D' + key
if value != '':
arg += '=' + value
args.append(arg)
return args
class Env:
def __init__(self, llvm_dir, use_make, output_dir, default_cmake_args,
dry_run):
self.llvm_dir = llvm_dir
self.use_make = use_make
self.output_dir = output_dir
self.default_cmake_args = default_cmake_args.copy()
self.dry_run = dry_run
def get_default_cmake_args_kv(self):
return self.default_cmake_args.items()
def get_cmake_maker(self):
return 'Ninja' if not self.use_make else 'Unix Makefiles'
def get_make_command(self):
if self.use_make:
return ['make', '-j{}'.format(multiprocessing.cpu_count())]
return ['ninja']
def output_subdir(self, name):
return os.path.join(self.output_dir, name)
def has_llvm_subproject(self, name):
if name == 'compiler-rt':
subdir = 'projects/compiler-rt'
elif name == 'clang':
subdir = 'tools/clang'
else:
raise ValueError('Unknown subproject: %s' % name)
return os.path.isdir(os.path.join(self.llvm_dir, subdir))
# Note that we don't allow capturing stdout/stderr. This works quite nicely
# with dry_run.
def run_command(self,
cmd,
cwd=None,
check=False,
silent_unless_error=False):
cmd_str = ' '.join(shlex.quote(s) for s in cmd)
print(
'Running `%s` in %s' % (cmd_str, shlex.quote(cwd or os.getcwd())))
if self.dry_run:
return
if silent_unless_error:
stdout, stderr = subprocess.PIPE, subprocess.STDOUT
else:
stdout, stderr = None, None
# Don't use subprocess.run because it's >= py3.5 only, and it's not too
# much extra effort to get what it gives us anyway.
popen = subprocess.Popen(
cmd,
stdin=subprocess.DEVNULL,
stdout=stdout,
stderr=stderr,
cwd=cwd)
stdout, _ = popen.communicate()
return_code = popen.wait(timeout=0)
if not return_code:
return
if silent_unless_error:
print(stdout.decode('utf-8', 'ignore'))
if check:
raise subprocess.CalledProcessError(
returncode=return_code, cmd=cmd, output=stdout, stderr=None)
def _get_default_cmake_invocation(env):
inv = CmakeInvocation(
cmake='cmake', maker=env.get_cmake_maker(), cmake_dir=env.llvm_dir)
for key, value in env.get_default_cmake_args_kv():
inv.add_new_flag(key, value)
return inv
def _get_cmake_invocation_for_bootstrap_from(env, out_dir,
skip_tablegens=True):
clang = os.path.join(out_dir, 'bin', 'clang')
cmake = _get_default_cmake_invocation(env)
cmake.add_new_flag('CMAKE_C_COMPILER', clang)
cmake.add_new_flag('CMAKE_CXX_COMPILER', clang + '++')
# We often get no value out of building new tblgens; the previous build
# should have them. It's still correct to build them, just slower.
def add_tablegen(key, binary):
path = os.path.join(out_dir, 'bin', binary)
# Check that this exists, since the user's allowed to specify their own
# stage1 directory (which is generally where we'll source everything
# from). Dry runs should hope for the best from our user, as well.
if env.dry_run or os.path.exists(path):
cmake.add_new_flag(key, path)
if skip_tablegens:
add_tablegen('LLVM_TABLEGEN', 'llvm-tblgen')
add_tablegen('CLANG_TABLEGEN', 'clang-tblgen')
return cmake
def _build_things_in(env, target_dir, what):
cmd = env.get_make_command() + what
env.run_command(cmd, cwd=target_dir, check=True)
def _run_fresh_cmake(env, cmake, target_dir):
if not env.dry_run:
try:
shutil.rmtree(target_dir)
except FileNotFoundError:
pass
os.makedirs(target_dir, mode=0o755)
cmake_args = cmake.to_args()
env.run_command(
cmake_args, cwd=target_dir, check=True, silent_unless_error=True)
def _build_stage1_clang(env):
target_dir = env.output_subdir('stage1')
cmake = _get_default_cmake_invocation(env)
_run_fresh_cmake(env, cmake, target_dir)
# FIXME: The full build here is somewhat unfortunate. It's primarily
# because I don't know what to call libclang_rt.profile for arches that
# aren't x86_64 (and even then, it's in a subdir that contains clang's
# current version). It would be nice to figure out what target I can
# request to magically have libclang_rt.profile built for ${host}
_build_things_in(env, target_dir, what=['all'])
return target_dir
def _generate_instrumented_clang_profile(env, stage1_dir, profile_dir,
output_file):
llvm_profdata = os.path.join(stage1_dir, 'bin', 'llvm-profdata')
if env.dry_run:
profiles = [os.path.join(profile_dir, '*.profraw')]
else:
profiles = [
os.path.join(profile_dir, f) for f in os.listdir(profile_dir)
if f.endswith('.profraw')
]
cmd = [llvm_profdata, 'merge', '-output=' + output_file] + profiles
env.run_command(cmd, check=True)
def _build_instrumented_clang(env, stage1_dir):
assert os.path.isabs(stage1_dir)
target_dir = os.path.join(env.output_dir, 'instrumented')
cmake = _get_cmake_invocation_for_bootstrap_from(env, stage1_dir)
cmake.add_new_flag('LLVM_BUILD_INSTRUMENTED', 'IR')
# libcxx's configure step messes with our link order: we'll link
# libclang_rt.profile after libgcc, and the former requires atexit from the
# latter. So, configure checks fail.
#
# Since we don't need libcxx or compiler-rt anyway, just disable them.
cmake.add_new_flag('LLVM_BUILD_RUNTIME', 'No')
_run_fresh_cmake(env, cmake, target_dir)
_build_things_in(env, target_dir, what=['clang', 'lld'])
profiles_dir = os.path.join(target_dir, 'profiles')
return target_dir, profiles_dir
def _build_optimized_clang(env, stage1_dir, profdata_file):
if not env.dry_run and not os.path.exists(profdata_file):
raise ValueError('Looks like the profdata file at %s doesn\'t exist' %
profdata_file)
target_dir = os.path.join(env.output_dir, 'optimized')
cmake = _get_cmake_invocation_for_bootstrap_from(env, stage1_dir)
cmake.add_new_flag('LLVM_PROFDATA_FILE', os.path.abspath(profdata_file))
# We'll get complaints about hash mismatches in `main` in tools/etc. Ignore
# it.
cmake.add_cflags(['-Wno-backend-plugin'])
_run_fresh_cmake(env, cmake, target_dir)
_build_things_in(env, target_dir, what=['clang'])
return target_dir
Args = collections.namedtuple('Args', [
'do_optimized_build',
'include_debug_info',
'profile_location',
'stage1_dir',
])
def _parse_args():
parser = argparse.ArgumentParser(
description='Builds LLVM and Clang with instrumentation, collects '
'instrumentation profiles for them, and (optionally) builds things '
'with these PGO profiles. By default, it\'s assumed that you\'re '
'running this from your LLVM root, and all build artifacts will be '
'saved to $PWD/out.')
parser.add_argument(
'--cmake-extra-arg',
action='append',
default=[],
help='an extra arg to pass to all cmake invocations. Note that this '
'is interpreted as a -D argument, e.g. --cmake-extra-arg FOO=BAR will '
'be passed as -DFOO=BAR. This may be specified multiple times.')
parser.add_argument(
'--dry-run',
action='store_true',
help='print commands instead of running them')
parser.add_argument(
'--llvm-dir',
default='.',
help='directory containing an LLVM checkout (default: $PWD)')
parser.add_argument(
'--no-optimized-build',
action='store_true',
help='disable the final, PGO-optimized build')
parser.add_argument(
'--out-dir',
help='directory to write artifacts to (default: $llvm_dir/out)')
parser.add_argument(
'--profile-output',
help='where to output the profile (default is $out/pgo_profile.prof)')
parser.add_argument(
'--stage1-dir',
help='instead of having an initial build of everything, use the given '
'directory. It is expected that this directory will have clang, '
'llvm-profdata, and the appropriate libclang_rt.profile already built')
parser.add_argument(
'--use-debug-info-in-benchmark',
action='store_true',
help='use RelWithDebInfo instead of a regular build in the benchmark. '
'This increases benchmark execution time and disk space requirements, '
'but gives more coverage over debuginfo bits in LLVM and clang.')
parser.add_argument(
'--use-make',
action='store_true',
default=shutil.which('ninja') is None,
help='use Makefiles instead of ninja')
args = parser.parse_args()
llvm_dir = os.path.abspath(args.llvm_dir)
if args.out_dir is None:
output_dir = os.path.join(llvm_dir, 'out')
else:
output_dir = os.path.abspath(args.out_dir)
extra_args = {'CMAKE_BUILD_TYPE': 'Release'}
for arg in args.cmake_extra_arg:
if arg.startswith('-D'):
arg = arg[2:]
elif arg.startswith('-'):
raise ValueError('Unknown non-"-D" arg encountered; you may need '
'to tweak the source...')
split = arg.split('=', 1)
if len(split) == 1:
key, val = split[0], ''
else:
key, val = split
extra_args[key] = val
env = Env(
default_cmake_args=extra_args,
dry_run=args.dry_run,
llvm_dir=llvm_dir,
output_dir=output_dir,
use_make=args.use_make,
)
if args.profile_output is not None:
profile_location = args.profile_output
else:
profile_location = os.path.join(env.output_dir, 'pgo_profile.prof')
result_args = Args(
do_optimized_build=not args.no_optimized_build,
include_debug_info=args.use_debug_info_in_benchmark,
profile_location=profile_location,
stage1_dir=args.stage1_dir,
)
return env, result_args
def _looks_like_llvm_dir(directory):
"""Arbitrary set of heuristics to determine if `directory` is an llvm dir.
Errs on the side of false-positives."""
contents = set(os.listdir(directory))
expected_contents = [
'CODE_OWNERS.TXT',
'cmake',
'docs',
'include',
'utils',
]
if not all(c in contents for c in expected_contents):
return False
try:
include_listing = os.listdir(os.path.join(directory, 'include'))
except NotADirectoryError:
return False
return 'llvm' in include_listing
def _die(*args, **kwargs):
kwargs['file'] = sys.stderr
print(*args, **kwargs)
sys.exit(1)
def _main():
env, args = _parse_args()
if not _looks_like_llvm_dir(env.llvm_dir):
_die('Looks like %s isn\'t an LLVM directory; please see --help' %
env.llvm_dir)
if not env.has_llvm_subproject('clang'):
_die('Need a clang checkout at tools/clang')
if not env.has_llvm_subproject('compiler-rt'):
_die('Need a compiler-rt checkout at projects/compiler-rt')
def status(*args):
print(*args, file=sys.stderr)
if args.stage1_dir is None:
status('*** Building stage1 clang...')
stage1_out = _build_stage1_clang(env)
else:
# _build_instrumented_clang asserts that this path is absolute.
stage1_out = os.path.abspath(args.stage1_dir)
status('*** Building instrumented clang...')
instrumented_out, profile_dir = _build_instrumented_clang(env, stage1_out)
status('*** Running profdata benchmarks...')
_run_benchmark(env, instrumented_out, args.include_debug_info)
status('*** Generating profile...')
_generate_instrumented_clang_profile(env, stage1_out, profile_dir,
args.profile_location)
print('Final profile:', args.profile_location)
if args.do_optimized_build:
status('*** Building PGO-optimized binaries...')
optimized_out = _build_optimized_clang(env, stage1_out,
args.profile_location)
print('Final build directory:', optimized_out)
if __name__ == '__main__':
_main()