[docs][PerformanceTips] Framing the generic IR tips

llvm-svn: 245858
This commit is contained in:
Philip Reames 2015-08-24 18:16:02 +00:00
parent 7223a7fe67
commit 92aa8d683d
1 changed files with 32 additions and 14 deletions

View File

@ -11,13 +11,42 @@ Abstract
The intended audience of this document is developers of language frontends
targeting LLVM IR. This document is home to a collection of tips on how to
generate IR that optimizes well. As with any optimizer, LLVM has its strengths
and weaknesses. In some cases, surprisingly small changes in the source IR
can have a large effect on the generated code.
generate IR that optimizes well.
IR Best Practices
=================
As with any optimizer, LLVM has its strengths and weaknesses. In some cases,
surprisingly small changes in the source IR can have a large effect on the
generated code.
Beyond the specific items on the list below, it's worth noting that the most
mature frontend for LLVM is Clang. As a result, the further your IR gets from what Clang might emit, the less likely it is to be effectively optimized. It
can often be useful to write a quick C program with the semantics you're trying
to model and see what decisions Clang's IRGen makes about what IR to emit.
Studying Clang's CodeGen directory can also be a good source of ideas. Note
that Clang and LLVM are explicitly version locked so you'll need to make sure
you're using a Clang built from the same svn revision or release as the LLVM
library you're using. As always, it's *strongly* recommended that you track
tip of tree development, particularly during bring up of a new project.
The Basics
^^^^^^^^^^^
#. Make sure that your Modules contain both a data layout specification and
target triple. Without these pieces, non of the target specific optimization
will be enabled. This can have a major effect on the generated code quality.
#. For each function or global emitted, use the most private linkage type
possible (private, internal or linkonce_odr preferably). Doing so will
make LLVM's inter-procedural optimizations much more effective.
#. Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds
of predecessors). Among other issues, the register allocator is known to
perform badly with confronted with such structures. The only exception to
this guidance is that a unified return block with high in-degree is fine.
Avoid loads and stores of large aggregate type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@ -53,15 +82,9 @@ register width using a zext instruction.
Other Things to Consider
^^^^^^^^^^^^^^^^^^^^^^^^
#. Make sure that a DataLayout is provided (this will likely become required in
the near future, but is certainly important for optimization).
#. Use ptrtoint/inttoptr sparingly (they interfere with pointer aliasing
analysis), prefer GEPs
#. Use the "most-private" possible linkage types for the functions being defined
(private, internal or linkonce_odr preferably)
#. Prefer globals over inttoptr of a constant address - this gives you
dereferencability information. In MCJIT, use getSymbolAddress to provide
actual address.
@ -101,11 +124,6 @@ Other Things to Consider
improvement. Note that this is not always profitable and does involve a
potentially large increase in code size.
#. Avoid high in-degree basic blocks (e.g. basic blocks with dozens or hundreds
of predecessors). Among other issues, the register allocator is known to
perform badly with confronted with such structures. The only exception to
this guidance is that a unified return block with high in-degree is fine.
#. When checking a value against a constant, emit the check using a consistent
comparison type. The GVN pass *will* optimize redundant equalities even if
the type of comparison is inverted, but GVN only runs late in the pipeline.