The ``ASTImporter`` class is part of Clang's core library, the AST library.
It imports nodes of an ``ASTContext`` into another ``ASTContext``.
In this document, we assume basic knowledge about the Clang AST. See the :doc:`Introduction
to the Clang AST <IntroductionToTheClangAST>` if you want to learn more
about how the AST is structured.
Knowledge about :doc:`matching the Clang AST <LibASTMatchers>` and the `reference for the matchers <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_ are also useful.

.. contents::
   :local:

Introduction
------------
``ASTContext`` holds long-lived AST nodes (such as types and decls) that can be referred to throughout the semantic analysis of a file.
In some cases it is preferable to work with more than one ``ASTContext``.
For example, we may want to parse multiple different files inside the same Clang tool.
It would then be convenient to view the set of resulting ASTs as if they were one AST resulting from parsing all of the files together.
``ASTImporter`` provides a way to copy types or declarations from one ``ASTContext`` to another.
We refer to the context from which we import as the **"from" context** or *source context*; and the context into which we import as the **"to" context** or *destination context*.
Existing clients of the ``ASTImporter`` library are Cross Translation Unit (CTU) static analysis and the LLDB expression parser.
CTU static analysis imports a definition of a function if its definition is found in another translation unit (TU).
This way the analysis can break out of the single-TU limitation.
LLDB's ``expr`` command parses a user-defined expression, creates an ``ASTContext`` for it, and then imports the missing definitions from the AST that was built from the debug information (DWARF, etc.).
Algorithm of the import
-----------------------
Importing one AST node copies that node into the destination ``ASTContext``.
Why do we have to copy the node?
Isn't it enough to insert the pointer to that node into the destination context?
One reason is that the "from" and "to" contexts may have different lifetimes, so a node shared by pointer could dangle once its owning context is destroyed.
Also, the Clang AST considers nodes (or certain properties of nodes) equivalent if they have the same address!
The import algorithm has to ensure that structurally equivalent nodes from different translation units are not duplicated in the merged AST.
E.g. if we include the definition of the vector template (``#include <vector>``) in two translation units, then their merged AST should have only one node which represents the template.
Also, we have to discover *one definition rule* (ODR) violations.
For instance, there may be a class definition with the same name in both translation units, but one of the definitions contains a different number of fields.
So, we look up existing definitions, and then we check the structural equivalency of those nodes.
The following pseudo-code demonstrates the basics of the import mechanism:

.. code-block:: cpp

  // Pseudo-code(!) of import:
  ErrorOrDecl Import(Decl *FromD) {
    Decl *ToDecl = nullptr;
    FoundDeclsList = Look up all Decls in the "to" Ctx with the same name as FromD;
    for (auto FoundDecl : FoundDeclsList) {
      if (StructurallyEquivalentDecls(FoundDecl, FromD)) {
        ToDecl = FoundDecl;
        Mark FromD as imported;
        break;
      } else {
        Report ODR violation;
        return error;
      }
    }
    if (FoundDeclsList is empty) {
      Import dependent declarations and types of FromD;
      ToDecl = create a new AST node in "to" Ctx;
      Mark FromD as imported;
    }
    return ToDecl;
  }

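
The pseudo-code can be mimicked with a small, self-contained toy model.
The following is only a sketch under simplifying assumptions: ``ToyDecl``, ``ToyContext``, ``structurallyEquivalent`` and ``importDecl`` are hypothetical names invented for illustration and are not part of Clang, the toy context holds at most one declaration per name, and dependencies are not modelled at all.

.. code-block:: cpp

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // Toy stand-in for a declaration: a name plus the names of its fields.
  // Illustrative only; this is not a Clang class.
  struct ToyDecl {
    std::string Name;
    std::vector<std::string> Fields;
  };

  // Toy stand-in for an ASTContext: at most one declaration per name.
  struct ToyContext {
    std::map<std::string, ToyDecl> Decls;
  };

  // Deliberately simplistic equivalence check: same name, same field list.
  bool structurallyEquivalent(const ToyDecl &A, const ToyDecl &B) {
    return A.Name == B.Name && A.Fields == B.Fields;
  }

  // Returns the declaration in "To" that corresponds to FromD, or nullptr
  // if a same-named but structurally different declaration already exists
  // (the toy counterpart of an ODR violation).
  const ToyDecl *importDecl(ToyContext &To, const ToyDecl &FromD) {
    auto Found = To.Decls.find(FromD.Name);
    if (Found == To.Decls.end()) {
      // Nothing with this name yet: create a copy in the "to" context.
      return &To.Decls.emplace(FromD.Name, FromD).first->second;
    }
    if (structurallyEquivalent(Found->second, FromD))
      return &Found->second; // Reuse the existing node, do not duplicate it.
    return nullptr;          // Report the conflict to the caller.
  }

  // Usage example: import twice, then import a conflicting definition.
  void importExample() {
    ToyContext To;
    ToyDecl X{"X", {"a", "b"}};
    const ToyDecl *First = importDecl(To, X);
    const ToyDecl *Second = importDecl(To, X);
    assert(First != nullptr && First == Second); // No duplicate was created.
    assert(To.Decls.size() == 1);

    ToyDecl Conflicting{"X", {"a"}};
    assert(importDecl(To, Conflicting) == nullptr); // ODR-like conflict.
  }

Importing the same declaration twice yields the same node address in the toy "to" context, which matters because node identity is tied to addresses.
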

Two AST nodes are *structurally equivalent* if they are

- builtin types and refer to the same type, e.g. ``int`` and ``int`` are structurally equivalent,
- function types and all their parameters have structurally equivalent types,
- record types and all their fields in order of their definition have the same identifier names and structurally equivalent types,
- variable or function declarations and they have the same identifier name and their types are structurally equivalent.

We could extend the definition of structural equivalency to templates similarly.
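
The rules for builtin, function and record types can be written down as a recursive check.
The sketch below is a toy model only: ``ToyType`` and the helper functions are hypothetical names that do not exist in Clang, and the function-type rule compares parameter types only, exactly as stated in the list above.

.. code-block:: cpp

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <vector>

  // Toy type node; illustrative only, not part of Clang's Type hierarchy.
  struct ToyType {
    enum Kind { Builtin, Function, Record };
    Kind K;
    std::string BuiltinName;             // Builtin: e.g. "int".
    std::vector<ToyType> ParamTypes;     // Function: parameter types.
    std::vector<std::string> FieldNames; // Record: field names in order.
    std::vector<ToyType> FieldTypes;     // Record: field types, same order.
  };

  ToyType builtinType(const std::string &Name) {
    ToyType T;
    T.K = ToyType::Builtin;
    T.BuiltinName = Name;
    return T;
  }

  ToyType functionType(const std::vector<ToyType> &Params) {
    ToyType T;
    T.K = ToyType::Function;
    T.ParamTypes = Params;
    return T;
  }

  ToyType recordType(const std::vector<std::string> &Names,
                     const std::vector<ToyType> &Types) {
    ToyType T;
    T.K = ToyType::Record;
    T.FieldNames = Names;
    T.FieldTypes = Types;
    return T;
  }

  bool equivalent(const ToyType &A, const ToyType &B);

  // Pairwise comparison of two type lists, in order.
  bool allEquivalent(const std::vector<ToyType> &A,
                     const std::vector<ToyType> &B) {
    if (A.size() != B.size())
      return false;
    for (std::size_t I = 0; I < A.size(); ++I)
      if (!equivalent(A[I], B[I]))
        return false;
    return true;
  }

  bool equivalent(const ToyType &A, const ToyType &B) {
    if (A.K != B.K)
      return false;
    switch (A.K) {
    case ToyType::Builtin:
      return A.BuiltinName == B.BuiltinName;
    case ToyType::Function:
      return allEquivalent(A.ParamTypes, B.ParamTypes);
    case ToyType::Record:
      return A.FieldNames == B.FieldNames &&
             allEquivalent(A.FieldTypes, B.FieldTypes);
    }
    return false;
  }

  // Usage example covering each rule once.
  void equivalenceExample() {
    ToyType Int = builtinType("int");
    ToyType Long = builtinType("long");
    assert(equivalent(Int, Int));
    assert(!equivalent(Int, Long));
    assert(equivalent(functionType({Int, Long}), functionType({Int, Long})));
    ToyType P = recordType({"x", "y"}, {Int, Int});
    assert(equivalent(P, recordType({"x", "y"}, {Int, Int})));
    assert(!equivalent(P, recordType({"y", "x"}, {Int, Int}))); // Order matters.
    assert(!equivalent(P, recordType({"x", "y"}, {Int, Long})));
  }
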
If A and B are AST nodes and *A depends on B*, then we say that A is a **dependant** of B and B is a **dependency** of A.
The words "dependant" and "dependency" are nouns in British English.
Unfortunately, in American English, the adjective "dependent" is used for both meanings.
In this document, the adjective "dependent" always refers to the dependencies, i.e. the B node in the example.
API
---
Let's create a tool which uses the ASTImporter class!
First, we build two ASTs from virtual files; the content of the virtual files is synthesized from string literals:
If there's no error then we can get the underlying value.
In this example we will print the AST of the "to" context.

.. code-block:: cpp

  Decl *Imported = *ImportedOrErr;
  Imported->getTranslationUnitDecl()->dump();

Since we set **minimal import** in the constructor of the importer, the AST will not contain the declaration of the members (once we run the test tool).
With **normal import**, all dependent declarations are imported normally.
However, with minimal import, the dependent Decls are imported without their definitions, and we have to import the definition of each one separately if we need it later.
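
The difference between the two modes can be illustrated with a toy model.
This is a sketch only, under the assumption that a record is simply a name plus a list of member names; ``ToyRecord`` and ``ToyMinimalImporter`` are hypothetical and do not mirror the real ``ASTImporter`` interface.

.. code-block:: cpp

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  // Toy record declaration; illustrative only, not a Clang class.
  struct ToyRecord {
    std::string Name;
    std::vector<std::string> Members;
    bool HasDefinition;
  };

  // Toy importer with a flag that mimics the minimal/normal distinction.
  class ToyMinimalImporter {
  public:
    explicit ToyMinimalImporter(bool MinimalImport) : Minimal(MinimalImport) {}

    // Imports From into the toy "to" context and returns the imported copy.
    ToyRecord &import(const ToyRecord &From) {
      // Create the bare declaration, or find the one created earlier.
      ToyRecord &Imported =
          To.emplace(From.Name, ToyRecord{From.Name, {}, false}).first->second;
      if (!Minimal)
        complete(Imported, From); // Normal import: bring the members along.
      return Imported;
    }

    // Completes a (possibly already imported) record on demand.
    void importDefinition(const ToyRecord &From) {
      complete(import(From), From);
    }

  private:
    static void complete(ToyRecord &Imported, const ToyRecord &From) {
      Imported.Members = From.Members;
      Imported.HasDefinition = true;
    }

    bool Minimal;
    std::map<std::string, ToyRecord> To;
  };

  // Usage example: the minimal importer needs an explicit second step.
  void minimalImportExample() {
    ToyRecord From{"X", {"i", "j"}, true};

    ToyMinimalImporter Normal(/*MinimalImport=*/false);
    assert(Normal.import(From).Members.size() == 2);

    ToyMinimalImporter Minimal(/*MinimalImport=*/true);
    ToyRecord &Imported = Minimal.import(From);
    assert(!Imported.HasDefinition && Imported.Members.empty());
    Minimal.importDefinition(From);
    assert(Imported.HasDefinition && Imported.Members.size() == 2);
  }
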
Putting this all together, here is what the source of the tool looks like:

.. code-block:: cpp

  #include "clang/AST/ASTImporter.h"
  #include "clang/ASTMatchers/ASTMatchFinder.h"
  #include "clang/ASTMatchers/ASTMatchers.h"
  #include "clang/Tooling/Tooling.h"

  using namespace clang;
  using namespace tooling;
  using namespace ast_matchers;

  template <typename Node, typename Matcher>
  Node *getFirstDecl(Matcher M, const std::unique_ptr<ASTUnit> &Unit) {
    auto MB = M.bind("bindStr"); // Bind the to-be-matched node to a string key.

|-CXXRecordDecl 0xe91558 <col:7, col:14> col:14 implicit struct X
`-FieldDecl 0xe91600 <col:23, col:27> col:27 i 'int'
Error propagation
"""""""""""""""""
If a node has a dependency that must be imported before the node itself, then any import error associated with the dependency propagates to the dependant node.
Let's modify the previous example and import a ``FieldDecl`` instead of the ``ClassTemplateSpecializationDecl``.

.. code-block:: cpp

  auto Matcher = fieldDecl(hasName("i2"));
  auto *From = getFirstDecl<FieldDecl>(Matcher, FromUnit);

In this case we can see that an error is associated (``getImportDeclErrorIfAny``) with the specialization as well, not just with the field:

.. code-block:: cpp

  auto *From = getFirstDecl<CXXRecordDecl>(Matcher, FromUnit);
  auto *To = getFirstDecl<CXXRecordDecl>(Matcher, ToUnit);

This time we create a ``shared_ptr`` for ``ASTImporterSharedState``, which owns the associated errors for the "to" context.
Note that there may be several different ``ASTImporter`` objects which import into the same "to" context but from different "from" contexts; they should share the same ``ASTImporterSharedState``.
(Also note, we have to include the corresponding ``ASTImporterSharedState.h`` header file.)

.. code-block:: cpp

  auto ImporterState = std::make_shared<ASTImporterSharedState>();

`-CXXRecordDecl 0xf66828 <col:7, col:13> col:13 implicit class Y
We do not remove the erroneous nodes because, by the time we recognize the error, it is too late to remove the node: there may already be additional references to it in the AST.
This is aligned with the overall `design principle of the Clang AST <InternalsManual.html#immutability>`_: Clang AST nodes (types, declarations, statements, expressions, and so on) are generally designed to be **immutable once created**.
Thus, clients of the ASTImporter library should always check if there is any associated error for the node which they inspect in the destination context.
We recommend skipping the processing of those nodes which have an error associated with them.
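
The recommendation can be sketched with a toy model of a shared error state.
Everything in the sketch is hypothetical and invented for illustration (``ToySharedState``, ``ToySharedImporter``, ``nodesToProcess``, and nodes identified by plain strings); it does not use the real ``ASTImporterSharedState`` interface.

.. code-block:: cpp

  #include <cassert>
  #include <map>
  #include <memory>
  #include <string>
  #include <utility>
  #include <vector>

  // Toy shared state: import errors recorded for nodes of one "to" context,
  // keyed by node name. Not the real ASTImporterSharedState.
  struct ToySharedState {
    std::map<std::string, std::string> ErrorForNode;
  };

  // Toy importer: several instances may share one state object.
  class ToySharedImporter {
  public:
    explicit ToySharedImporter(std::shared_ptr<ToySharedState> S)
        : State(std::move(S)) {}

    // Records that importing NodeName failed for the given reason.
    void recordError(const std::string &NodeName, const std::string &Reason) {
      State->ErrorForNode[NodeName] = Reason;
    }

    // True if any importer sharing the state recorded an error for the node.
    bool hasError(const std::string &NodeName) const {
      return State->ErrorForNode.count(NodeName) != 0;
    }

  private:
    std::shared_ptr<ToySharedState> State;
  };

  // Client-side filter: keep only the nodes without an associated error.
  std::vector<std::string>
  nodesToProcess(const std::vector<std::string> &Nodes,
                 const ToySharedImporter &Importer) {
    std::vector<std::string> Result;
    for (const std::string &Node : Nodes)
      if (!Importer.hasError(Node))
        Result.push_back(Node);
    return Result;
  }

  // Usage example: an error recorded by one importer is visible to the other.
  void sharedStateExample() {
    auto State = std::make_shared<ToySharedState>();
    ToySharedImporter FromA(State), FromB(State);
    FromA.recordError("Y", "name conflict");
    assert(FromB.hasError("Y"));

    std::vector<std::string> Kept = nodesToProcess({"X", "Y", "Z"}, FromB);
    assert(Kept.size() == 2 && Kept[0] == "X" && Kept[1] == "Z");
  }

Because both toy importers hold the same state object, a client that asks either of them sees every error recorded for the destination context.
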
Using the ``-ast-merge`` Clang front-end action
-----------------------------------------------
The ``-ast-merge <pch-file>`` command-line switch can be used to merge from the given serialized AST file.
This file represents the source context.
When this switch is present, each top-level AST node of the source context is merged into the destination context.
If the merge was successful then ``ASTConsumer::HandleTopLevelDecl`` is called for the Decl.
As a result, we can execute the original front-end action on the extended AST.
Example for C
^^^^^^^^^^^^^
Let's consider the following three files:

.. code-block:: c

  // bar.h
  #ifndef BAR_H
  #define BAR_H
  int bar();
  #endif /* BAR_H */

  // bar.c
  #include "bar.h"
  int bar() {
    return 41;
  }

  // main.c
  #include "bar.h"
  int main() {
    return bar();
  }

Let's generate the AST files for the two source files:

.. code-block:: bash

  $ clang -cc1 -emit-pch -o bar.ast bar.c
  $ clang -cc1 -emit-pch -o main.ast main.c

Then, let's check what the merged AST would look like if we consider only the ``bar()`` function: