Recently we observed high memory pressure caused by clang during some parallel builds.
We discovered that we have several projects that have a large number of #define directives
in their TUs (on the order of millions), which caused huge memory consumption in clang due
to a lot of allocations for MacroInfo. We would like to reduce the memory overhead of
clang for a single #define to reduce the memory overhead for these files, to allow us to
reduce the memory pressure on the system during highly parallel builds. This change achieves
that by removing the SmallVector in MacroInfo and instead storing the tokens in an array
allocated using the bump pointer allocator, after all tokens are lexed.
The added unit test with 1000000 #define directives illustrates the problem. Prior to this
change, on arm64 macOS, clang's PP bump pointer allocator allocated 272007616 bytes, and
used roughly 272 bytes per #define. After this change, clang's PP bump pointer allocator
allocates 120002016 bytes, and uses only roughly 120 bytes per #define.
For an example test file that we have internally with 7.8 million #define directives, this
change produces the following improvement on arm64 macOS: Persistent allocation footprint for
this test case file as it's being compiled to LLVM IR went down 22% from 5.28 GB to 4.07 GB
and the total allocations went down 14% from 8.26 GB to 7.05 GB. Furthermore, this change
reduced the total number of allocations made by the system for this clang invocation from
1454853 to 133663, an order of magnitude improvement.
The recommit fixes the LLDB build failure.
Differential Revision: https://reviews.llvm.org/D117348
Recently we observed high memory pressure caused by clang during some parallel builds.
We discovered that we have several projects that have a large number of #define directives
in their TUs (on the order of millions), which caused huge memory consumption in clang due
to a lot of allocations for MacroInfo. We would like to reduce the memory overhead of
clang for a single #define to reduce the memory overhead for these files, to allow us to
reduce the memory pressure on the system during highly parallel builds. This change achieves
that by removing the SmallVector in MacroInfo and instead storing the tokens in an array
allocated using the bump pointer allocator, after all tokens are lexed.
The added unit test with 1000000 #define directives illustrates the problem. Prior to this
change, on arm64 macOS, clang's PP bump pointer allocator allocated 272007616 bytes, and
used roughly 272 bytes per #define. After this change, clang's PP bump pointer allocator
allocates 120002016 bytes, and uses only roughly 120 bytes per #define.
For an example test file that we have internally with 7.8 million #define directives, this
change produces the following improvement on arm64 macOS: Persistent allocation footprint for
this test case file as it's being compiled to LLVM IR went down 22% from 5.28 GB to 4.07 GB
and the total allocations went down 14% from 8.26 GB to 7.05 GB. Furthermore, this change
reduced the total number of allocations made by the system for this clang invocation from
1454853 to 133663, an order of magnitude improvement.
Differential Revision: https://reviews.llvm.org/D117348
ASTReader
This is a cleanup to reduce the lines of code to handle default template
argument in ASTReader.
Reviewed By: urnathan
Differential Revision: https://reviews.llvm.org/D118437
This patch replaces the exact include count of each file in `HeaderFileInfo` with a set of included files in `Preprocessor`.
The number of includes isn't a property of a header file but rather a preprocessor state. The exact number of includes is not used anywhere except statistic tracking.
Reviewed By: vsapsai
Differential Revision: https://reviews.llvm.org/D114095
The patch was reverted because it caused a crash during PCH build -- we
missed to update the RParenLoc in TreeTransform<Derived>::TransformAutoType.
This relands 55d96ac and 37ec65e with a test and fix.
This patch adds the support for `atomic compare` in parser. The support
in Sema and CodeGen will come soon. For now, it simply eimits an error when it
is encountered.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D115561
This reverts commit cc56c66f27.
Fixed a bad assertion, the target of a UsingShadowDecl must not have
*local* qualifiers, but it can be a typedef whose underlying type is qualified.
Currently there's no way to find the UsingDecl that a typeloc found its
underlying type through. Compare to DeclRefExpr::getFoundDecl().
Design decisions:
- a sugar type, as there are many contexts this type of use may appear in
- UsingType is a leaf like TypedefType, the underlying type has no TypeLoc
- not unified with UnresolvedUsingType: a single name is appealing,
but being sometimes-sugar is often fiddly.
- not unified with TypedefType: the UsingShadowDecl is not a TypedefNameDecl or
even a TypeDecl, and users think of these differently.
- does not cover other rarer aliases like objc @compatibility_alias,
in order to be have a concrete API that's easy to understand.
- implicitly desugared by the hasDeclaration ASTMatcher, to avoid
breaking existing patterns and following the precedent of ElaboratedType.
Scope:
- This does not cover types associated with template names introduced by
using declarations. A future patch should introduce a sugar TemplateName
variant for this. (CTAD deduced types fall under this)
- There are enough AST matchers to fix the in-tree clang-tidy tests and
probably any other matchers, though more may be useful later.
Caveats:
- This changes a fairly common pattern in the AST people may depend on matching.
Previously, typeLoc(loc(recordType())) matched whether a struct was
referred to by its original scope or introduced via using-decl.
Now, the using-decl case is not matched, and needs a separate matcher.
This is similar to the case of typedefs but nevertheless both adds
complexity and breaks existing code.
Differential Revision: https://reviews.llvm.org/D114251
WG14 adopted the _ExtInt feature from Clang for C23, but renamed the
type to be _BitInt. This patch does the vast majority of the work to
rename _ExtInt to _BitInt, which accounts for most of its size. The new
type is exposed in older C modes and all C++ modes as a conforming
extension. However, there are functional changes worth calling out:
* Deprecates _ExtInt with a fix-it to help users migrate to _BitInt.
* Updates the mangling for the type.
* Updates the documentation and adds a release note to warn users what
is going on.
* Adds new diagnostics for use of _BitInt to call out when it's used as
a Clang extension or as a pre-C23 compatibility concern.
* Adds new tests for the new diagnostic behaviors.
I want to call out the ABI break specifically. We do not believe that
this break will cause a significant imposition for early adopters of
the feature, and so this is being done as a full break. If it turns out
there are critical uses where recompilation is not an option for some
reason, we can consider using ABI tags to ease the transition.
For each selector encountered in the source code, we need to load
selectors from the imported modules and check that we are calling a
selector with compatible types.
At the moment, for each module we are storing methods declared in the
headers belonging to this module and methods from the transitive closure
of imported modules. When a module is imported by a few other modules,
methods from the shared module are duplicated in each importer. As the
result, we can end up with lots of identical methods that we try to add
to the global method pool. Doing this duplicate work is useless and
relatively expensive.
Avoid processing duplicate methods by storing in each module only its
own methods and not storing methods from dependencies. Collect methods
from dependencies by walking the graph of module dependencies.
The issue was discovered and reported by Richard Howell. He has done the
hard work for this fix as he has investigated and provided a detailed
explanation of the performance problem.
Differential Revision: https://reviews.llvm.org/D110123
At this point, `F.ImportLoc` has not been initialized by the `ASTReader` yet and using it leads to an assertion failure.
Introduced in 638c673a8c and 4445135109.
To reduce the number of explicit builds of a single module, we can try to squash multiple occurrences of the module with different command-lines (and context hashes) by removing benign command-line options. The greatest contributors to benign differences between command-lines are the header search paths.
In this patch, the lookup cache in `HeaderSearch` is used to identify paths that were actually used when implicitly building the module during scanning. This information is serialized into the unhashed control block of the implicitly-built PCM. The dependency scanner then loads this and may use it to prune the header search paths before computing the context hash of the module and generating the command-line.
We could also prune the header search paths when serializing `HeaderSearchOptions` into the PCM. That way, we could do it only once instead of every load of the PCM file by dependency scanner. However, that would result in a PCM file whose contents don't produce the same context hash as the original build, which is probably highly surprising.
There is an alternative approach to storing extra information into the PCM: wire up preprocessor callbacks to capture the used header search paths on-the-fly during preprocessing of modularized headers (similar to what we currently do for the main source file and textual headers). Right now, that's not compatible with the fact that we do an actual implicit build producing PCM files during dependency scanning. The second run of dependency scanner loads the PCM from the first run, skipping the preprocessing altogether, which would result in different results between runs. We can revisit this approach when we stop building implicitly during dependency scanning.
Depends on D102923.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D102488
This patch propagates the import `SourceLocation` into `HeaderSearch::lookupModule`. This enables remarks on search path usage (implemented in D102923) to point to the source code that initiated header search.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D111557
This refactor changes the GlobalMethodPool to a class that contains
the DenseMap of methods. This is to allow for the addition of a
separate DenseSet in a follow-up diff that will handle method
de-duplication when inserting methods into the global method pool.
Changes:
- the `GlobalMethods` pair becomes `GlobalMethodPool::Lists`
- the `GlobalMethodPool` becomes a class containing the `DenseMap` of methods
- pass through methods are added to maintain most of the existing code without changing `MethodPool` -> `MethodPool.Methods` everywhere
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D109898
Currently, we have no front-end type for ppc_fp128 type in IR. PowerPC
target generates ppc_fp128 type from long double now, but there's option
(-mabi=(ieee|ibm)longdouble) to control it and we're going to do
transition from IBM extended double-double ppc_fp128 to IEEE fp128 in
the future.
This patch adds type __ibm128 which always represents ppc_fp128 in IR,
as what GCC did for that type. Without this type in Clang, compilation
will fail if compiling against future version of libstdcxx (which uses
__ibm128 in headers).
Although all operations in backend for __ibm128 is done by software,
only PowerPC enables support for it.
There's something not implemented in this commit, which can be done in
future ones:
- Literal suffix for __ibm128 type. w/W is suitable as GCC documented.
- __attribute__((mode(IF))) should be for __ibm128.
- Complex __ibm128 type.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D93377
Per the contract of ReadLateParsedTemplates, we should not be returning
the same results multiple times. No functionality change intended, other
than to runtime.
Thanks to Luboš Luňák for identifying the cause of the regression!
Reading the AST block can never fail with a recoverable error as modules
cannot be removed during this phase. Change the return type of these
functions to return an llvm::Error instead, ie. either success or
failure.
NFC other than the wording of some of the errors.
Differential Revision: https://reviews.llvm.org/D108268
Reading modules first reads each control block in the chain and then all
AST blocks.
The first phase is intended to find recoverable errors, eg. an out of
date or missing module. If any error occurs during this phase, it is
safe to remove all modules in the chain as no references to them will
exist.
While reading the AST blocks, however, various fields in ASTReader are
updated with references to the module. Removing modules at this point
can cause dangling pointers which can be accessed later. These would be
otherwise harmless, eg. a binary search over `GlobalSLocEntryMap` may
access a failed module that could error, but shouldn't crash. Do not
remove modules in this phase, regardless of failures.
Since this is the case, it also doesn't make sense to return OutOfDate
during this phase, so remove the two cases where this happens.
When they were originally added these checks would return a failure when
the serialized and current path didn't match up. That was updated to an
OutOfDate as it was found to be hit when using VFS and overriding the
umbrella. Later on the path was changed to instead be the name as
written in the module file, resolved using the serialized base
directory. At this point the check is really only comparing the name of
the umbrella and only works for frameworks since those don't include
`Headers/` in the name (which means the resolved path will never exist)
Given all that, it seems safe to ignore this case entirely for now.
This makes the handling of an umbrella header/directory the same as
regular headers, which also don't check for differences in the path
caused by VFS.
Resolves rdar://79329355
Differential Revision: https://reviews.llvm.org/D107690
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.
NFC: this patch introduces typedefs for the integer type used by
SourceLocation and makes all the boring changes to use the typedefs
everywhere, but for the moment, they are unconditionally defined to
uint32_t.
Patch originally by Mikhail Maltsev.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D105492
It was possible to re-add a module to a shared in-memory module cache
when search paths are changed. This can eventually cause a crash if the
original module is referenced after this occurs.
1. Module A depends on B
2. B exists in two paths C and D
3. First run only has C on the search path, finds A and B and loads
them
4. Second run adds D to the front of the search path. A is loaded and
contains a reference to the already compiled module from C. But
searching finds the module from D instead, causing a mismatch
5. B and the modules that depend on it are considered out of date and
thus rebuilt
6. The recompiled module A is added to the in-memory cache, freeing
the previously inserted one
This can never occur from a regular clang process, but is very easy to
do through the API - whether through the use of a shared case or just
running multiple compilations from a single `CompilerInstance`. Update
the compilation to return early if a module is already finalized so that
the pre-condition in the in-memory module cache holds.
Resolves rdar://78180255
Differential Revision: https://reviews.llvm.org/D105328
Original commit message:
[clang-repl] Implement partial translation units and error recovery.
https://reviews.llvm.org/D96033 contained a discussion regarding efficient
modeling of error recovery. @rjmccall has outlined the key ideas:
Conceptually, we can split the translation unit into a sequence of partial
translation units (PTUs). Every declaration will be associated with a unique PTU
that owns it.
The first key insight here is that the owning PTU isn't always the "active"
(most recent) PTU, and it isn't always the PTU that the declaration
"comes from". A new declaration (that isn't a redeclaration or specialization of
anything) does belong to the active PTU. A template specialization, however,
belongs to the most recent PTU of all the declarations in its signature - mostly
that means that it can be pulled into a more recent PTU by its template
arguments.
The second key insight is that processing a PTU might extend an earlier PTU.
Rolling back the later PTU shouldn't throw that extension away. For example, if
the second PTU defines a template, and the third PTU requires that template to
be instantiated at float, that template specialization is still part of the
second PTU. Similarly, if the fifth PTU uses an inline function belonging to the
fourth, that definition still belongs to the fourth. When we go to emit code in
a new PTU, we map each declaration we have to emit back to its owning PTU and
emit it in a new module for just the extensions to that PTU. We keep track of
all the modules we've emitted for a PTU so that we can unload them all if we
decide to roll it back.
Most declarations/definitions will only refer to entities from the same or
earlier PTUs. However, it is possible (primarily by defining a
previously-declared entity, but also through templates or ADL) for an entity
that belongs to one PTU to refer to something from a later PTU. We will have to
keep track of this and prevent unwinding to later PTU when we recognize it.
Fortunately, this should be very rare; and crucially, we don't have to do the
bookkeeping for this if we've only got one PTU, e.g. in normal compilation.
Otherwise, PTUs after the first just need to record enough metadata to be able
to revert any changes they've made to declarations belonging to earlier PTUs,
e.g. to redeclaration chains or template specialization lists.
It should even eventually be possible for PTUs to provide their own slab
allocators which can be thrown away as part of rolling back the PTU. We can
maintain a notion of the active allocator and allocate things like Stmt/Expr
nodes in it, temporarily changing it to the appropriate PTU whenever we go to do
something like instantiate a function template. More care will be required when
allocating declarations and types, though.
We would want the PTU to be efficiently recoverable from a Decl; I'm not sure
how best to do that. An easy option that would cover most declarations would be
to make multiple TranslationUnitDecls and parent the declarations appropriately,
but I don't think that's good enough for things like member function templates,
since an instantiation of that would still be parented by its original class.
Maybe we can work this into the DC chain somehow, like how lexical DCs are.
We add a different kind of translation unit `TU_Incremental` which is a
complete translation unit that we might nonetheless incrementally extend later.
Because it is complete (and we might want to generate code for it), we do
perform template instantiation, but because it might be extended later, we don't
warn if it declares or uses undefined internal-linkage symbols.
This patch teaches clang-repl how to recover from errors by disconnecting the
most recent PTU and update the primary PTU lookup tables. For instance:
```./clang-repl
clang-repl> int i = 12; error;
In file included from <<< inputs >>>:1:
input_line_0:1:13: error: C++ requires a type specifier for all declarations
int i = 12; error;
^
error: Parsing failed.
clang-repl> int i = 13; extern "C" int printf(const char*,...);
clang-repl> auto r1 = printf("i=%d\n", i);
i=13
clang-repl> quit
```
Differential revision: https://reviews.llvm.org/D104918
This reverts commit 6775fc6ffa.
It also reverts "[lldb] Fix compilation by adjusting to the new ASTContext signature."
This reverts commit 03a3f86071.
We see some failures on the lldb infrastructure, these changes might play a role
in it. Let's revert it now and see if the bots will become green.
Ref: https://reviews.llvm.org/D104918
https://reviews.llvm.org/D96033 contained a discussion regarding efficient
modeling of error recovery. @rjmccall has outlined the key ideas:
Conceptually, we can split the translation unit into a sequence of partial
translation units (PTUs). Every declaration will be associated with a unique PTU
that owns it.
The first key insight here is that the owning PTU isn't always the "active"
(most recent) PTU, and it isn't always the PTU that the declaration
"comes from". A new declaration (that isn't a redeclaration or specialization of
anything) does belong to the active PTU. A template specialization, however,
belongs to the most recent PTU of all the declarations in its signature - mostly
that means that it can be pulled into a more recent PTU by its template
arguments.
The second key insight is that processing a PTU might extend an earlier PTU.
Rolling back the later PTU shouldn't throw that extension away. For example, if
the second PTU defines a template, and the third PTU requires that template to
be instantiated at float, that template specialization is still part of the
second PTU. Similarly, if the fifth PTU uses an inline function belonging to the
fourth, that definition still belongs to the fourth. When we go to emit code in
a new PTU, we map each declaration we have to emit back to its owning PTU and
emit it in a new module for just the extensions to that PTU. We keep track of
all the modules we've emitted for a PTU so that we can unload them all if we
decide to roll it back.
Most declarations/definitions will only refer to entities from the same or
earlier PTUs. However, it is possible (primarily by defining a
previously-declared entity, but also through templates or ADL) for an entity
that belongs to one PTU to refer to something from a later PTU. We will have to
keep track of this and prevent unwinding to later PTU when we recognize it.
Fortunately, this should be very rare; and crucially, we don't have to do the
bookkeeping for this if we've only got one PTU, e.g. in normal compilation.
Otherwise, PTUs after the first just need to record enough metadata to be able
to revert any changes they've made to declarations belonging to earlier PTUs,
e.g. to redeclaration chains or template specialization lists.
It should even eventually be possible for PTUs to provide their own slab
allocators which can be thrown away as part of rolling back the PTU. We can
maintain a notion of the active allocator and allocate things like Stmt/Expr
nodes in it, temporarily changing it to the appropriate PTU whenever we go to do
something like instantiate a function template. More care will be required when
allocating declarations and types, though.
We would want the PTU to be efficiently recoverable from a Decl; I'm not sure
how best to do that. An easy option that would cover most declarations would be
to make multiple TranslationUnitDecls and parent the declarations appropriately,
but I don't think that's good enough for things like member function templates,
since an instantiation of that would still be parented by its original class.
Maybe we can work this into the DC chain somehow, like how lexical DCs are.
We add a different kind of translation unit `TU_Incremental` which is a
complete translation unit that we might nonetheless incrementally extend later.
Because it is complete (and we might want to generate code for it), we do
perform template instantiation, but because it might be extended later, we don't
warn if it declares or uses undefined internal-linkage symbols.
This patch teaches clang-repl how to recover from errors by disconnecting the
most recent PTU and update the primary PTU lookup tables. For instance:
```./clang-repl
clang-repl> int i = 12; error;
In file included from <<< inputs >>>:1:
input_line_0:1:13: error: C++ requires a type specifier for all declarations
int i = 12; error;
^
error: Parsing failed.
clang-repl> int i = 13; extern "C" int printf(const char*,...);
clang-repl> auto r1 = printf("i=%d\n", i);
i=13
clang-repl> quit
```
Differential revision: https://reviews.llvm.org/D104918
It's useful to be able to load explicitly-built PCH files into an implicit build (e.g. during dependency scanning). That's currently impossible, since the explicitly-built PCH has an empty modules cache path, while the current compilation has (and needs to have) a valid path, triggering an error in the `PCHValidator`.
This patch adds a preprocessor option and command-line flag that can be used to omit this check.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D103802
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D99459
Reduce memory footprint of AST Reader/Writer:
1. Adjust internal data containers' element type.
2. Switch to set for deduplication of deferred diags.
Differential Revision: https://reviews.llvm.org/D101793
Drop non-conformant extension pragma implementation as
it does not properly disable anything and therefore
enabling non-disabled logic has no meaning.
This simplifies clang code and user interface to the extension
functionality. With this patch extension pragma 'begin'/'end'
and 'enable'/'disable' are only accepted for backward
compatibility and no longer have any default behavior.
Differential Revision: https://reviews.llvm.org/D101043
This patch enables explicitly building inferred modules.
Effectively a cherry-pick of https://github.com/apple/llvm-project/pull/699 authored by @Bigcheese with libclang and dependency scanner changes omitted.
Contains the following changes:
1. [Clang] Fix the header paths in clang::Module for inferred modules.
* The UmbrellaAsWritten and NameAsWritten fields in clang::Module are a lie for framework modules. For those they actually are the path to the header or umbrella relative to the clang::Module::Directory.
* The exception to this case is for inferred modules. Here it actually is the name as written, because we print out the module and read it back in when implicitly building modules. This causes a problem when explicitly building an inferred module, as we skip the printing out step.
* In order to fix this issue this patch adds a new field for the path we want to use in getInputBufferForModule. It also makes NameAsWritten actually be the name written in the module map file (or that would be, in the case of an inferred module).
2. [Clang] Allow explicitly building an inferred module.
* Building the actual module still fails, but make sure it fails for the right reason.
Split from D100934.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D102491
If a module contains errors (ie. it was built with
-fallow-pcm-with-compiler-errors and had errors) and was from the module
cache, it is marked as out of date - see
a2c1054c30.
When a module is imported multiple times in the one compile, this caused
it to be recompiled each time - removing the existing buffer from the
module cache and replacing it. This results in various errors further
down the line.
Instead, only mark the module as out of date if it isn't already
finalized in the module cache.
Reviewed By: akyrtzi
Differential Revision: https://reviews.llvm.org/D100619
Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable.
Solution:
This patch adds two new flags
- OF_CRLF which indicates that CRLF translation is used.
- OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.
Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.
So this is the behaviour per platform with my patch:
z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode
Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return
The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.
```
if (Flags & OF_CRLF)
CrtOpenFlags |= _O_TEXT;
```
These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D99426
Added basic parsing/sema/serialization support to extend the
existing 'destroy' clause for use with the 'interop' directive.
Differential Revision: https://reviews.llvm.org/D98834