In a previous patch, I made changes so that PDBs which were
generated on non-Windows platforms contained sensical paths
for the host. While this is an esoteric use case, we need
it to be supported for certain cross compilation scenarios
especially with LLDB, which can debug things on non-Windows
platforms.
However, this regressed a case where you specify /PDBSOURCEPATH
and use a windows-style path. Previously, we would still remove
dots and canonicalize slashes to backslashes, but since my
change intentionally tried to support non-backslash paths, this
was broken.
This patch fixes the situation by trying to guess which path
style the user is specifying when /PDBSOURCEPATH is passed.
It is intentionally conservative, erring on the side of a
Windows path style unless absolutely certain. All dots are
removed and slashes canonicalized to whatever the deduced
path style is after appending the file path to the /PDBSOURCEPATH
argument.
Differential Revision: https://reviews.llvm.org/D57769
llvm-svn: 353250
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Saves up to 1.3 sec on large PDBs.
Figures below are for the "Globals Stream Layout" pass:
Before This patch
Large EXE (PDB is ~2 GB) 3330 ms 2022 ms
Large EXE (PDB is ~2 GB) 2680 ms 1608 ms
Large DLL (PDB is ~1 GB) 1455 ms 938 ms
Large DLL (PDB is ~800 MB) 1215 ms 800 ms
Small DLL (PDB is ~200 MB) 224 ms 146 ms
Differential Revision: https://reviews.llvm.org/D56334
llvm-svn: 350452
In PDBs, symbol records must be aligned to four bytes. However, in the
object file, symbol records may not be aligned. MSVC does not pad out
symbol records to make sure they are aligned. That means the linker has
to do extra work to insert the padding. Currently, LLD calculates the
required space with alignment, and copies each record one at a time
while padding them out to the correct size. It has a fast path that
avoids this copy when the records are already aligned.
This change fixes a bug in that codepath so that the copy is actually
saved, and tweaks LLVM's symbol record emission to align symbol records.
Here's how things compare when doing a plain clang Release+PDB build:
- objs are 0.65% bigger (negligible)
- link is 3.3% faster (negligible)
- saves allocating 441MB
- new LLD high water mark is ~1.05GB
llvm-svn: 349431
When calling BinaryStreamArray::drop_front(), if the stream
is skewed it means we must never drop the first bytes of the
stream since offsets which occur in records assume the existence
of those bytes. So if we want to skip the first record in a
stream, then what we really want to do is just set the begin
pointer to the next record. But we shouldn't actually remove
those bytes from the underlying view of the data.
llvm-svn: 349066
Previously these were dropped. We now understand them sufficiently
well to start emitting them. From the debugger's perspective, this
now enables us to have debug info about typedefs (both global and
function-locally scoped)
Differential Revision: https://reviews.llvm.org/D55228
llvm-svn: 348306
Summary:
This speeds up linking clang.exe/pdb with /DEBUG:GHASH by 31%, from
12.9s to 9.8s.
Symbol records are typically small (16.7 bytes on average), but we
processed them one at a time. CVSymbol is a relatively "large" type. It
wraps an ArrayRef<uint8_t> with a kind an optional 32-bit hash, which we
don't need. Before this change, each DbiModuleDescriptorBuilder would
maintain an array of CVSymbols, and would write them individually with a
BinaryItemStream.
With this change, we now add symbols that happen to appear contiguously
in bulk. For each .debug$S section (roughly one per function), we
allocate two copies, one for relocation, and one for realignment
purposes. For runs of symbols that go in the module stream, which is
most symbols, we now add them as a single ArrayRef<uint8_t>, so the
vector DbiModuleDescriptorBuilder is roughly linear in the number of
.debug$S sections (O(# funcs)) instead of the number of symbol records
(very large).
Some stats on symbol sizes for the curious:
PDB size: 507M
sym bytes: 316,508,016
sym count: 18,954,971
sym byte avg: 16.7
As future work, we may be able to skip copying symbol records in the
linker for realignment purposes if we make LLVM write them aligned into
the object file. We need to double check that such symbol records are
still compatible with link.exe, but if so, it's definitely worth doing,
since my profile shows we spend 500ms in memcpy in the symbol merging
code. We could potentially cut that in half by saving a copy.
Alternatively, we could apply the relocations *after* we iterate the
symbols. This would require some careful re-engineering of the
relocation processing code, though.
Reviewers: zturner, aganea, ruiu
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D54554
llvm-svn: 347687
Don't use a uint32_t*, use a ulittle32_t* to make this correct
on big endian systems.
Patch by James Clarke
Differential Revision: https://reviews.llvm.org/D54421
llvm-svn: 347349
- Make mergeSymbolRecords a method of PDBLinker to reduce the number of
parameters it needs.
- Remove a stale FIXME comment about error handling. We already drop
unknown symbol records, log them, and continue.
- Update a comment about why we're copying the symbol record. We do it
to realign the record. We can already mutate the symbol record memory,
it's memory allocated by relocateDebugChunk.
- Avoid the extra `CVSymbol NewSym` variable. We can mutate Sym in
place, which is best, since we're mutating the underlying record anyway.
llvm-svn: 346817
This change allows for link-time merging of debugging information from
Microsoft precompiled types OBJs compiled with cl.exe /Z7 /Yc and /Yu.
This fixes llvm.org/PR34278
Differential Revision: https://reviews.llvm.org/D45213
llvm-svn: 346154
This a resubmission of a patch which was previously reverted
due to breaking several lld tests. The issues causing those
failures have been fixed, so the patch is now resubmitted.
---Original Commit Message---
While it doesn't make a *ton* of sense for POSIX paths to be
in PDBs, it's possible to occur in real scenarios involving
cross compilation.
The tools need to be able to handle this, because certain types
of debugging scenarios are possible without a running process
and so don't necessarily require you to be on a Windows system.
These include post-mortem debugging and binary forensics (e.g.
using a debugger to disassemble functions and examine symbols
without running the process).
There's changes in clang, LLD, and lldb in this patch. After
this the cross-platform disassembly and source-list tests pass
on Linux.
Furthermore, the behavior of LLD can now be summarized by a much
simpler rule than before: Unless you specify /pdbsourcepath and
/pdbaltpath, the PDB ends up with paths that are valid within
the context of the machine that the link is performed on.
Differential Revision: https://reviews.llvm.org/D53149
llvm-svn: 344377
This was originally causing some test failures on non-Windows
platforms, which required fixes in the compiler and linker. After
those fixes, however, other tests started failing. Reverting
temporarily until I can address everything.
llvm-svn: 344279
While it doesn't make a *ton* of sense for POSIX paths to be
in PDBs, it's possible to occur in real scenarios involving
cross compilation.
The tools need to be able to handle this, because certain types
of debugging scenarios are possible without a running process
and so don't necessarily require you to be on a Windows system.
These include post-mortem debugging and binary forensics (e.g.
using a debugger to disassemble functions and examine symbols
without running the process).
There's changes in clang, LLD, and lldb in this patch. After
this the cross-platform disassembly and source-list tests pass
on Linux.
Furthermore, the behavior of LLD can now be summarized by a much
simpler rule than before: Unless you specify /pdbsourcepath and
/pdbaltpath, the PDB ends up with paths that are valid within
the context of the machine that the link is performed on.
Differential Revision: https://reviews.llvm.org/D53149
llvm-svn: 344269
/pdbsourcepath: was added in https://reviews.llvm.org/D48882 to make it
possible to have relative paths in the debug info that clang-cl writes.
lld-link then makes the paths absolute at link time, which debuggers require.
This way, clang-cl's output is independent of the absolute path of the build
directory, which is useful for cacheability in distcc-like systems.
This patch extends /pdbsourcepath: (if passed) to also be used for:
1. The "cwd" stored in the env block in the pdb is /pdbsourcepath: if present
2. The "exe" stored in the env block in the pdb is made absolute relative
to /pdbsourcepath: instead of the cwd
3. The "pdb" stored in the env block in the pdb is made absolute relative
to /pdbsourcepath: instead of the cwd
4. For making absolute paths to .obj files referenced from the pdb
/pdbsourcepath: is now useful in three scenarios (the first one already working
before this change):
1. When building with full debug info, passing the real build dir to
/pdbsourcepath: allows having clang-cl's output to be independent
of the build directory path. This patch effectively doesn't change
behavior for this use case (assuming the cwd is the build dir).
2. When building without compile-time debug info but linking with /debug,
a fake fixed /pdbsourcepath: can be passed to get symbolized stacks
while making the pdb and exe independent of the current build dir.
For this two work, lld-link needs to be invoked with relative paths for
the lld-link invocation itself (for "exe"), for the pdb output name, the exe
output name (for "pdb"), and the obj input files, and no absolute path
must appear on the link command (for "cmd" in the pdb's env block).
Since no full debug info is present, it doesn't matter that the absolute
path doesn't exist on disk -- we only get symbols in stacks.
3. When building production builds with full debug info that don't have
local changes, and that get source indexed and their pdbs get uploaded
to a symbol server. /pdbsourcepath: again makes the build output independent
of the current directory, and the fixed path passed to /pdbsourcepath: can
be given the source indexing transform so that it gets mapped to a
repository path. This has the same requirements as 2.
This patch also makes it possible to create PDB files containing Windows-style
absolute paths when cross-compiling on a POSIX system.
Differential Revision: https://reviews.llvm.org/D53021
llvm-svn: 344061
This is a feature that MS link.exe lacks; it currently errors out on
such relocations, just like lld did before.
This allows linking clang.exe for ARM - practically, any image over
16 MB will likely run into the issue.
Differential Revision: https://reviews.llvm.org/D52156
llvm-svn: 342962
Previously, lld-link would use a random byte sequence as the PDB GUID. Instead,
use a hash of the PDB file contents.
To not disturb llvm-pdbutil pdb2yaml, the hash generation is an opt-in feature
on InfoStreamBuilder and ldb/COFF/PDB.cpp always sets it.
Since writing the PDB computes this ID which also goes in the exe, the PDB
writing code now must be called before writeBuildId(). writeBuildId() for that
reason is no longer included in the "Code Layout" timer.
Since the PDB GUID is now a function of the PDB contents, the PDB Age is always
set to 1. There was a long comment above loadExistingBuildId (now gone) about
how not changing the GUID and only incrementing the age was important, but
according to the discussion in PR35914 that comment was incorrect.
Differential Revision: https://reviews.llvm.org/D51956
llvm-svn: 342334
r342003 added support for emitting FPO data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the PDB
file. However, that is not the end of the story. FPO can end
up in two different destinations in a PDB, each corresponding to
a different FPO data source.
The case handled by r342003 involves copying data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the
"New FPO" stream in the PDB, which is then referred to by the
DBI stream. The case handled by this patch involves copying
records from the .debug$F section of an object file to the "FPO"
stream (or perhaps more aptly, the "Old FPO" stream) in the PDB
file, which is also referred to by the DBI stream.
The formats are largely similar, and the difference is mostly
only visible in masm generated object files, such as some of the
low-level CRT object files like memcpy. MASM doesn't appear to
support writing the DEBUG_S_FRAMEDATA subsection, and instead
just writes these records to the .debug$F section.
Although clang-cl does not emit a .debug$F section ever, lld still
needs to support it so we have good debugging for CRT functions.
Differential Revision: https://reviews.llvm.org/D51958
llvm-svn: 342080
- Log the reason for a PDB or precompiled-OBJ load failure
- Properly handle out-of-date PDB or precompiled-OBJ signature by displaying a corresponding error
- Slightly change behavior on PDB failure: any subsequent load attempt from another OBJ would result in the same error message being logged
- Slightly change behavior on PDB failure: retry with filename only if previous error was ENOENT ("no such file or directory")
- Tests: a. for native PDB errors; b. cover all the cases above
Differential Revision: https://reviews.llvm.org/D51559
llvm-svn: 341825
Following D50807, and heading towards D50664, this intermediary change does the following:
1. Upgrade all custom Error types in llvm/trunk/lib/DebugInfo/ to use the new StringError behavior (D50807).
2. Implement std::is_error_code_enum and make_error_code() for DebugInfo error enumerations.
3. Rename GenericError -> PDBError (the file will be renamed in a subsequent commit)
4. Update custom error messages to follow the same formatting: (\w\s*)+\.
5. Keep generic "file not found" (ENOENT) errors as they are in PDB code. Previously, there used to be a custom enumeration for that purpose.
6. Remove a few extraneous LF in log() implementations. Printing LF is a responsability at a higher level, not at the error level.
Differential Revision: https://reviews.llvm.org/D51499
llvm-svn: 341228
After fixing up the runtime pseudo relocation, the .refptr.<var>
will be a plain pointer with the same value as the IAT entry itself.
To save a little binary size and reduce the number of runtime pseudo
relocations, redirect references to the IAT entry (via the __imp_<var>
symbol) itself and discard the .refptr.<var> chunk (as long as the
same section chunk doesn't contain anything else than the single
pointer).
As there are now cases for both setting the Live variable to true
and false externally, remove the accessors and setters and just make
the variable public instead.
Differential Revision: https://reviews.llvm.org/D51456
llvm-svn: 341175
This patch changes relative path for source files in obj files to
absolute path in PDB when linking with added flag.
I will make obj file generated by clang-cl independent from build
directory for chromium build. But I don't want to confuse visual studio
debugger or require additional configuration. To attain this goal, I
added flag to convert relative source file path in obj to absolute path
when emitting PDB.
By removing absolute path from obj files, we can share build cache
between chromium developers even when they are doing debug build.
That will make build time faster.
More context:
https://bugs.chromium.org/p/chromium/issues/detail?id=712796https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/5HXSVX-7fPc
llvm-svn: 337439
Previously we emitted 20-byte SHA1 hashes. This is overkill
for identifying debug info records, and has the negative side
effect of making object files bigger and links slower. By
using only the last 8 bytes of a SHA1, we get smaller object
files and ~10% faster links.
This modifies the format of the .debug$H section by adding a new
value for the hash algorithm field, so that the linker will still
work when its object files have an old format.
Differential Revision: https://reviews.llvm.org/D46855
llvm-svn: 332669
It's possible to have an empty object file, for example if you
just compile an empty .c file. This file won't have any sections
so asserting that a file has chunks is definitely wrong.
llvm-svn: 330461
Part of the DBI stream is a list of variable length structures
describing each module that contributes to the final executable.
One member of this structure is a section contribution entry that
describes the first section contribution in the output file for
the given module.
We have been leaving this structure unpopulated until now, so with
this patch it is now filled out correctly.
Differential Revision: https://reviews.llvm.org/D45832
llvm-svn: 330457
Summary:
This change does three things:
- Try to find the file and line number of an undefined symbol
reference by reading codeview debug info.
- Try to find the name of the function or global variable with the
undefined symbol reference by searching the object file's symbol
table.
- Prints the information in the same style as the ELF linker.
Differential Revision: https://reviews.llvm.org/D45467
llvm-svn: 330235
Using Config->is64() will treat ARM64 as Amd64, which is incorrect.
Furthermore, there are more esoteric architectures that could
theoretically be encountered. Just set it directly to the machine
type, which we already know anyway.
llvm-svn: 330157
Most of these are pretty trivial and obvious. Setting the toolchain
version to 14.11 is perhaps a little questionable, but we've been bitten
in the past where one of our version fields sidn't match MSVC's, and I
definitely don't want to go through that diagnosis again as it was
pretty time consuming and hard to track down.
I found all of these by using llvm-pdbutil export to dump the dbi and
pdb streams to a file, then using fc followed by llvm-pdbutil explain to
explain the mismatched bytes.
There are still some more, these are just the low hanging fruit.
Differential Revision: https://reviews.llvm.org/D45276
llvm-svn: 330130
This was reverted several times due to what ultimately turned out
to be incompatibilities in our serialized hash table format.
Several changes went in prior to this to fix those issues since
they were more fundamental and independent of supporting injected
sources, so now that those are fixed this change should hopefully
pass.
llvm-svn: 328363
When investigating bugs in PDB generation, the first step is
often to do the same link with link.exe and then compare PDBs.
But comparing PDBs is hard because two completely different byte
sequences can both be correct, so it hampers the investigation when
you also have to spend time figuring out not just which bytes are
different, but also if the difference is meaningful.
This patch fixes a couple of cases related to string table emission,
hash table emission, and the order in which we emit strings that
makes more of our bytes the same as the bytes generated by MS PDBs.
Differential Revision: https://reviews.llvm.org/D44810
llvm-svn: 328348
This is still failing on a different bot this time due to some
issue related to hashing absolute paths. Reverting until I can
figure it out.
llvm-svn: 328014