[lld-macho] Cache readFile results

In one of our links, lld was reading 760k files, but only about 1500 of them
were unique. Caching the reads takes that link from 30 seconds to 8.

This seems like a heavy hammer, especially since some inputs don't need to be
cached, like the filelist arguments and the passed static archives (the latter
are already cached as a one-off), but ld64 appears to do something similar
here to short-circuit these duplicate reads:

82e429e186/src/ld/InputFiles.cpp (L644-L665)
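
For illustration, here is a minimal standalone sketch of the same idea: memoize reads by path so duplicate requests never touch the filesystem. It uses only the C++ standard library; `ReadCache` is a hypothetical name and not part of lld or this patch.

```cpp
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a readFile-style helper: the first read of a path
// loads the file from disk; later reads of the same path return the cached
// contents instead of re-reading the file.
class ReadCache {
public:
  std::optional<std::string> read(const std::string &path) {
    auto it = cache_.find(path);
    if (it != cache_.end())
      return it->second; // short-circuit the duplicate read

    std::ifstream in(path, std::ios::binary);
    if (!in)
      return std::nullopt; // let the caller report the error

    std::string contents((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());
    return cache_[path] = std::move(contents);
  }

private:
  std::unordered_map<std::string, std::string> cache_;
};
```

Unlike this owning sketch, the actual patch (see the diff below) caches non-owning MemoryBufferRefs keyed by CachedHashStringRef, which works because the underlying buffers stay alive for the duration of the link.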

Of the file types being read for our iOS app, the biggest problem was
constantly re-reading small .tbd files:

```
% wc -l /tmp/read.txt
761414 /tmp/read.txt
% cat /tmp/read.txt | sort -u | wc -l
1503

% cat /tmp/read.txt | grep "\.a$" | wc -l
43721
% cat /tmp/read.txt | grep "\.tbd$" | wc -l
717656
```

We could likely hoist this logic up so nothing is cached at this level, but
that would be a more invasive change: we'd have to make sure every caller that
needs caching actually caches the results.
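
As a rough sketch of what hoisting would mean (hypothetical names, not lld's API), every caller would have to thread a cache through and remember to use it, instead of getting deduplication for free inside the read helper:

```cpp
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <unordered_map>

using FileCache = std::unordered_map<std::string, std::string>;

// Plain, uncached read; errors are reported by the caller.
std::optional<std::string> readUncached(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return std::nullopt;
  return std::string((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
}

// Every call site that can see the same path twice must consult the cache
// itself...
std::optional<std::string> loadDylibOrTbd(FileCache &cache,
                                          const std::string &path) {
  auto it = cache.find(path);
  if (it != cache.end())
    return it->second;
  if (auto contents = readUncached(path)) {
    cache[path] = *contents;
    return contents;
  }
  return std::nullopt;
}

// ...while one-shot readers (e.g. a filelist) can skip it, at the cost of
// every new caller having to make that choice correctly.
std::optional<std::string> loadFilelist(const std::string &path) {
  return readUncached(path);
}
```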

I could see this being an issue with OOMs, and I'm not a linker expert, so
maybe there's another way we should solve this problem? Feedback welcome!

Reviewed By: int3, #lld-macho

Differential Revision: https://reviews.llvm.org/D113153
Author: Keith Smiley
Date:   2021-11-03 21:42:04 -07:00
Commit: d49e7244cc (parent 6629ec3ecc)
1 file changed, 14 insertions(+), 2 deletions(-)


```diff
--- a/lld/MachO/InputFiles.cpp
+++ b/lld/MachO/InputFiles.cpp
@@ -173,8 +173,19 @@ static bool checkCompatibility(const InputFile *input) {
   return true;
 }
 
+// This cache mostly exists to store system libraries (and .tbds) as they're
+// loaded, rather than the input archives, which are already cached at a higher
+// level, and other files like the filelist that are only read once.
+// Theoretically this caching could be more efficient by hoisting it, but that
+// would require altering many callers to track the state.
+static DenseMap<CachedHashStringRef, MemoryBufferRef> resolvedReads;
+
 // Open a given file path and return it as a memory-mapped file.
 Optional<MemoryBufferRef> macho::readFile(StringRef path) {
+  CachedHashStringRef key(path);
+  auto entry = resolvedReads.find(key);
+  if (entry != resolvedReads.end())
+    return entry->second;
   ErrorOr<std::unique_ptr<MemoryBuffer>> mbOrErr = MemoryBuffer::getFile(path);
   if (std::error_code ec = mbOrErr.getError()) {
     error("cannot open " + path + ": " + ec.message());
@@ -192,7 +203,7 @@ Optional<MemoryBufferRef> macho::readFile(StringRef path) {
       read32be(&hdr->magic) != FAT_MAGIC) {
     if (tar)
       tar->append(relativeToRoot(path), mbref.getBuffer());
-    return mbref;
+    return resolvedReads[key] = mbref;
   }
 
   // Object files and archive files may be fat files, which contain multiple
@@ -217,7 +228,8 @@ Optional<MemoryBufferRef> macho::readFile(StringRef path) {
       error(path + ": slice extends beyond end of file");
     if (tar)
       tar->append(relativeToRoot(path), mbref.getBuffer());
-    return MemoryBufferRef(StringRef(buf + offset, size), path.copy(bAlloc));
+    return resolvedReads[key] = MemoryBufferRef(StringRef(buf + offset, size),
+                                                path.copy(bAlloc));
   }
 
   error("unable to find matching architecture in " + path);
```