llvm-project


Author	SHA1	Message	Date
Rui Ueyama	c8e6884871	Inline MergeInputSection::getData(). This change seems to make LLD 0.6% faster when linking Clang with debug info. I don't want us to have lots of local optimizations, but this function is very hot, and the improvement is small but not negligible, so I think it's worth doing. llvm-svn: 288757	2016-12-06 02:19:30 +00:00
Rui Ueyama	fcd3fa83ea	Use "equivalence class" instead of "color" to describe the concept in ICF. Also add a citation to GNU gold safe ICF paper. Differential Revision: https://reviews.llvm.org/D27398 llvm-svn: 288684	2016-12-05 18:11:35 +00:00
Rui Ueyama	91ae861af5	Updates file comments and variable names. Use "color" instead of "group id" to describe the ICF algorithm. llvm-svn: 288409	2016-12-01 19:45:22 +00:00
Rui Ueyama	c1835319c9	Parallelize ICF to make LLD's ICF really fast. ICF is short for Identical Code Folding. It is a size optimization that identifies two or more functions that happen to have the same contents and merges them. It usually reduces output size by a few percent. ICF is slow because it is a computationally intensive process. I tried to parallelize it before but failed because I couldn't make a parallelized version produce consistent outputs. Although it didn't create broken executables, every invocation of the linker generated slightly different output, and I couldn't figure out why. I think I now understand what was going on, and also came up with a simple algorithm to fix it. This patch implements it. The result is very exciting. Chromium for example has 780,662 input sections, of which 20,774 are reducible by ICF. LLD previously took 7.980 seconds for ICF. Now it finishes in 1.065 seconds. As a result, LLD can now link a Chromium binary (output size 1.59 GB) in 10.28 seconds on my machine with ICF enabled. Compared to gold, which takes 40.94 seconds to do the same thing, this is an amazing number. From here, I'll describe what we are doing for ICF, what the previous problem was, and what I did in this patch. In ICF, two sections are considered identical if they have the same section flags, section data, and relocations. Relocations are tricky, because two relocations are considered the same if they have the same relocation type and values, and if they point to the same section _in terms of ICF_. Here is an example. If foo and bar defined below are compiled to the same machine instructions, ICF can (and should) merge the two, although their relocations point to each other. void foo() { bar(); } void bar() { foo(); } This is not an easy problem to solve. What we are doing in LLD is some sort of coloring algorithm. We color non-identical sections using different colors repeatedly, and sections in the same color when the algorithm terminates are considered identical. 
Here are the details: 1. First, we color all sections using hash values of their section types, section contents, and numbers of relocations. At this moment, relocation targets are not taken into account. We just color sections that apparently differ in different colors. 2. Next, for each color C, we visit sections having color C to see if their relocations are the same. Relocations are considered equal if their targets have the same color. We then recolor sections that have different relocation targets in new colors. 3. If we recolor some section in step 2, relocations that were previously pointing to the same color targets may now be pointing to different colors. Therefore, repeat 2 until convergence is reached. Step 2 is a heavy operation. For Chromium, the first iteration of step 2 takes 2.882 seconds, and the second iteration takes 1.038 seconds, and in total it needs 23 iterations. Parallelizing step 1 is easy because we can color each section independently. This patch does that. Parallelizing step 2 is tricky. We could work on each color independently, but we cannot recolor sections in place, because it will break the invariant that two possibly-identical sections must have the same color at any moment. Consider sections S1, S2, S3, S4 in the same color C, where S1 and S2 are identical, S3 and S4 are identical, but S2 and S3 are not. Thread A is about to recolor S1 and S2 in C'. After thread A recolors S1 in C', but before it recolors S2 in C', another thread B might observe S1 and S2. Then thread B will conclude that S1 and S2 are different, and it will wrongly split thread B's sections into smaller groups. Over-splitting doesn't produce broken results, but it loses a chance to merge some identical sections. That was the cause of the nondeterminism. To fix the problem, I made sections have two colors, namely current color and next color. At the beginning of each iteration, both colors are the same. Each thread reads from current color and writes to next color. 
In this way, we can prevent threads from reading partial results. After each iteration, we flip current and next. This is a very simple solution and is implemented in less than 50 lines of code. I tested this patch with Chromium and confirmed that this parallelized ICF produces the same output as the non-parallelized one. Differential Revision: https://reviews.llvm.org/D27247 llvm-svn: 288373	2016-12-01 17:09:04 +00:00
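The two-color scheme described in the commit message above can be illustrated with a small sketch. This is not LLD's code: the `Section` struct and `colorSections` function are hypothetical names, the initial coloring uses an exact map where the commit uses hash values, and the loop runs sequentially. The point is only to show that every iteration reads `Color[Cur]` and writes `Color[Next]`, which is what would allow each same-color run to be handled by a separate thread without observing partial results.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified section model (not LLD's InputSection).
struct Section {
  std::string Data;            // section contents
  std::vector<size_t> Relocs;  // indices of relocation target sections
  uint32_t Color[2] = {0, 0};  // one slot is read, the other is written
};

// Colors sections so that sections sharing a final color are foldable.
// Returns the index (0 or 1) of the slot that holds the final colors.
int colorSections(std::vector<Section> &Secs) {
  int Cur = 0;

  // Step 1: initial color from contents and relocation count only.
  // Assumption: an exact map stands in for the hash used in the commit.
  std::map<std::pair<std::string, size_t>, uint32_t> Ids;
  for (Section &S : Secs) {
    auto Key = std::make_pair(S.Data, S.Relocs.size());
    uint32_t Id = Ids.emplace(Key, Ids.size()).first->second;
    S.Color[0] = S.Color[1] = Id;
  }

  // Keep sections of the same color contiguous in Order.
  std::vector<size_t> Order(Secs.size());
  std::iota(Order.begin(), Order.end(), 0);
  std::stable_sort(Order.begin(), Order.end(), [&](size_t A, size_t B) {
    return Secs[A].Color[Cur] < Secs[B].Color[Cur];
  });

  // Steps 2 and 3: split each run by relocation target colors until stable.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    int Next = 1 - Cur;

    // Two sections match if their targets have equal *current* colors.
    auto SameTargets = [&](size_t A, size_t B) {
      const Section &X = Secs[A], &Y = Secs[B];
      for (size_t I = 0; I < X.Relocs.size(); ++I)
        if (Secs[X.Relocs[I]].Color[Cur] != Secs[Y.Relocs[I]].Color[Cur])
          return false;
      return true;
    };

    size_t Begin = 0;
    while (Begin < Order.size()) {
      // Find the end of the run that shares the current color.
      size_t End = Begin + 1;
      while (End < Order.size() &&
             Secs[Order[End]].Color[Cur] == Secs[Order[Begin]].Color[Cur])
        ++End;

      // Split [Begin, End) into sub-runs; each sub-run's start index
      // becomes its next color, so the result is deterministic.
      size_t SubBegin = Begin;
      while (SubBegin < End) {
        size_t Head = Order[SubBegin];
        auto Mid = std::stable_partition(
            Order.begin() + SubBegin + 1, Order.begin() + End,
            [&](size_t S) { return SameTargets(Head, S); });
        size_t SubEnd = static_cast<size_t>(Mid - Order.begin());
        for (size_t I = SubBegin; I < SubEnd; ++I)
          Secs[Order[I]].Color[Next] = static_cast<uint32_t>(SubBegin);
        if (SubEnd != End)
          Changed = true;  // the run was split, so iterate again
        SubBegin = SubEnd;
      }
      Begin = End;
    }
    Cur = Next;  // flip current and next
  }
  return Cur;
}
```

With the mutually recursive `foo` and `bar` from the commit message modeled as two sections with equal contents whose relocations point at each other, both end up with the same color, while a third section with the same bytes but a different relocation target is split off.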
Rui Ueyama	e8a077badf	Change return types of split{Non,}Strings. They return new vectors, but at the same time they mutate other vectors, so returning values doesn't make much sense. We should just mutate two vectors. llvm-svn: 287979	2016-11-26 15:15:11 +00:00
Rui Ueyama	da06bfb794	Move getLocation from Relocations.cpp to InputSection.cpp. The function was used only within Relocations.cpp, but now we are using it in many places, so this patch moves it to a file that fits to the functionality. llvm-svn: 287943	2016-11-25 18:51:53 +00:00
Rui Ueyama	3fc0f7e54f	Define toString() as a generic function to get a string for error message. We have different functions to stringize objects to construct error messages. For InputFile, we have getFilename, and for InputSection, we have getName. You had to memorize them. I think this is the case where the function overloading comes in handy. This patch defines toString() functions that are overloaded for all these types, so that you just call it in error(). Differential Revision: https://reviews.llvm.org/D27030 llvm-svn: 287787	2016-11-23 18:07:33 +00:00
Eugene Leviant	531df4fcef	[ELF] Print error location in .eh_frame parser Differential revision: https://reviews.llvm.org/D26914 llvm-svn: 287750	2016-11-23 09:45:17 +00:00
Rui Ueyama	f94efdddc0	Add a flag to InputSectionBase for linker script. Previously, we set (uintptr_t)-1 to InputSectionBase::OutSec to record that a section has already been set to be assigned to some output section by linker scripts. Later, we restored nullptr to the pointer to use the field for the original purpose. That overloading is not very easy to understand. This patch adds a bit flag for that purpose, so that we don't need to piggyback the flag on an unrelated pointer. llvm-svn: 287508	2016-11-20 23:15:52 +00:00
Rui Ueyama	bd1f0630a8	Do not expose ICF class from the file. Also this patch uses file-scope functions instead of class member function. Now that ICF class is not visible from outside, InputSection class can no longer be "friend" of it. So I removed the friend relation and just make it expose the features to public. llvm-svn: 287480	2016-11-20 02:39:59 +00:00
Rui Ueyama	77f2a87575	Simplify MergeOutputSection. MergeOutputSection class was a bit hard to use because it provides a series of finalize functions that have to be called in the right way at the right time. It also interacted with MergeInputSection, and the logic was somewhat entangled between the two classes. This patch simplifies it by providing only one finalize function. Now, all you have to do is to call MergeOutputSection::finalize when you have added all sections to the output section. Then, it internally merges strings and initializes StringPiece objects. I think this is much easier to understand. This patch also adds comments. llvm-svn: 287314	2016-11-18 05:05:43 +00:00
George Rimar	d8b27769c8	[ELF] - format. NFC. llvm-svn: 286805	2016-11-14 10:14:18 +00:00
Rui Ueyama	82664d9d4c	Remove a member from InputSectionData and use the pool instead. llvm-svn: 286557	2016-11-11 03:54:59 +00:00
Rafael Espindola	9f0c4bb795	Parse relocations only once. Relocations are the last thing that we were storing a raw section pointer to and parsing on demand. With this patch we parse it only once and store a pointer to the actual data. The patch also changes where we store it. It is now in InputSectionBase. Not all sections have relocations, but most do and this simplifies the logic. It also means that we now only support one relocation section per section. Given that that constraint is maintained even with -r with gold, bfd and lld, I think it is OK. llvm-svn: 286459	2016-11-10 14:53:24 +00:00
Eugene Leviant	41ca327b5e	[ELF] Convert .got.plt section to input section Differential revision: https://reviews.llvm.org/D26349 llvm-svn: 286443	2016-11-10 09:48:29 +00:00
Rafael Espindola	e08e78df6d	Make OutputSectionBase a class instead of class template. The disadvantage is that we use uint64_t instead of uint32_t for some values in 32 bit files. The advantage is substantially simpler code, faster builds and less code duplication. llvm-svn: 286414	2016-11-09 23:23:45 +00:00
Simon Atanasyan	fa03b0fafa	[ELF][MIPS] Convert .MIPS.abiflags section to synthetic input section Previously, we had both input and output sections for .MIPS.abiflags. Now we have only one class for .MIPS.abiflags, which is MipsAbiFlagsSection. This class is a synthetic input section. .MIPS.abiflags sections are handled as regular sections until the control reaches Writer. Writer then aggregates all sections whose type is SHT_MIPS_ABIFLAGS to create a single synthesized input section. The synthesized section is then processed normally as if it came from an input file. llvm-svn: 286398	2016-11-09 21:37:06 +00:00
Simon Atanasyan	ce02cf0099	[ELF][MIPS] Convert .reginfo and .MIPS.options sections to synthetic input sections Previously, we had both input and output sections for .reginfo and .MIPS.options. Now for each such section we have one synthetic input section: MipsReginfoSection and MipsOptionsSection respectively. Both sections are handled as regular sections until the control reaches Writer. Writer then aggregates all sections whose type is SHT_MIPS_REGINFO or SHT_MIPS_OPTIONS to create a single synthesized input section. At that moment Writer also saves the GP0 value to the MipsGp0 field of the corresponding ObjectFile. This value is required for calculating R_MIPS_GPREL16 and R_MIPS_GPREL32 relocations. Differential revision: https://reviews.llvm.org/D26444 llvm-svn: 286397	2016-11-09 21:36:56 +00:00
Rafael Espindola	6ff570a395	Make Discarded a InputSection. It was quite confusing that it had SectionKind of Regular, but was not actually a InputSection. llvm-svn: 286379	2016-11-09 16:55:07 +00:00
Rafael Espindola	77dbe9a405	Add a convenience getObj method. NFC. llvm-svn: 286370	2016-11-09 14:39:20 +00:00
Rafael Espindola	1a5411238e	Revert "[ELF] Make InputSection<ELFT>::writeTo virtual" This reverts commit r286100. This saves 8 bytes of every InputSection. llvm-svn: 286235	2016-11-08 14:47:16 +00:00
Eugene Leviant	0a8f1fe6f7	[ELF] Make InputSection<ELFT>::writeTo virtual Differential revision: https://reviews.llvm.org/D26281 llvm-svn: 286100	2016-11-07 09:04:06 +00:00
Rui Ueyama	e8a6102fa9	Rewrite CommonInputSection as a synthetic input section. A CommonInputSection is a section containing all common symbols. That was an input section but was abstracted in a different way than the synthetic input sections because it was written before the synthetic input section was invented. This patch rewrites CommonInputSection as a synthetic input section so that it behaves better with other sections. llvm-svn: 286053	2016-11-05 23:05:47 +00:00
Rui Ueyama	6dc7fcbec4	Create SyntheticSections.cpp. We are going to have many more classes for linker-synthesized input sections, so it is worth putting them in a separate file rather than in the file for regular input sections. llvm-svn: 285740	2016-11-01 20:28:21 +00:00
Rafael Espindola	092d3b7f3b	Don't store an OutputLoc in every InputSection. It was only used by build-id and that can easily compute it. llvm-svn: 285691	2016-11-01 13:57:19 +00:00
Eugene Leviant	d0a9d1499c	[ELF] Remove unwanted typedef. NFC. llvm-svn: 285683	2016-11-01 10:16:52 +00:00
Eugene Leviant	282251a226	Convert BuildIdSection to input section Differential revision: https://reviews.llvm.org/D25627 llvm-svn: 285682	2016-11-01 09:49:24 +00:00
Eugene Leviant	c4681203e1	Allow fetching source line, when multiple "AX" sections present Differential revision: https://reviews.llvm.org/D26070 llvm-svn: 285680	2016-11-01 09:17:50 +00:00
Rafael Espindola	093abab817	Don't create a dummy ELF to process a binary file. Now that it is easy to create input section and symbols, this is simple. llvm-svn: 285322	2016-10-27 17:45:40 +00:00
Rafael Espindola	99558efed6	Pass a InputSectionData to classoff. This allows a non template class to hold input sections. llvm-svn: 285221	2016-10-26 18:44:57 +00:00
Rafael Espindola	1854a8ebb8	Delete trivial getters. NFC. llvm-svn: 285190	2016-10-26 12:36:56 +00:00
Rafael Espindola	0e090522c8	Read section headers upfront. Instead of storing a pointer, store the members we need. The reason for doing this is that it makes it far easier to create synthetic sections. It also avoids reading data from files multiple times, which might help with cross endian linking and host architectures with slow unaligned access. There are obvious compacting opportunities, but this already has mixed results even on native x86_64 linking. There is also the possibility of better refactoring the code for handling common symbols, but this already shows that a custom class is not necessary. llvm-svn: 285148	2016-10-26 00:54:03 +00:00
Rafael Espindola	397f0aa0d3	Be a bit more consistent about using getters. NFC. llvm-svn: 285082	2016-10-25 16:42:46 +00:00
Rafael Espindola	58139d1758	Delete getSectionHdr. We were fairly inconsistent as to what information should be accessed with getSectionHdr and what information (like alignment) was stored elsewhere. Now all section info has a dedicated getter. The code is also a bit more compact. llvm-svn: 285079	2016-10-25 16:14:25 +00:00
Hans Wennborg	7314c48bcb	Fix SectionPiece size when compiling with MSVC Builds were failing with: InputSection.h(139): error C2338: SectionPiece is too big because MSVC does record layout differently, probably not packing the 'OutputOff' and 'Live' bitfields because their types are of different size. Using size_t for 'Live' seems to fix it. llvm-svn: 284740	2016-10-20 15:59:08 +00:00
Rafael Espindola	113860b9ae	Compact SectionPiece. We allocate a lot of these when linking debug info. This speeds up the link of debug programs by 1% to 2%. llvm-svn: 284716	2016-10-20 10:55:58 +00:00
Rui Ueyama	388838ed23	Format. NFC. llvm-svn: 284697	2016-10-20 05:23:23 +00:00
Rafael Espindola	116d83fbe0	Don't call markLiveAt for non alloc sections. We don't gc them anyway, so just use an early return in Enqueue. llvm-svn: 284663	2016-10-19 23:13:40 +00:00
Rui Ueyama	05384080df	Support GNU-style ZLIB-compressed input sections. Previously, we supported only SHF_COMPRESSED sections because it's new and it's the ELF standard. But there are object files compressed in the GNU style out there, so we had to support it. Sections compressed in the GNU style start with ".zdebug_" and contain a different header than the ELF standard's one. In this patch, getRawCompressedData is responsible for handling it. A tricky thing about GNU-style compressed sections is that we have to rename them when creating output sections. The ".zdebug_" prefix implies the section is compressed. We need to rename ".zdebug_" to ".debug_" because our output sections are not compressed. We do that in this patch. llvm-svn: 284068	2016-10-12 22:36:31 +00:00
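The renaming step mentioned in the commit message above can be sketched as follows. `outputDebugName` is a hypothetical helper, not a function from LLD, and it assumes the intended output prefix is `.debug_` (so `.zdebug_info` becomes `.debug_info`). Names without the GNU prefix are returned unchanged.

```cpp
#include <string>

// Hypothetical helper: map a GNU-style compressed section name to the
// name used for the (uncompressed) output section.
std::string outputDebugName(const std::string &Name) {
  const std::string Prefix = ".zdebug_";
  // compare() returns 0 when Name starts with Prefix.
  if (Name.compare(0, Prefix.size(), Prefix) == 0)
    return ".debug_" + Name.substr(Prefix.size());
  return Name;
}
```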
Peter Smith	0760605ac5	[ELF][ARM] Garbage collection support for .ARM.exidx sections .ARM.exidx sections have a reverse dependency on the section they have a SHF_LINK_ORDER dependency on. In other words a .ARM.exidx section is live only if the executable section it describes is live. We implement this with a reverse dependency field in InputSection. Adding the dependency to InputSection is the simplest implementation but it could be moved out to a separate map if it were found to decrease performance for non ARM targets. Differential revision: https://reviews.llvm.org/D25234 llvm-svn: 283734	2016-10-10 10:10:27 +00:00
Peter Smith	0a259f3b9c	[ELF][ARM] Initial implementation of ARM exceptions support The .ARM.exidx sections contain a table. Each entry has two fields: - PREL31 offset to the function the table entry describes - Action to take, either cantunwind, inline unwind, or PREL31 offset to .ARM.extab section The table entries must be sorted in order of the virtual addresses the first entry of the table describes. Traditionally this is implemented by the SHF_LINK_ORDER dependency. Instead of implementing this directly we sort the table entries post relocation. The .ARM.exidx OutputSection is described by the PT_ARM_EXIDX program header. Differential revision: https://reviews.llvm.org/D25127 llvm-svn: 283730	2016-10-10 09:39:26 +00:00
Rafael Espindola	5fc2b1d2fe	Store the hash in SectionPiece. This spreads out computing the hash and using it in a hash table. The speedups are: firefox master 6.811232891 patch 6.559280249 1.03841162939x faster chromium master 4.369323666 patch 4.33171853 1.00868134338x faster chromium fast master 1.856679971 patch 1.850617741 1.00327578725x faster the gold plugin master 0.32917962 patch 0.325711944 1.01064645023x faster clang master 0.558015452 patch 0.550284165 1.01404962652x faster llvm-as master 0.032563515 patch 0.032152077 1.01279662275x faster the gold plugin fsds master 0.356221362 patch 0.352772162 1.00977741549x faster clang fsds master 0.635096494 patch 0.627249229 1.01251060127x faster llvm-as fsds master 0.030183188 patch 0.029889544 1.00982430511x faster scylla master 3.071448906 patch 2.938484138 1.04524944215x faster This seems to be because we don't stall as much. When linking firefox stalled-cycles-frontend goes from 57.56% to 55.55%. With -O2 the difference is even more significant since we avoid recomputing the hash. For firefox we go from 9.990295265 to 9.149627521 seconds (1.09x faster). llvm-svn: 283367	2016-10-05 19:36:02 +00:00
Rafael Espindola	32aca87bf8	Compact SectionPiece. It is pretty easy to get the data from the InputSection, so we don't have to store it. This opens the way for storing the hash instead. llvm-svn: 283357	2016-10-05 18:40:00 +00:00
Rafael Espindola	939e9493bf	Simplify setting the Live bit in SectionPiece. NFC. llvm-svn: 283340	2016-10-05 17:02:09 +00:00
Rafael Espindola	c7e1e03498	Store an ArrayRef for Data in InputSectionData. llvm-svn: 281210	2016-09-12 13:13:53 +00:00
Rafael Espindola	54f1614ec1	Revert "Revert "Compact InputSectionData from 64 to 48 bytes. NFC."" This reverts commit r281096. The previous link errors should be fixed by r281208. llvm-svn: 281209	2016-09-12 13:06:10 +00:00
Rafael Espindola	78fe670994	Revert "Compact InputSectionData from 64 to 48 bytes. NFC." This reverts commit r281084. The link was failing on some bots. No idea why. I will try to reproduce it on Monday. llvm-svn: 281096	2016-09-09 21:20:30 +00:00
Rafael Espindola	82621dcb10	Compact InputSectionData from 64 to 48 bytes. NFC. llvm-svn: 281084	2016-09-09 19:42:11 +00:00
Rafael Espindola	042a3f209b	Compute section names only once. This simplifies error handling as there is now only one place in the code that needs to consider the possibility that the name is corrupted. Before we would do it in every access. llvm-svn: 280937	2016-09-08 14:06:08 +00:00
Rafael Espindola	16853bb00f	Pack InputSectionData from 72 to 64 bytes. NFC. llvm-svn: 280925	2016-09-08 12:33:41 +00:00


157 Commits