llvm-project

Commit Graph

Author	SHA1	Message	Date
Maksim Panchenko	7fd487066f	[BOLT] Move BinaryFunctions into a BinaryContext and more Summary: A long-overdue refactoring that makes interfaces cleaner and less awkward. Mainly, it makes future work much easier. (cherry picked from FBD14766284)	2019-04-03 15:52:01 -07:00
Maksim Panchenko	8894853f42	[BOLT][DWARF] Dedup .debug_abbrev section patches Summary: When we patch .debug_abbrev we issue many duplicate patches. Instead of storing these patches as a vector, use a hash map. This saves some processing time and memory. (cherry picked from FBD14691292)	2019-03-29 14:22:54 -07:00
Maksim Panchenko	297d1a4e1a	[BOLT] Do not write jump table section headers Summary: In non-relocation mode we were accidentally emitting section headers for every single jump table. This happened with default `-jump-tables=basic`. (cherry picked from FBD14653282)	2019-03-27 13:58:31 -07:00
Maksim Panchenko	d1b76f2ac2	[BOLT] Allocate enough space past __hot_end for huge pages Summary: While using "-hot-text" option, we might not get enough cold text to fill up the last huge page, and we can get data allocated on this page producing undesirable effects. To prevent this from happening, always make sure to allocate enough space past __hot_end. (cherry picked from FBD14575100)	2019-03-21 21:13:45 -07:00
Maksim Panchenko	69faf61372	[BOLT] Fix section lookup while deleting symbols Summary: While removing redundant local symbols, we used the new section index to look up the corresponding section in the old section table. As a result, we either failed to remove the correct symbols or removed the wrong ones. (cherry picked from FBD14552047)	2019-03-20 16:13:09 -07:00
Maksim Panchenko	b8d3dc40ea	[BOLT] Use local binding for cold fragment symbols Summary: We used to use the existing symbol binding while duplicating and renaming cold fragment symbols. As a result, some of those were emitted with global binding. This confuses gdb, and it starts treating those symbols as additional entry points. The fix is to always emit such symbols with a local binding. This also means that we have to sort the static symbol table before emission to make sure local symbols precede all others. (cherry picked from FBD14529265)	2019-03-19 13:46:21 -07:00
Maksim Panchenko	6bcb3389dd	[BOLT] Place hot text mover functions into a separate section Summary: Create a separate pass for assigning functions to sections. Detect functions originating from special sections (by default .stub and .mover) and place them into ".text.mover" if the "-hot-text" option is specified. Cold functions are isolated from hot functions even when no function re-ordering is specified. (cherry picked from FBD14512628)	2019-03-15 13:43:36 -07:00
Maksim Panchenko	17cd2034f3	[BOLT] Fix debug line info emission Summary: GDB misbehaves if the first entry in the line info table after an end_sequence entry is not marked with is_stmt: it will not print the correct line number information for that address. Note that everything works fine starting with the first address marked with is_stmt. This could happen if the first instruction in the cold section wasn't marked with is_stmt. The fix is to always emit debug line info for the first instruction in any function fragment with the is_stmt flag. (cherry picked from FBD14516629)	2019-03-18 19:22:26 -07:00
Maksim Panchenko	61ea19edf8	[BOLT][NFC] Fix compilation warnings Summary: Get rid of warnings while building with Clang. (cherry picked from FBD14495620)	2019-03-15 15:06:41 -07:00
Maksim Panchenko	0a55001a0e	[BOLT] Fix -hot-functions-at-end option Summary: Make "-hot-functions-at-end" option work again. (cherry picked from FBD14476242)	2019-03-14 20:32:04 -07:00
Maksim Panchenko	163adbec9f	[BOLT] Refactor allocatable sections rewrite part Summary: This refactoring makes it easier to create new code sections and control code placement. As an example, cold code is being placed into ".text.cold" which is emitted independently from ".text", and the final address assignment becomes more flexible. Previously, in non-relocation mode we used to emit temporary section names into .shstrtab. This resulted in unnecessary bloat of this section. There was also unnecessary padding emitted at the end of the text section. After fixing this, the output binary becomes smaller. I had to change the way exception handling tables are re-written as the current infra does not support cross-section label difference. This means we have to emit absolute landing pad addresses, which might not work for PIE binaries. I'm going to address this once I investigate the current exception handling issues in PIEs. This diff temporarily disables the "-hot-functions-at-end" option. (cherry picked from FBD14475693)	2019-03-14 18:51:05 -07:00
Maksim Panchenko	a9e64947c5	[NFC][BOLT] Move ExecutableFileMemoryManager into its own file (cherry picked from FBD14474800)	2019-03-14 18:49:40 -07:00
Rafael Auler	c593563d1f	Do not assert on addresses read from processIndirectBranch Summary: As part of our heuristics to decode an indirect branch, if we suspect the branch is an indirect tail call, we add its probable target to the BC::InterproceduralReferences vector to detect functions with more than one entry point. However, if this probable target is not in an allocatable section, we were asserting. Remove this assertion and change the code to conditionally store to InterproceduralReferences instead. The probable target could be garbage at this point because of analyzeIndirectBranch failing to identify the load instruction that has the memory address of the target, so we should tolerate this. (cherry picked from FBD14432821)	2019-03-12 16:36:35 -07:00
Maksim Panchenko	0c704eb75a	[BOLT-HEATMAP] Initial heat map implementation Summary: Add heatmap subcommand to produce heatmaps based on perf.data with LBR. The output is produced in colored ASCII format. llvm-bolt heatmap -p perf.data <executable> -block-size=<uint> - size of a heat map block in bytes (default 64) -line-size=<uint> - number of entries per line (default 256) -max-address=<uint> - maximum address considered valid for heatmap (default 4GB) -o=<string> - heatmap output file (default stdout) (cherry picked from FBD13969992)	2019-02-05 15:28:19 -08:00
Maksim Panchenko	ff6e21290f	[BOLT] New inliner implementation Summary: Addresses correctness issues related to inlining. Inlining heuristics are not part of this diff. (cherry picked from FBD13796888)	2019-01-31 11:23:02 -08:00
Maksim Panchenko	365bd1f1c8	[BOLT] For non-simple functions always update jump tables in-place Summary: For non-simple functions we can miss a reference to a jump table or to an indirect goto table. If we move the jump table, the missed reference will not get updated, and the corresponding indirect jump will end up in the old (wrong) location. Updating the original jump table in-place should take care of the issue. (cherry picked from FBD13849776)	2019-01-28 13:46:18 -08:00
Rafael Auler	af81c7ff80	[perf2bolt] Add support for generating autofdo input Summary: Autofdo tools support. (cherry picked from FBD13779026)	2019-01-22 17:21:45 -08:00
Maksim Panchenko	c6ce2abb7d	[perf2bolt] Optimize memory usage in perf2bolt Summary: While converting perf profile, we only need CFG for functions that were profiled and can skip building CFG for the rest. This saves us some processing time and memory. Break down processing of perf.data into two steps. The first step parses the data, saves it in intermediate format, and marks functions with the profile. The second step attributes the profile to functions with CFG. When we disassemble and build CFG for functions in aggregate-only mode, we skip functions without the profile. (cherry picked from FBD13706697)	2019-01-15 23:43:40 -08:00
Maksim Panchenko	2fe0c38d6b	[perf2bolt] Better tracking of process forking Summary: Improve tracking of forked processes. If a process corresponding to the input binary has forked/started before 'perf record' was initiated, then the full name of the binary will be recorded in a corresponding MMAP2 event. We've been handling such cases well so far. However, if the process was forked after 'perf record' had started, and execve(2) wasn't called afterwards, then there will be no MMAP2 event recorded corresponding to the mapping of the main binary (unrelated MMAP2 events could still be recorded). To track such cases, we need to parse the 'perf script --show-task-events' command output, scan for PERF_RECORD_FORK events, and then add forked process PIDs to the list associated with the input binary. If the fork event was followed by an exec event (PERF_RECORD_COMM exec) of a different binary, then the forked PID should be ignored. If the exec event was associated with our input binary, then the correct MMAP2 event was recorded and parsed. To track if the event occurred before or after 'perf record', we parse the event's time. This helps us to differentiate some events. E.g. the exec event is only registered correctly if it happened after perf recording has started (otherwise the "exec" part is missing), and thus we only record forks with non-zero time stamps. (cherry picked from FBD13250904)	2018-11-21 20:04:00 -08:00
Maksim Panchenko	067a385000	[BOLT] Add thresholds for function splitting Summary: Use the newly added function size estimation to measure the effectiveness and guide function splitting. Two new tuning options are added: -split-threshold=<uint> split function only if its main size is reduced by more than given amount of bytes. Default value: 0, i.e. split iff the size is reduced. Note that on some architectures the size can increase after splitting. -split-align-threshold=<uint> when deciding to split a function, apply this alignment while doing the size comparison (see -split-threshold). Default value: 2. (cherry picked from FBD13136352)	2018-11-15 16:03:34 -08:00
Maksim Panchenko	b0f7fddd35	[BOLT] Add method for better function size estimation Summary: Add BinaryContext::calculateEmittedSize() that ephemerally emits code to allow precise estimation of the function size. Relaxation and macro-op alignment adjustments are taken into account. (cherry picked from FBD13092139)	2018-11-15 16:02:16 -08:00
Maksim Panchenko	e1b8fade7f	[BOLT] Add branch priority policy for blocks with 2 successors Summary: On x86 the difference between long and short jump instructions could be either 4 or 3 bytes, depending on whether it's a conditional jump or not. For a basic block with 2 jump instructions, if we know that one of the successors is in a different code region, then we can make it a target of an unconditional jump instruction. This will save 1 byte in case the conditional jump happens to be a short one. (cherry picked from FBD13078139)	2018-11-14 14:43:59 -08:00
Maksim Panchenko	40d9fcfdca	[BOLT] Workaround for Clang de-virtualization bug Summary: When Clang is boot-strapped with (Thin)LTO, it may produce a code fragment similar to below: .LFT663334 (6 instructions, align : 1) Predecessors: .LFT663333 00000538: movb $0x1, %al 0000053a: movl %eax, -0x2c(%rbp) 0000053d: movl $"_ZN5clang6Parser12ConsumeParenEv/1", %ecx 00000542: testb $0x1, %cl 00000545: movq -0x40(%rbp), %r14 00000549: je .Ltmp1071462 Successors: .Ltmp1071462, .LFT663335 .LFT663335 (2 instructions, align : 1) Predecessors: .LFT663334 0000054b: movq (%r12), %rax 0000054f: movq .Ltmp0(%rax), %rcx Successors: .Ltmp1071462 .Ltmp1071462 (7 instructions, align : 1) Predecessors: .LFT663334, .LFT663335 00000556: movq %r12, %rdi 00000559: callq *%rcx ....... The code above is making a call by dereferencing a pointer to a member function. A pointer to a member function can refer to either a regular function or a virtual function. To differentiate between the two, the AMD64 ABI (which originated from the Itanium ABI) uses the lowest bit of the pointer. The call sequence checks this bit and differs depending on whether the function is virtual. If it's "1" then the value of the pointer (minus 1) is used as an offset in the object vtable to get the address of the function, otherwise the pointer is used directly as a function address. In this specific case, a de-virtualization is taking place, but it's not complete. The compiler knows that the member function pointer is actually a non-virtual function _ZN5clang6Parser12ConsumeParenEv (aka "clang::Parser::ConsumeParen()"). However, it keeps the (dead) code that checks the last bit of _ZN5clang6Parser12ConsumeParenEv, and furthermore keeps the code (unreachable/dead) to make a virtual call while using (_ZN5clang6Parser12ConsumeParenEv - 1) as an offset into the vtable. This is obviously wrong, but since the code is unreachable, it will never affect the runtime correctness. The value "_ZN5clang6Parser12ConsumeParenEv - 1" falls into the last byte of the function preceding _ZN5clang6Parser12ConsumeParenEv, and BOLT creates a label ".Ltmp0" pointing to this last byte that is referenced by the instruction sequence above. It just happens that this byte is also in the middle of the last instruction, so BOLT never emits the label, resulting in the error message "Undefined temporary symbol". The workaround is to detect non-pc-relative relocations from code pointing to some (fptr - 1). Note that this is not completely error-proof, but non-pc-relative references from code into the middle of a function are quite rare, and chances that in a normal situation they will point to a byte preceding some function address are virtually zero. (cherry picked from FBD13030310)	2018-11-12 12:38:50 -08:00
Maksim Panchenko	30fd960951	[BOLT] Update local symbol count in symbol table Summary: Fix sh_info entry for symbol table section to reflect updated number of local symbols. (cherry picked from FBD10503216)	2018-10-22 18:48:12 -07:00
Maksim Panchenko	a76b13d48e	[perf2bolt] Pre-aggregate LBR samples Summary: Pre-aggregating LBR data cuts perf2bolt processing times in half. (cherry picked from FBD10420286)	2018-10-02 17:16:26 -07:00
Rafael Auler	74a71c6812	Fix bug in analyzeRelocation for GOT entries Summary: Special case GOT relocs to ignore addend subtracting logic in analyzeRelocation, since the addend does not refer to the target of the instruction being analyzed. Also make the code honor the comments in the special case about zeroed out ExtractValue but non-zero addend. Fix facebookincubator/BOLT#40 (cherry picked from FBD10355019)	2018-10-11 18:12:09 -07:00
Facebook Github Bot	b166ccbea8	[BOLT][PR] Fix compiler warnings in BinaryContext and RegAnalysis Summary: This pull request fixes two compiler warnings: - missing `break;` in a switch-case statement in RegAnalysis.cpp (-Wimplicit-fallthrough warning) - misleading indentation in BinaryContext.cpp (-Wmisleading-indentation warning) Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/39 GitHub Author: Andreas Ziegler <andreas.ziegler@fau.de> (cherry picked from FBD10202092)	2018-10-04 10:46:16 -07:00
Igor Sugak	c3c80822a3	[BOLT] Capitalize i Summary: as titled (cherry picked from FBD10136655)	2018-10-01 16:22:46 -07:00
Igor Sugak	cc2276d3f1	[BOLT] fix build with gcc-4.8.5 Summary: These are two minor changes to make it compatible with gcc-4.8.5 (cherry picked from FBD9884971)	2018-09-17 12:17:33 -07:00
Maksim Panchenko	ce508b58c6	[BOLT] Support relocations without symbols Summary: lld may generate relocations without associated symbols. Instead of rejecting binaries with such relocations, we can re-create the symbol the relocation is against based on the extracted value. (cherry picked from FBD10054576)	2018-09-21 12:00:20 -07:00
Rafael Auler	bd0b99c45d	[BOLT] Change stub-insertion pass for AArch64 Summary: Previously, we were expanding eligible branches with stubs. After expansion, we were computing which stubs were unnecessary and removing them, assuming ranges would only shrink as code is removed. The problem with this approach is that for branches that refer to code that is not managed by BOLT, the distance to that location can increase and we can end up with an out-of-range branch. This rewrites the pass to be simpler, only increasing size and expanding code with stubs as needed after each iteration, stopping when code stops increasing. Besides this rewrite, the stub-insertion pass now supports stub grouping similar to what the linker does, allowing different functions to share the same veneer that jumps to a common callee. It also fixes a bug in the previous implementation where, in very large functions that use TBZ/TBNZ (+-32KB range), the pass would mistakenly try to reuse a local stub BB that is out of range. This includes a change to allow hot functions to be put at the end of the .text section, closer to the heap, requiring no veneers to jump to JITted code. Finally, it enables the veneer elimination pass by default. (cherry picked from FBD10023158)	2018-09-17 13:36:59 -07:00
Maksim Panchenko	1387a9d761	[BOLT] Keep .text section in file when using old text Summary: If we reuse the text section under the `-use-old-text` option, then there's no need to rename it. Tools such as perf do not seem to like binaries without `.text`. Additionally, check if the code fits into `.text` using the page alignment; otherwise we were skipping the alignment and relying on the user to notice the warning message. This could have resulted in unexpected performance drops. Also add the `-no-huge-pages` option to use the regular page size for code alignment purposes (i.e. 4KiB instead of 2MiB). (cherry picked from FBD10024670)	2018-09-24 20:58:31 -07:00
Maksim Panchenko	53b72d0f2e	[BOLT] Ignore symbols from non-allocatable sections Summary: While creating BinaryData objects we used to process all symbol table entries. However, some symbols could belong to non-allocatable sections, and thus we have to ignore them for the purpose of analyzing in-memory data. (cherry picked from FBD9666511)	2018-09-05 14:36:52 -07:00
Maksim Panchenko	8026760ac0	[BOLT] Fix another issue with profile after ICP Summary: For jump tables ICP was using profile from the jump table itself, which doesn't work correctly if the jump table is re-used at different code locations. (cherry picked from FBD9618774)	2018-08-30 13:21:50 -07:00
spupyrev	41ed5431a0	[BOLT] turning on the compact aligner by default Summary: Making UseCompactAligner true by default (cherry picked from FBD9325158)	2018-08-14 14:49:10 -07:00
Maksim Panchenko	cd19f718b4	[BOLT] Merge jump table profile data Summary: While running ICF pass we have skipped merging profile data for jump tables. We were only updating profile in the CFG. Fix that. (cherry picked from FBD9595523)	2018-08-30 13:21:29 -07:00
Maksim Panchenko	69e6004a42	[perf2bolt] Fix processing of binaries with names over 15 chars long Summary: Do not truncate the binary name for comparison purposes as the binary name we are getting from "perf script" is no longer truncated. (cherry picked from FBD9596409)	2018-08-30 14:51:10 -07:00
Rafael Auler	d0a80b0870	[BOLT] Change ForceRelocation behavior Summary: Only record address as addend if the target of the relocation is the pseudo-symbol Zero. (cherry picked from FBD9551543)	2018-08-28 18:15:13 -07:00
Maksim Panchenko	708a550084	[BOLT] Fix profile after ICP Summary: After optimizing a target of a jump table, ICP was not updating edge counts corresponding to that target. As a result the edge could be left hot and negatively influence the code layout. (cherry picked from FBD9524396)	2018-08-23 22:47:46 -07:00
Maksim Panchenko	2511b09985	[BOLT][DWARF] Fix line info for empty CU DIEs Summary: In some rare cases a compiler may generate DWARF that contains an empty CU DIE that references a debug line fragment. That fragment will contain no file name information, and we fail to register it. Then, as a result, DW_AT_stmt_list is not updated for the CU. This may cause some DWARF-processing tools to segfault. As a solution/workaround, we register "<unknown>" file name for such debug line tables. (cherry picked from FBD9526705)	2018-08-27 20:12:59 -07:00
Rafael Auler	a7e0704be6	[BOLT] Reduce AArch64 target feature flags Summary: Eliminate some flags that are not recognized and are currently printing warnings when BOLT runs on AArch64. (cherry picked from FBD9499971)	2018-08-24 10:42:00 -07:00
Rafael Auler	af1177d99f	[BOLT] Add mattr options to AArch64 target Summary: Make the AArch64 subtarget enable all features, so the disassembler won't choke on extension instructions. (cherry picked from FBD9477066)	2018-08-22 18:47:39 -07:00
Rafael Auler	9c4fcafa37	[BOLT] Add update-build-id option, on by default Summary: The build-id is used by tools to uniquely identify binaries. Update the output binary build-id with a different number to make it distinguishable from the input binary. This implementation just flips the last build-id bit. (cherry picked from FBD9235336)	2018-08-08 17:55:24 -07:00
Rafael Auler	510a8c4bbe	[BOLT] Fix shrink-wrapping CFI update Summary: When updating CFI for a function that was optimized by shrink-wrapping, if the function had no frame pointers, the CFI update algorithm was incorrect. (cherry picked from FBD9328658)	2018-08-14 17:32:06 -07:00
Maksim Panchenko	88bb145164	[BOLT] Update allocatable relocation sections Summary: Position-independent binaries may have runtime relocations of type R_X86_64_RELATIVE that need an update if they were pointing to one of the functions that we have relocated. (cherry picked from FBD9374164)	2018-08-16 16:53:14 -07:00
Maksim Panchenko	87788ca926	[perf2bolt] Support profiling of PIEs and .so's Summary: Processing profile data for binaries with flexible load address (such as position-independent executables and shared objects) requires adjusting binary addresses depending on the base load address. For every PID the mapping will be more or less unique when executing with ASLR enabled, thus we have to keep the mapping record for all PIDs associated with the binary. Then we adjust the addresses based on those mappings. (cherry picked from FBD9368566)	2018-08-14 13:24:44 -07:00
Maksim Panchenko	560c23411a	[perf2bolt] Use mmap events for PID collection Summary: Switch from using `perf script --show-task-events` to `perf script --show-mmap-events` for associating a binary with PIDs in perf.data. The output of the former command does not provide enough information for PIE/.so processing. (cherry picked from FBD9346586)	2018-08-14 13:24:44 -07:00
Rafael Auler	b10d4724c3	[BOLT] Fix pseudo calculation in BinaryBasicBlock Summary: A recent commit broke our tests because it was depending on getNumNonPseudos() at a very late stage of our optimization pipeline. The problem was in an instruction deletion member function in BinaryBasicBlock that was not updating the number of pseudos after deletion. Fix this. (cherry picked from FBD9305972)	2018-08-13 14:36:38 -07:00
Laith Saed Sakka	b2382dc552	retpoline insertion : further updates. Summary: Couple of updates: 1) Handle address pattern with segment register. 2) Assume R11 available for PLT calls always. 3) Add CFI state to each BB. 4) early exit getMacroOpFusionPair if Instruction.size() <2. (cherry picked from FBD9172426)	2018-08-03 16:36:06 -07:00
Maksim Panchenko	c35dc2a386	[BOLT] Detect and handle fixed indirect branches Summary: Sometimes GCC can generate code where one of the jump table entries is being used by an indirect branch with a fixed memory reference, such as "jmp *(JT+8)". If we don't convert such branches to direct ones and move jump tables, then the indirect branch will reference the old table value and will end up at the non-updated destination, possibly causing a runtime crash. This fix converts such indirect branches into direct ones. For now we mark functions containing indirect branches with fixed destinations as non-simple to prevent an unreachable code elimination problem triggered by the related dead/unreachable jump table. (cherry picked from FBD9192363)	2018-08-06 11:22:45 -07:00
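The heatmap subcommand in 0c704eb75a buckets sampled addresses into fixed-size blocks (`-block-size`, default 64), prints `-line-size` blocks per line (default 256), and ignores addresses past `-max-address` (default 4GB). The sketch below is a hypothetical model of that bucketing only; `HeatMapSketch` and its methods are illustrative names, not BOLT's API, and treating `-max-address` as an exclusive bound is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical heat map accumulator (not BOLT's implementation).
// Defaults mirror the option defaults listed in the commit message.
class HeatMapSketch {
public:
  explicit HeatMapSketch(std::uint64_t blockSize = 64,
                         std::uint64_t maxAddress = 1ULL << 32)
      : BlockSize(blockSize), MaxAddress(maxAddress) {}

  // Count one sample; addresses at or above MaxAddress are ignored (assumed).
  void registerAddress(std::uint64_t address) {
    if (address >= MaxAddress)
      return;
    ++Counts[address / BlockSize];
  }

  // Number of samples that fell into the given block.
  std::uint64_t count(std::uint64_t blockIndex) const {
    auto it = Counts.find(blockIndex);
    return it == Counts.end() ? 0 : it->second;
  }

  // (line, column) of a block in the ASCII output with lineSize blocks per line.
  static std::pair<std::uint64_t, std::uint64_t>
  position(std::uint64_t blockIndex, std::uint64_t lineSize = 256) {
    return {blockIndex / lineSize, blockIndex % lineSize};
  }

private:
  std::uint64_t BlockSize;
  std::uint64_t MaxAddress;
  std::map<std::uint64_t, std::uint64_t> Counts; // block index -> samples
};
```

With the defaults, addresses 0 and 63 land in block 0, address 64 starts block 1, and block 300 is drawn on line 1 at column 44.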
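Commit 067a385000 describes `-split-threshold` (split only if the main fragment shrinks by more than N bytes; default 0, i.e. split iff it shrinks) and `-split-align-threshold` (an alignment applied during the size comparison; default 2). A minimal sketch of one reading of that rule, assuming both sizes are rounded up to the alignment before comparing; `shouldSplit` and `alignUp` are hypothetical helpers, not BOLT functions.

```cpp
#include <cstdint>

// Round size up to a multiple of align (align must be non-zero).
std::uint64_t alignUp(std::uint64_t size, std::uint64_t align) {
  return (size + align - 1) / align * align;
}

// Hypothetical decision rule: split only if the aligned hot (main) fragment
// is smaller than the aligned original function by more than splitThreshold.
bool shouldSplit(std::uint64_t originalSize, std::uint64_t hotSizeAfterSplit,
                 std::uint64_t splitThreshold = 0,
                 std::uint64_t splitAlignThreshold = 2) {
  const std::uint64_t before = alignUp(originalSize, splitAlignThreshold);
  const std::uint64_t after = alignUp(hotSizeAfterSplit, splitAlignThreshold);
  // "Reduced by more than splitThreshold bytes" means before - after > threshold,
  // written without subtraction so a size increase cannot underflow.
  return after + splitThreshold < before;
}
```

Under this reading, shrinking a 100-byte function to 99 bytes does not trigger a split with the default alignment of 2 (both round to 100), while shrinking it to 98 bytes does.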
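Commit 40d9fcfdca relies on how the AMD64/Itanium ABI encodes a pointer to a member function: if the lowest bit is set, (ptr - 1) is a byte offset into the vtable; otherwise the value is the function address. The sketch below models that dispatch and a heuristic in the spirit of the workaround (flag a non-PC-relative code reference that points exactly one byte before a known function start). Both function names are hypothetical, and the real member pointer also carries a `this` adjustment that is omitted here.

```cpp
#include <cstdint>
#include <set>

// Hypothetical model of the "ptr" field of an Itanium/AMD64 member function
// pointer. The this-adjustment field of the real ABI is ignored here.
std::uintptr_t resolveMemberFnPtr(std::uintptr_t ptr,
                                  const std::uintptr_t *vtable) {
  if (ptr & 1) {
    // Virtual: (ptr - 1) is a byte offset into the object's vtable.
    std::uintptr_t byteOffset = ptr - 1;
    return vtable[byteOffset / sizeof(std::uintptr_t)];
  }
  return ptr; // Non-virtual: the value is the function address itself.
}

// Heuristic sketch: a non-PC-relative reference from code whose target is
// exactly one byte before a known function start is likely a dead
// "(fptr - 1)" vtable offset rather than a real reference into code.
bool looksLikeFnPtrMinusOne(std::uint64_t target,
                            const std::set<std::uint64_t> &functionStarts) {
  return functionStarts.count(target + 1) != 0;
}
```

As the commit notes, such a check is not error-proof: a genuine reference to the last byte of a function would be misclassified, which is expected to be very rare.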
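Commit 87788ca926 keeps a mapping record for every PID associated with a PIE or shared object and adjusts sampled addresses by that PID's load base. A minimal sketch of the idea, assuming the binary's preferred load address is 0 (typical for PIEs and .so files) so that subtracting the runtime base yields a binary-relative address; `MMapRecord` and `PidAddressMapper` are hypothetical names, and real perf MMAP2 events also carry a file offset that is ignored here. Requires C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical per-PID record of where the binary was mapped at runtime.
struct MMapRecord {
  std::uint64_t BaseAddress; // runtime load address of the binary
  std::uint64_t Size;        // length of the mapping in bytes
};

class PidAddressMapper {
public:
  void addMapping(std::int32_t pid, MMapRecord record) {
    Mappings[pid] = record;
  }

  // Translate a sampled runtime address into a binary-relative address.
  // Returns nullopt for PIDs not tied to the binary or addresses outside it.
  std::optional<std::uint64_t> adjust(std::int32_t pid,
                                      std::uint64_t runtimeAddress) const {
    auto it = Mappings.find(pid);
    if (it == Mappings.end())
      return std::nullopt;
    const MMapRecord &m = it->second;
    if (runtimeAddress < m.BaseAddress ||
        runtimeAddress - m.BaseAddress >= m.Size)
      return std::nullopt;
    return runtimeAddress - m.BaseAddress;
  }

private:
  std::unordered_map<std::int32_t, MMapRecord> Mappings;
};
```

With ASLR, two processes running the same binary get different bases, yet samples from both translate back to the same binary address once each PID's own base is subtracted.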

616 Commits, All Branches