llvm-project

Commit Graph

Author	SHA1	Message	Date
Maksim Panchenko	a99005397f	[BOLT] Fix branch count in removeDuplicateConditionalSuccessor(). Summary: When we merge the original branch counts we have to make sure both of them have a profile. Otherwise set the count to COUNT_NO_PROFILE. The misprediction count should be 0. (cherry picked from FBD4837774)	2017-04-05 13:00:20 -07:00
Bill Nell	6c5c65e3a3	[BOLT] Fix double jump peephole, remove useless conditional branches. Summary: I split some of this out from the jumptable diff since it fixes the double jump peephole. I've changed the pass manager so that UCE and peepholes are not called after SCTC. I've incorporated a call to the double jump fixer to SCTC since it is needed to fix things up afterwards. While working on fixing the double jump peephole I discovered a few useless conditional branches that could be removed as well. I highly doubt that removing them will improve perf at all but it does seem odd to leave in useless conditional branches. There are also some minor logging improvements. (cherry picked from FBD4751875)	2017-03-20 22:44:25 -07:00
Maksim Panchenko	f7d32f7e7d	[BOLT] Detect and reject binaries built for coverage. Summary: Don't attempt to optimize binaries built with coverage support. (cherry picked from FBD4810330)	2017-03-31 07:51:30 -07:00
Maksim Panchenko	c166a8c1a7	[BOLT] Fix debug info update for inlining. Summary: When inlining, if a callee has debug info and a caller does not (i.e. a containing compilation unit was compiled without "-g"), we try to update a nonexistent compilation unit. Instead we should skip updating debug info in such cases. Minor refactoring of line number emitting code. (cherry picked from FBD4823982)	2017-04-03 16:24:26 -07:00
Maksim Panchenko	0bde796e50	[BOLT] Organize options in categories for pretty printing (near NFC). Summary: Each BOLT-specific option now belongs to BoltCategory or BoltOptCategory. Use alphabetical order for options in source code (does not affect output). The result is a cleaner output of "llvm-bolt -help" which does not include any unrelated llvm options and is close to the following: ..... BOLT generic options: -data=<string> - <data file> -dyno-stats - print execution info based on profile -hot-text - hot text symbols support (relocation mode) -o=<string> - <output file> -relocs - relocation mode - use relocations to move functions in the binary -update-debug-sections - update DWARF debug sections of the executable -use-gnu-stack - use GNU_STACK program header for new segment (workaround for issues with strip/objcopy) -use-old-text - re-use space in old .text if possible (relocation mode) -v=<uint> - set verbosity level for diagnostic output BOLT optimization options: -align-blocks - try to align BBs inserting nops -align-functions=<uint> - align functions at a given value (relocation mode) -align-functions-max-bytes=<uint> - maximum number of bytes to use to align functions -boost-macroops - try to boost macro-op fusions by avoiding the cache-line boundary -eliminate-unreachable - eliminate unreachable code -frame-opt - optimize stack frame accesses ...... (cherry picked from FBD4793684)	2017-03-28 14:40:20 -07:00
Maksim Panchenko	d5a0264a9e	[BOLT] Issue error in relocs mode if input is lacking relocations. Summary: If we specify "-relocs" flag and an input has no relocations we proceed with assumptions that relocations were there and break the binary. Detect the condition above, and reject the input. (cherry picked from FBD4761239)	2017-03-22 22:05:50 -07:00
Rafael Auler	ad81bd6779	Change dynostats dynamic instruction count policy Summary: Also add LOAD/STORE counters. (cherry picked from FBD4732284)	2017-03-17 10:32:56 -07:00
Bill Nell	b1ef186ca9	[BOLT] Don't allow non-symbol targets in ICP Summary: ICP was letting through call targets that weren't symbols. This diff filters out the non-symbol targets before running ICP. (cherry picked from FBD4735358)	2017-03-18 11:55:45 -07:00
Maksim Panchenko	e6f96de4d0	[BOLT] Add option to print only specific functions. Summary: Add option '-print-only=func1,func2,...' to print only functions of interest. The rest of the functions are still processed and optimized (e.g. inlined), but only the ones on the list are printed. (cherry picked from FBD4734610)	2017-03-17 19:05:11 -07:00
Maksim Panchenko	6cfd7ac2d5	[BOLT] Do not overwrite starting address in non-relocation mode. Summary: In non-relocation mode we shouldn't attempt to change ELF entry point. What made matters worse - it broke '-max-funcs=' and '-funcs=' options since an entry function more often than not was excluded from the list of processed functions, and we were setting entry point to 0. (cherry picked from FBD4720044)	2017-03-15 19:31:20 -07:00
Maksim Panchenko	559a57a181	[BOLT] Improve dynostats output. Summary: Reduce verbosity of dynostats to make them more readable. * Don't print "before" dynostats twice. * Detect if dynostats have changed after optimization and print before/after only if at least one metric has changed. Otherwise just print dynostats once and indicate "no change". * If any given metric hasn't changed, then print the difference as "(=)" as opposed to (+0.0%). (cherry picked from FBD4705920)	2017-03-14 09:03:23 -07:00
Maksim Panchenko	351af0c895	[BOLT] Do not process empty functions. Summary: While running on a recent test binary BOLT failed with an error. We were trying to process '__hot_end' (which is not really a function), and asserted that it had no basic blocks. This diff marks functions with empty basic blocks list as non-simple since there's no need to process them. (cherry picked from FBD4696517)	2017-03-12 11:30:05 -07:00
Bill Nell	2e5c2e689f	Fix hfsort callgraph stats, add hfsort test. Summary: The stats for call sites that are not included in the call graph were broken. The intention is to count the total number of call sites vs. the number of call sites that are ignored because they have targets that are not BinaryFunctions. Also add a new test for hfsort. (cherry picked from FBD4668631)	2017-03-07 11:45:07 -08:00
Maksim Panchenko	f4825ea417	[BOLT] Fix gcc5 build. Summary: A <numeric> include is required for gcc5 build. (cherry picked from FBD4671953)	2017-03-07 18:09:09 -08:00
Maksim Panchenko	98737b34bb	[BOLT] Fix verbose output. Summary: Inadvertently, output of BOLT became way too verbose. Discovered while building HHVM on master. (cherry picked from FBD4669881)	2017-03-07 14:22:15 -08:00
Bill Nell	fed0980139	[BOLT] Update tests Summary: Fix validateCFG to handle BBs that were generated from code that used __builtin_unreachable(). Add -verify-cfg option to run CFG validation after every optimization pass. (cherry picked from FBD4641174)	2017-02-27 21:44:38 -08:00
Maksim Panchenko	0acba2bcf0	[BOLT] Detect unmarked data in text. Summary: Sometimes code written in assembly will have unmarked data (such as constants) embedded into text. Typically such data falls into a "padding" address space of a function. This diff detects such references, and adjusts the padding space to prevent overwriting of code in data. Note that in relocation mode we prefer to overwrite the original code (-use-old-text) and thus cannot simply ignore data in text. (cherry picked from FBD4662780)	2017-02-21 14:18:09 -08:00
Maksim Panchenko	f241e252fc	[BOLT] Detect and handle __builtin_unreachable(). Summary: Calls to __builtin_unreachable() can result in an inconsistent CFG. It was possible for a basic block to end with a conditional branch and have a single successor. Or there could exist a non-terminated basic block without successors. We also often treated conditional jumps with destination past the end of a function as conditional tail calls. This can be prevented reliably at least when the byte past the end of the function does not belong to the next function. This diff includes several changes: * At disassembly stage jumps past the end of a function are converted into 'nops'. This is done only for cases when we can guarantee that the jump is not a tail call. Conversion to nop is required since the instruction could be referenced either by exception handling tables and/or debug info. Nops are later removed. * In CFG insert 'ret' into non-terminated basic blocks without successors (this almost never happens). * Conditional jumps at the end of the function are removed from CFG. The block will still have a single successor. * Cases where a destination of a jump instruction is the start of the next function, are still conservatively handled as (conditional) tail calls. (cherry picked from FBD4655046)	2017-03-03 11:35:41 -08:00
Maksim Panchenko	6dc2351505	[BOLT] New CFI handling policy. Summary: The new interface for handling Call Frame Information: * CFI state at any point in a function (in CFG state) is defined by CFI state at basic block entry and CFI instructions inside the block. The state is independent of basic blocks layout order (this is implied by CFG state but wasn't always true in the past). * Use BinaryBasicBlock::getCFIStateAtInstr(const MCInst Inst) to get CFI state at any given instruction in the program. No need to call fixCFIState() after any given pass. fixCFIState() is called only once during function finalization, and any function transformations after that point are prohibited. * When introducing new basic blocks, make sure CFI state at entry is set correctly and matches CFI instructions in the basic block (if any). * When splitting basic blocks, use getCFIStateAtInstr() to get a state at the split point, and set the new basic block's CFI state to this value. Introduce CFG_Finalized state to indicate that no further optimizations are allowed on the function. This state is reached after we have synced CFI instructions and updated EH info. Rename "-print-after-fixup" option to "-print-finalized". This diff fixes CFI for cases when we split conditional tail calls, and for indirect call promotion optimization. (cherry picked from FBD4629307)	2017-02-24 21:59:33 -08:00
Rafael Auler	965a373dc4	Fix warnings when compiling with clang (NFC) Summary: Fix inconsistent override keyword usages and initializes a missing field of a Relocation object when using braced initializers. (cherry picked from FBD4622856)	2017-02-27 13:09:27 -08:00
Maksim Panchenko	2029458f34	[BOLT] Strip 'repz' prefix from 'repz retq'. Summary: Add pass to strip 'repz' prefix from 'repz retq' sequence. The prefix is not used in Intel CPUs afaik. The pass is on by default. (cherry picked from FBD4610329)	2017-02-23 18:09:10 -08:00
Maksim Panchenko	88a461014b	[BOLT] Don't set code skew in relocations mode. Summary: We use code skew in non-relocation mode since functions have fixed addresses, and internal alignment has to be adjusted wrt the skew. However in relocation mode it interferes with effective code alignment, and has to be disabled. I missed it when I was rebasing the relocation diff. (cherry picked from FBD4599670)	2017-02-22 11:29:52 -08:00
Maksim Panchenko	d3e33b6edc	[BOLT] Fix -jump-tables=basic in relocation mode. Summary: In a prev diff I added an option to update jump tables in-place (on by default) and accidentally broke the default handling of jump tables in relocation mode. The update should be happening semi-automatically, but because we ignore relocations for jump tables it wasn't happening (derp). Since we mostly use '-jump-tables=move' this hasn't been noticed for some time. This diff gets rid of IgnoredRelocations and removes relocations from a relocation set when they are no longer needed. If relocations are created later for jump tables they are no longer ignored. (cherry picked from FBD4595159)	2017-02-21 16:15:15 -08:00
Maksim Panchenko	88244a10bb	[BOLT] Move BOLT passes under Passes subdirectory (NFC). Summary: Move passes under Passes subdirectory. Move inlining passes under Passes/Inliner.* (cherry picked from FBD4575832)	2017-02-16 14:57:57 -08:00
Maksim Panchenko	f06a1455ea	[BOLT] Add support for *GOTPCRELX relocation type. Summary: gcc5 can generate new types of relocations that give linker a freedom to substitute instructions. These relocations are PC-relative, and since we manually process such relocations they don't present much of a problem. Additionally, detect non-pc-relative access from code into a middle of a function. Occasionally I've seen such code, but don't know exactly how to trigger its generation. Just issue a warning for now. (cherry picked from FBD4566473)	2017-02-14 22:55:10 -08:00
Maksim Panchenko	82965b963f	[BOLT] Emit short tail calls in relocation mode. Summary: To minimize size of the output code we should emit tail calls that are as short as possible. For this we have to convert a synthetic TAILJMPd into JMP_1 instruction. This should be one of the last passes as most of analysis passes could break since tail calls will no longer be marked as such. The total size of the code is smaller, but not by much - hot text was reduced by 192 bytes. (cherry picked from FBD4557804)	2017-02-13 23:05:12 -08:00
Maksim Panchenko	734a7a5437	[BOLT] Skip disassembly of padding at function end. Summary: Some functions coming from assembly may not have been marked with size. We assume the size to include all bytes up to the next function/object in the file. As a result, function body will include any padding inserted by the linker. If linker inserts 0-value bytes this could be misinterpreted as invalid instruction and BOLT will bail out on such functions in non-relocation mode, and give up on a binary in relocation mode. This diff detects zero-padding, ignores it, and continues processing as normal. (cherry picked from FBD4528893)	2017-02-08 09:14:10 -08:00
Maksim Panchenko	6b0b5bbae7	[BOLT] Reject sanitized binaries. Summary: Whenever input binary is suspected to have been sanitized we print an error message and exit. I've checked that "__asan_init*" symbol presence is the most conservative way to detect "sanitization". (cherry picked from FBD4525478)	2017-02-07 15:56:00 -08:00
Maksim Panchenko	c89821cee3	[BOLT] Detect and prevent re-optimization attempts. Summary: Whenever we try to re-optimize a binary with BOLT we should issue an error and exit. (cherry picked from FBD4525228)	2017-02-07 15:31:14 -08:00
Maksim Panchenko	e212805ea6	[BOLT] Update section names in output file. Summary: Re-write section header string table to reflect new names given to sections. Old sections get ".bolt.org" prefix. E.g. when we write ".eh_frame" section, we keep the old copy but rename it to ".bolt.org.eh_frame". Note: the new code section is named ".bolt.text" - it contains split function bodies, while original ".text" name is left unchanged. (cherry picked from FBD4524935)	2017-02-07 12:20:46 -08:00
Bill Nell	d74997c3cc	Indirect call promotion optimization. Summary: Perform indirect call promotion optimization in BOLT. The code scans the instructions during CFG creation for all indirect calls. Right now indirect tail calls are not handled since the functions are marked not simple. The offsets of the indirect calls are stored for later use by the ICP pass. The indirect call promotion pass visits each indirect call and examines the BranchData for each. If the most frequent targets from that callsite exceed the specified threshold (default 90%), the call is promoted. Otherwise, it is ignored. By default, only one target is considered at each callsite. When a candidate callsite is processed, we modify the callsite to test for the most common call targets before calling through the original generic call mechanism. The CFG and layout are modified by ICP. A few new command line options have been added: -indirect-call-promotion -indirect-call-promotion-threshold=<percentage> -indirect-call-promotion-topn=<int> The threshold is the minimum frequency of a call target needed before ICP is triggered. The topn option controls the number of targets to consider for each callsite, e.g. ICP is triggered if topn=2 and the total frequency of the top two call targets exceeds the threshold. Example of ICP: C++ code: int B_count = 0; int C_count = 0; struct A { virtual void foo() = 0; }; struct B : public A { virtual void foo() { ++B_count; }; }; struct C : public A { virtual void foo() { ++C_count; }; }; A* a = ... a->foo(); ... original: 400863: 49 8b 07 mov (%r15),%rax 400866: 4c 89 ff mov %r15,%rdi 400869: ff 10 callq *(%rax) 40086b: 41 83 e6 01 and $0x1,%r14d 40086f: 4d 89 e6 mov %r12,%r14 400872: 4c 0f 44 f5 cmove %rbp,%r14 400876: 4c 89 f7 mov %r14,%rdi ... 
after ICP: 40085e: 49 8b 07 mov (%r15),%rax 400861: 4c 89 ff mov %r15,%rdi 400864: 49 ba e0 0b 40 00 00 movabs $0x400be0,%r10 40086b: 00 00 00 40086e: 4c 3b 10 cmp (%rax),%r10 400871: 75 29 jne 40089c <main+0x9c> 400873: 41 ff d2 callq *%r10 400876: 41 83 e6 01 and $0x1,%r14d 40087a: 4d 89 e6 mov %r12,%r14 40087d: 4c 0f 44 f5 cmove %rbp,%r14 400881: 4c 89 f7 mov %r14,%rdi ... 40089c: ff 10 callq *(%rax) 40089e: eb d6 jmp 400876 <main+0x76> (cherry picked from FBD3612218)	2016-09-07 18:59:23 -07:00
Maksim Panchenko	6ff1795d96	[BOLT] Support overwriting jump tables in-place. Summary: Add an option to overwrite jump tables without moving and make it a default: -jump-tables - jump tables support (default=basic) =none - do not optimize functions with jump tables =basic - optimize functions with jump tables =move - move jump tables to a separate section =split - split jump tables section into hot and cold based on function execution frequency =aggressive - aggressively split jump tables section based on usage of the tables (cherry picked from FBD4448499)	2017-01-17 15:49:59 -08:00
Rafael Auler	6dfd16cb4c	Cover RSP-indexed accesses in frame optimization Summary: Add a new dataflow analysis to recover the value of RSP at a given point of the program. This value is expressed as an offset from the CFA. Use this information to detect redundant load in memory accesses performed via RSP as well, not only RBP as done previously. Bail when RSP value (as an offset of the CFA) can't be reliably determined with a simple dataflow analysis. (cherry picked from FBD4372261)	2016-12-28 17:09:52 -08:00
Maksim Panchenko	503c741d43	[BOLT] Report stale functions' percentage wrt all profiled functions. Summary: Report stale functions percentage with respect to all profiled functions instead of all simple functions in the binary. The new reporting format should make it more apparent if the profile is out-of-date. Compare: BOLT-INFO: 341 (16.7% of all profiled) functions have invalid (possibly stale) profile. vs old: BOLT-INFO: 341 (0.3%) functions have invalid (possibly stale) profile. (cherry picked from FBD4451746)	2017-01-23 13:08:40 -08:00
Maksim Panchenko	19859377f8	[BOLT] Fix debug info update for zero-length ranges. Summary: Due to a clowntown on my part we were generating wrong ranges when an empty range was seen on input. We were basically expanding the range to include all basic blocks following such range and setting wrong sizes at the same time. Add "-dump-cu" option to llvm-dwarfdump that allows to look at debug info of a single compile unit only. Saves time if we are only interested in a subset of information. (cherry picked from FBD4430989)	2017-01-18 10:09:54 -08:00
Maksim Panchenko	0894905373	[ICF] Don't re-fold functions in non-relocation mode. Summary: In non-relocation mode, when we run ICF the second time, we fold the same functions again since they were not removed from the function set. This diff marks them as folded and ignores them during ICF optimization. Note that we still want to optimize such functions since they are potentially called from the code not covered by BOLT in non-relocation mode. Folded functions are also excluded from dyno stats with this diff. Also print the number of times folded functions were called. When 2 functions - f1() and f2() are folded, that number would be min(call_frequency(f1), call_frequency(f2)). (cherry picked from FBD4399993)	2017-01-10 11:20:56 -08:00
Maksim Panchenko	bc8a456309	ICF improvements. Summary: Re-worked the way ICF operates. The pass now checks for more than just call instructions, but also for all references including function pointers. Jump tables are handled too. (cherry picked from FBD4372491)	2016-12-21 17:13:56 -08:00
Maksim Panchenko	55fc5417f8	Relocations support for BOLT. Summary: Read relocation from linker and relocate all functions. (cherry picked from FBD4223901)	2016-09-27 19:09:38 -07:00
Rafael Auler	a75bbfc640	Add a frame optimization pass Summary: This is a first attempt to perform data flow analyses on bolt and try to rebuild the stack frame for functions. The goal of the frame optimization pass is to detect instructions that are accessing stack and, if loading values, evaluate whether this load is redundant and we can substitute the memory operation for a register load or immediate load. To find opportunities, this pass also builds a map of clobbered registers by function, so we use this in our analysis at call sites. If a call site is found out to not clobber a caller-saved register but the caller is spilling it anyway to the stack (to comply with the ABI), we should detect these cases and remove this unnecessary move. (cherry picked from FBD4337238)	2016-12-05 11:47:08 -08:00
Bill Nell	3a3dfc3dc2	BOLT: Use profiling info to control branch simplification optimization. Summary: An optimization to simplify conditional tail calls by removing unnecessary branches. It adds the following two command line options: -simplify-conditional-tail-calls - simplify conditional tail calls by removing unnecessary jumps -sctc-mode - mode for simplify conditional tail calls =always - always perform sctc =preserve - only perform sctc when branch direction is preserved =heuristic - use branch prediction data to control sctc This optimization considers both of the following cases: foo: ... jcc L1 original ... L1: jmp bar # TAILJMP -> foo: ... jcc bar iff jcc L1 is expected ... L1 is unreachable OR foo: ... jcc L2 L1: jmp dest # TAILJMP L2: ... -> foo: jncc dest # TAILJMP L2: ... L1 is unreachable For this particular case, the first basic block ends with a conditional branch and has two successors, one fall-through and one for when the condition is true. The target of the conditional is a basic block with a single unconditional branch (i.e. tail call) to another function. We don't care about the contents of the fall-through block. (cherry picked from FBD3719617)	2016-09-22 18:08:20 -07:00
Rafael Auler	06caefdb1d	Fix typo in time passes Summary: Previously NamedRegionTimer's constructor was being called with no local variable associated with it owing to a typo. We need a local variable to keep track of the time spent in the scope. At the end of the scope, the destructor will be called and then the timer will stop. (cherry picked from FBD4301844)	2016-12-08 13:34:56 -08:00
Rafael Auler	c570038d31	Add option to time passes Summary: As we begin to work on optimization passes for bolt, it is important to keep track of the time spent in each of these to measure their contribution to the time bolt takes to finish rewriting a program. (cherry picked from FBD4301136)	2016-12-08 12:15:20 -08:00
Rafael Auler	3888c5604f	Remove unused private var in CFIReaderWriter (NFC) Summary: This member variable is dead. (cherry picked from FBD4255342)	2016-11-30 16:03:53 -08:00
Rafael Auler	5c0e4b6a57	Fix undefined behavior in DebugInfo Summary: The CFI instructions parser in libDebugInfo was relying on undefined behavior to parse operands by assuming the order function parameters are evaluated in a function call site is defined (it is not). This patch fixes this and makes our clang and gcc tests agree. It also fixes wrong LIT tests in our codebase with respect to the order of DW_CFA_def_cfa operands. (cherry picked from FBD4255227)	2016-11-30 15:52:24 -08:00
Rafael Auler	a331fa396b	Fix memory leak in DWARFRewriter Summary: Clang's Address Sanitizer caught this leak where MCAsmBackend and MCObjectWriter instances were being created but not freed. Fix this. (cherry picked from FBD4249941)	2016-11-29 20:11:32 -08:00
Rafael Auler	5cc9c58410	Avoid const_iterator on std::vector::emplace Summary: This is part of a series of clean-up patches to make bolt cleanly compile with clang 4.0. This patch fixes an error where clang will fail to compile because it does not support passing a const_iterator to std::vector<T>::emplace(Iter, ...). (cherry picked from FBD4242546)	2016-11-28 17:45:25 -08:00
Rafael Auler	b21bc02ac4	Remove pessimizing std::move Summary: This is part of a series of clean-up patches to make bolt cleanly compile with clang 4.0. This patch fixes the following warning: moving a temporary object prevents copy elision (cherry picked from FBD4242236)	2016-11-28 17:25:17 -08:00
Rafael Auler	7115706d02	Fix clang warning about switch covering all enums Summary: This is part of a series of clean-up patches to make bolt cleanly compile with clang 4.0. This patch fixes the following warning: default label in switch which covers all enumeration values (cherry picked from FBD4242168)	2016-11-28 17:17:14 -08:00
Maksim Panchenko	ac2621fbf4	Add stats for "-optimize-bodyless-functions". Summary: Print the number of calls eliminated. (cherry picked from FBD4010698)	2016-10-12 13:08:52 -07:00
Rafael Auler	8609ad51e5	Detect default CFI frame instructions for the target Summary: Make BOLT resilient to changes in the LLVM's X86 target library by not hardwiring the list of default CIE instructions, but detecting it at run time. (cherry picked from FBD4200982)	2016-11-17 14:56:42 -08:00
Maksim Panchenko	a7fb610eba	Relocate old .eh_frame section next to the new one. Summary: In order to improve gdb experience with BOLT we have to make sure the output file has a single .eh_frame section. Otherwise gdb will use either old or new section for unwinding purposes. This diff relocates the original .eh_frame section next to the new one generated by LLVM. Later we merge two sections into one and make sure only the newly created section has .eh_frame name. (cherry picked from FBD4203943)	2016-11-11 14:33:34 -08:00
Maksim Panchenko	809c28f585	Generate .eh_frame_hdr based on contents of .eh_frame's. Summary: We used to patch an existing .eh_frame_hdr and append contents for split functions at the end. However, this approach does not work in relocation mode since function addresses change and split functions will not necessarily be at the end. Instead of patching and appending we generate the new .eh_frame_hdr based on contents of old and new .eh_frame sections. (cherry picked from FBD4180756)	2016-11-14 16:39:55 -08:00
Maksim Panchenko	055dfe48e7	Another EH fix for cold fragments of functions that we fail to write. Summary: In a prev diff I disabled inclusion of FDEs for cold fragments that we fail to write. The side effect of it was that we failed to write FDE for the next function with a cold fragment since it had the same assigned address that we had put in FailedAddresses. The correct fix is to assign zero address to failed cold fragments and ignore them when we write .eh_frame_hdr. (cherry picked from FBD4156740)	2016-11-09 11:19:02 -08:00
Rafael Auler	355dbd769e	Fix DW_CFA_def_cfa CFI duping in output binary Summary: CFI instructions may live in CIEs or FDEs. CIEs hold common instructions used across many FDEs. When replaying CFIs to the output binary, llvm-bolt needs to replay both instructions from CIE and the corresponding FDE for the function. However, some instructions need not be replayed because MCStreamer/MCDwarf and friends will write them by default in the output CIE. This patch fixes the code that tried to recognize one of these default instructions but was failing, resulting in an extra CFI instruction in each FDE we outputted. With this patch, the output binary should be a bit smaller. (cherry picked from FBD4194753)	2016-11-16 17:47:31 -08:00
Rafael Auler	bc8cb088c0	Support DWARF expressions in CFI instructions Summary: Modify the MC layer (MCDwarf.h|cpp) to understand CFI instructions dealing with DWARF expressions. Add code to emit DWARF expressions in MCDwarf. Change llvm-bolt to pass these CFI instructions to streamer instead of bailing on them. Change -dump-eh-frame option in llvm-bolt to dump the EH frame of the rewritten binary in addition to the one in the original binary, allowing us to properly test this patch. (cherry picked from FBD4194452)	2016-11-15 10:40:00 -08:00
Maksim Panchenko	99dce7d05e	Disable processing of functions with EVEX-encoded instructions (AVX-512). Summary: AVX-512 disassembler support in LLVM is not quite ready yet. Before we feel more comfortable about it we disable processing of all functions that use any EVEX-encoded instructions. (cherry picked from FBD4028706)	2016-10-16 18:56:56 -07:00
Maksim Panchenko	0eb2559fee	Fix EH for cold fragments that we fail to write. Summary: When we fail to write functions that are too big, we have to effectively cancel their effect on exception handling by ignoring their FDE entries in .eh_frame while writing .eh_frame_hdr. This can happen to functions that we split too. In such cases the cold part has its own FDE and we have to ignore that one too. This doesn't happen very often - I've only seen one case on hhvm binary, however it is a potential issue. The fix is to add the cold part address to the list of failed-to-write addresses. (cherry picked from FBD3987984)	2016-10-07 09:34:16 -07:00
Maksim Panchenko	e241e9c156	New function discovery and support for multiple entries. Summary: Modified function discovery process to tolerate more functions and symbols coming from assembly. The processing order now matches the memory order of the functions (input symbol table is unsorted). Added basic support for functions with multiple entries. When a function references its internal address other than with a branch instruction, that address could potentially escape. We mark such addresses as entry points and make sure they are treated as roots by unreachable code elimination. Without relocations we have to mark multiple-entry functions as non-simple. (cherry picked from FBD3950243)	2016-09-29 11:19:06 -07:00
Maksim Panchenko	9cf5d74ffb	Support for PIC-style jump tables. Summary: Added support for jump tables in code compiled with "-fpic". Code pattern generated for position-independent jump tables is quite different, as is the format of the tables. More details in comments. Coverage increased slightly for a test, mostly due to the code coming from external lib that was compiled with "-fpic". (cherry picked from FBD3940771)	2016-09-27 19:09:38 -07:00
Bill Nell	4a0c494bc1	BOLT: Remove restrictions on unreachable code elimination Summary: Allow UCE when blocks have EH info. Since UCE may remove blocks that are referenced from debugging info data structures, we don't actually delete them. We just mark them with an "invalid" index and store them in a different vector to be cleaned up later once the BinaryFunction is destroyed. The debugging code just skips any BBs that have an invalid index. Eliminating blocks may also expose useless jmp instructions, i.e. a jmp around a dead block could just be a fallthrough. I've added a new routine to cleanup these jmps. Although, @maks is working on changing fixBranches() so that it can be used instead. (cherry picked from FBD3793259)	2016-09-07 18:59:23 -07:00
Maksim Panchenko	4464861a02	Support for splitting jump tables. Summary: Add level for "-jump-tables=<n>" option: 1 - all jump tables are output in the same section (default). 2 - basic splitting, if the table is used it is output to hot section otherwise to cold one. 3 - aggressively split compound jump tables and collect profile for all entries. Option "-print-jump-tables" outputs all jump tables for debugging and/or analyzing purposes. Use with "-jump-tables=3" to get profile values for every entry in a jump table. (cherry picked from FBD3912119)	2016-09-16 15:54:32 -07:00
Bill Nell	ecc4b9e713	BOLT: Add ud2 after indirect tailcalls. Summary: Insert ud2 instructions after indirect tailcalls to prevent the CPU from decoding instructions following the callsite. A simple counter in the peephole pass shows 3260 tail call traps inserted. (cherry picked from FBD3859737)	2016-09-13 15:16:11 -07:00
Bill Nell	2f1341b51d	BOLT: Refactoring BinaryFunction interface. Summary: Get rid of all uses of getIndex/getLayoutIndex/getOffset outside of BinaryFunction. Also made some other offset related methods private. (cherry picked from FBD3861968)	2016-09-13 20:32:12 -07:00
Bill Nell	510f227cbd	BOLT: Add feature to sort functions by dyno stats. Summary: Add -print-sorted-by and -print-sorted-by-order command line options. The first option takes a list of dyno stats keys used to sort functions that are printed at the end of all optimization passes. Only the top 100 functions are printed. The -print-sorted-by-order option can be either ascending or descending (descending is the default). (cherry picked from FBD3898818)	2016-09-20 20:55:49 -07:00
Maksim Panchenko	62bff426c3	Do not collect dyno stats on functions with stale profile. Summary: Dyno stats collected on functions with invalid profile may appear completely bogus. Skip them. (cherry picked from FBD3879371)	2016-09-16 13:13:16 -07:00
Maksim Panchenko	2c9bf9afd6	Add PLT dyno stats. Summary: Get PLT call stats. (cherry picked from FBD3874799)	2016-09-15 15:47:10 -07:00
Maksim Panchenko	c4e36c1dd6	Fix issue with zero-size duplicate function symbols. Summary: While working on PLT dyno stats I've noticed that we were missing BinaryFunctions for some symbols that were not PLT. Upon closer inspection turned out that those symbols were marked as zero-sized functions in symbol table, but they had duplicates with non-zero size. Since the zero-size symbols were preceding other duplicates, we were not creating BinaryFunction for them and they were not added as duplicates. The 2 most prominent functions that were missing for a test were free() and malloc(). There's not much to optimize in these functions, but they were contributing quite significantly to dyno stats. As a result dyno stats for this test needed an adjustment. Also several assembly functions (e.g. _init()) had zero size, and now we set the size to the max size and start processing those. It's good for coverage but will not affect the performance. (cherry picked from FBD3874622)	2016-09-15 15:47:10 -07:00
Maksim Panchenko	8dbf0e2b3d	Add dyno stats for jump tables. Summary: Add dyno stats for jump tables. (cherry picked from FBD3871035)	2016-09-15 10:24:22 -07:00
Maksim Panchenko	2f3a859772	Add experimental jump table support. Summary: Option "-jump-tables=1" enables experimental support for jump tables. The option hasn't been tested with optimizations other than block re-ordering. Only non-PIC jump tables are supported at the moment. (cherry picked from FBD3867849)	2016-09-14 16:45:40 -07:00
Bill Nell	7483cd0fa6	BOLT: Clean up interface between BinaryFunction and BinaryBasicBlock. Summary: This is just a bit of refactoring to make sure that BinaryFunction goes through methods to get at the state in BinaryBasicBlock. I did this so that changing the way Index/LayoutIndex/Valid works will be easier. (cherry picked from FBD3860899)	2016-09-13 17:12:00 -07:00
Maksim Panchenko	b0f4031db3	Add cluster randomization layout algorithm. Summary: Add "-reorder-blocks=cluster-shuffle" for performance experiments. Use "-bolt-seed=<N>" to set a randomization seed. (cherry picked from FBD3851035)	2016-09-11 14:33:58 -07:00
Maksim Panchenko	52bfc3f92f	Fix switch table detection. Disassemble all instructions in non-simple functions. Summary: Switch table can contain __builtin_unreachable(). As a result, a compiler may place an entry into a jump table that contains an address immediately past the last instruction in the function. Sometimes it may coincide with a start of the next function in the binary. Thus when we check for switch tables in such cases we have to check more than a single entry until we see either an address inside containing function or some address outside different from the address past the last instruction. Additionally, don't stop disassembly after discovering that the function was not simple. We need to detect all outside references whenever possible. (cherry picked from FBD3850825)	2016-09-12 10:12:31 -07:00
Bill Nell	861d5a1586	BOLT: Remove double jumps peephole. Summary: Replace jumps to other unconditional jumps with the final destination, e.g. B0: ... jmp B1 (or jcc B1) B1: jmp B2 -> B0: ... jmp B2 (or jcc B1) This peephole removes 8928 double jumps from a test binary. Note: after filtering out double jumps found in EH code and infinite loops, the number of double jumps patched is 49 (24 for a clang compiled test). The 24 in the clang build are all from external libraries which have probably been compiled with gcc. This peephole is still useful for cleaning up after ICP though. (cherry picked from FBD3815420)	2016-09-02 18:09:07 -07:00
Maksim Panchenko	617c6a13b7	Use BB.getNumNonPseudos() in more places. Summary: Use BB.getNumNonPseudos() in more places. Fix analyze_potential script to pass the new parameter. (cherry picked from FBD3844416)	2016-09-09 14:42:35 -07:00
Bill Nell	71be567969	BOLT: Add per pass dyno stats + factor out post pass printing. Summary: I've added dyno stats printing per pass so we can see the results of each optimization pass on the stats. I've also factored out the post pass function printing code since it was pretty much the same after each pass. (cherry picked from FBD3843587)	2016-09-09 12:37:37 -07:00
Maksim Panchenko	c4c518ee9d	Rewrite SCTC pass to do UCE and make it the last optimization pass. Summary: For now we make SCTC a special pass that runs at the end of all optimizations and transformations right after fixupBranches(). Since it's the last pass, it has to do its own UCE. (cherry picked from FBD3838051)	2016-09-08 14:52:26 -07:00
Maksim Panchenko	6bef336cc2	Add dyno stats to BOLT. Summary: Add "-dyno-stats" option that prints instruction stats based on the execution profile similar to below: BOLT-INFO: program-wide dynostats after optimizations: executed forward branches : 109706407 (+8.1%) taken forward branches : 13769074 (-55.5%) executed backward branches : 24517582 (-25.0%) taken backward branches : 15330256 (-27.2%) executed unconditional branches : 6009826 (-35.5%) function calls : 17192114 (+0.0%) executed instructions : 837733057 (-0.4%) total branches : 140233815 (-2.3%) taken branches : 35109156 (-42.8%) Also fixed pseudo instruction discrepancies and added assertions for BinaryBasicBlock::getNumPseudos() to make sure the number is synchronized with real number of pseudo instructions. (cherry picked from FBD3826995)	2016-08-29 21:11:22 -07:00
Maksim Panchenko	17e691915b	Make BinaryFunction::fixBranches() more flexible and support CFG updates. Summary: The CFG represents "the ultimate source of truth". Transformations on functions and blocks have to update the CFG and fixBranches() would make sure the correct branch instructions are inserted at the end of basic blocks (or removed when necessary). We do require a conditional branch at the end of the basic block if the block has 2 successors as CFG currently lacks the conditional code support (it will probably stay that way). We only use this branch instruction for its conditional code, the destination is determined by CFG - first successor representing true/taken branch, while the second successor - false/fall-through branch. When we reverse the branch condition, the CFG is updated accordingly. The previous version used to insert jumps after some terminating instructions sometimes resulting in a larger code than needed. As a result with the new version 1 extra function becomes overwritten for HHVM binary. With this diff we also convert conditional branches with one successor (result of code from __builtin_unreachable()) into unconditional jumps. (cherry picked from FBD3802062)	2016-08-29 21:11:22 -07:00
Bill Nell	48b55300e0	BOLT: Make most command line options ZeroOrMore. Summary: This will make it easier to run experiments with the same baseline BOLT binary but different command line options. (cherry picked from FBD3831978)	2016-09-07 14:41:56 -07:00
Bill Nell	dcaffe64d3	Inlining fixes/enhancements Summary: A number of fixes/enhancements to inline-small-functions - Fixed size estimateHotSize to use computeCodeSize instead of the original layout offsets. - Added -print-inline option to dump CFGs for functions that have been modified by inlining. - Added flag to force consideration of functions without any profiling info (mostly for testing) - Updated debug line info for inlined functions. - Ignore the number of pseudo instructions when checking for candidates of suitable size. Misc changes - Moved most print flags to BinaryPasses.cpp (cherry picked from FBD3812658)	2016-09-02 11:58:53 -07:00
Maksim Panchenko	1cf200107e	Fix tail call conversion and test cases. Summary: A previous diff accidentally disabled tail call conversion. Additionally some test cases relied on output of "-v=2". Fix those. (cherry picked from FBD3823760)	2016-09-06 13:19:26 -07:00
Bill Nell	c27a6a5c63	Add verbosity level and clean up stream usage. Summary: I've added a verbosity level to help keep the BOLT spewage to a minimum. The default level is pretty terse now, level 1 is closer to the original, I've saved level 2 for the noisiest of messages. Error messages should never be suppressed by the verbosity level, only warnings and info messages. The rationale behind stream usage is as follows: outs() for info and debugging controlled by command line flags. errs() for errors and warnings. dbgs() for output within DEBUG(). With the exception of a few of the level 2 messages I don't have any strong feelings about the others. (cherry picked from FBD3814259)	2016-09-02 14:15:29 -07:00
Maksim Panchenko	43acb6a28a	Emit remember_state CFI in the same code region as restore_state. Summary: While creating remember_state/restore_state CFI sequences, we were always placing remember_state instruction into the first basic block. However, when we have hot-cold splitting, the cold part has an independent FDE entry in .eh_frame, and thus the restore_state instruction was missing its counterpart. The fix is to adjust the basic block that is used for placing remember_state instruction whenever we see the hot-cold split boundary. (cherry picked from FBD3767102)	2016-08-24 14:25:33 -07:00
Maksim Panchenko	97f598fd17	Handling for indirect tail calls. Summary: Analyze indirect branches and convert them into indirect tail calls when possible. We analyze the memory contents when the address could be calculated statically and also detect epilogue code. (cherry picked from FBD3754395)	2016-08-22 14:24:09 -07:00
Maksim Panchenko	42c5894fe2	Write padding for .eh_frame_hdr to a file. Summary: We were applying padding to the calculated address but were never writing it to a file triggering an assertion for cases when .gcc_except_table size wasn't multiple of 4. (cherry picked from FBD3744638)	2016-08-19 13:54:35 -07:00
Maksim Panchenko	a10fb73ab3	Compute ClusterEdges only when necessary. Summary: We only need ClusterEdges in reordering algorithm optimized for branches and the computation is quite resource-hungry, thus it makes sense to only do it when needed. Some refactoring too. (cherry picked from FBD3721107)	2016-08-15 15:37:00 -07:00
Bill Nell	c1d1c2e7cd	Check if operands are immediates before trying shortening. Summary: Operands in the initial instruction stream should all have immediate operands for instructions that can be shortened. But if a BOLT optimization pass adds one of these instructions with a symbolic operand, the shortening operation will assert. This diff adds checks to make sure that the operands are immediate. I've also disabled shortening pass by default since it won't really be needed until ICP is submitted. It will still run at CFG creation time. (cherry picked from FBD3610646)	2016-07-22 20:52:57 -07:00
Bill Nell	406aa62083	Add additional info to BOLT graphviz CFG dumps. Summary: Add the following info to the graphviz CFG dump: - Edges are labeled with the jmp instruction that leads to that edge. - Edges include the count and misprediction count. - Nodes have (offset, BB index, BB layout index) - Nodes optionally have tooltips which contain the code of the basic block. (enabled with -dot-tooltip-code) - Added dashed edges to landing pads. (cherry picked from FBD3646568)	2016-07-29 19:18:37 -07:00
Maksim Panchenko	003d106c0b	More refactoring work. Summary: Avoid referring to BinaryFunction's by name. Functions could be found by MCSymbol using BinaryContext::getFunctionForSymbol(). (cherry picked from FBD3707685)	2016-08-11 14:23:54 -07:00
Maksim Panchenko	36df6057b0	Refactoring. Mainly NFC. Summary: Eliminated BinaryFunction::getName(). The function was confusing since the name is ambiguous. Instead we have BinaryFunction::getPrintName() used for printing and whenever unique string identifier is needed one can use getSymbol()->getName(). In the next diff I'll have a map from MCSymbol to BinaryFunction in BinaryContext to facilitate function lookup from instruction operand expressions. There's one bug fixed where the function was called only under assert() in ICF::foldFunction(). For output we update all symbols associated with the function. At the moment it has no effect on the generated binary but in the future we would like to have all symbols in the symbol table updated. (cherry picked from FBD3704790)	2016-08-07 12:35:23 -07:00
Theodoros Kasampalis	32739247eb	More aggressive inlining pass Summary: This adds functionality for a more aggressive inlining pass, that can inline tail calls and functions with more than one basic block. (cherry picked from FBD3677856)	2016-07-29 14:17:06 -07:00
Bill Nell	82d76ae18b	Add MCInst annotation mechanism to MCInstrAnalysis class. Summary: Add three new MCOperand types: Annotation, LandingPad and GnuArgsSize. Annotation is used for associating random data with MCInsts. Clients can construct their own annotation types (subclassed from MCAnnotation) and associate them with instructions. Annotations are looked up by string keys. Annotations can be added, removed and queried using an instance of the MCInstrAnalysis class. The LandingPad operand is a MCSymbol, uint64_t pair used to encode exception handling information for call instructions. GnuArgsSize is used to annotate calls with the DW_CFA_GNU_args_size attribute. (cherry picked from FBD3597877)	2016-07-28 10:34:50 -07:00
Theodoros Kasampalis	713e361f36	Fix for correct disassembling of conditional tail calls. Summary: BOLT attempts to convert jumps that serve as tail calls to dedicated tail call instructions, but this is impossible when the jump is conditional because there is no corresponding tail call instruction. This was causing the creation of a duplicate fall-through edge for basic blocks terminated with a conditional jump serving as a tail call when there is profile data available for the non-taken branch. In this case, the first fall-through edge had a count taken from the profile data, while the second has a count computed (incorrectly) by BinaryFunction::inferFallThroughCounts. (cherry picked from FBD3560504)	2016-07-13 18:57:40 -07:00
Maksim Panchenko	486ab273c7	Add printing support for indirect tail calls. Summary: LLVM was missing assembler print string for indirect tail calls which are synthetic instructions created by us. (cherry picked from FBD3640197)	2016-07-28 18:49:48 -07:00
Bill Nell	50e011f4e5	CFG editing functions Summary: This diff adds a number of methods to BinaryFunction that can be used to edit the CFG after it is created. The basic public functions are: - createBasicBlock - create a new block that is not inserted into the CFG. - insertBasicBlocks - insert a range of blocks (made with createBasicBlock) into the CFG. - updateLayout - update the CFG layout (either by inserting new blocks at a certain point or recomputing the entire layout). - fixFallthroughBranch - add a direct jump to the fallthrough successor for a given block. There are a number of private helper functions used to implement the above. This was split off the ICP diff to simplify it a bit. (cherry picked from FBD3611313)	2016-07-23 12:50:34 -07:00
Theodoros Kasampalis	ab599fe71a	Basic block clustering algorithm for minimizing branches. Summary: This algorithm is similar to our main clustering algorithm but uses a different heuristic for selecting edges to become fall-throughs. The weight of an edge is calculated as the win in branches if we choose to layout this edge as a fall-through. For example, the edges A -> B with execution count 100 and A -> C with execution count 500 (where B and C are the only successors of A) have weights -400 and +400 respectively. (cherry picked from FBD3606591)	2016-07-15 16:11:30 -07:00
Theodoros Kasampalis	a9bb3320ad	Identical Code Folding (ICF) pass Summary: Added an ICF pass to BOLT, that can recognize identical functions and replace references to these functions with references to just one representative. (cherry picked from FBD3460297)	2016-06-09 11:36:55 -07:00
Bill Nell	82401630a2	Factor out instruction printing and size computation. Summary: I've factored out the instruction printing and size computation routines to methods on BinaryContext. I've also added some more debug print functions. This was split off the ICP diff to simplify it a bit. (cherry picked from FBD3610690)	2016-07-23 08:01:53 -07:00
Theodoros Kasampalis	156a55209c	Simplification of loads from read-only data sections. Summary: Instructions that load data from a read-only data section and whose target address can be computed statically (e.g. RIP-relative addressing) are modified to corresponding instructions that use immediate operands. We apply the transformation only when the resulting instruction will have smaller or equal size. (cherry picked from FBD3397112)	2016-06-03 00:58:11 -07:00
Theodoros Kasampalis	17b846586c	Loop detection for BOLT's CFG. Summary: Loop detection for the CFG data structure. Added a GraphTraits specialization for BOLT's CFG that allows us to use LLVM's loop detection interface. (cherry picked from FBD3604837)	2016-05-26 10:58:01 -07:00
Bill Nell	ea53cffb2d	Add movabs -> mov shortening optimization. Add peephole optimization pass that does instruction shortening. Summary: When a mov instruction has a 64-bit immediate that can be represented as a sign-extended 32-bit number, use the smaller mov instruction (MOV64ri -> MOV64ri32). Add peephole optimization pass that does instruction shortening. (cherry picked from FBD3603099)	2016-07-21 16:40:06 -07:00
Maksim Panchenko	c6d0c568d4	Add BinaryContext::getSectionForAddress() Summary: Interface for accessing section from BinaryContext. (cherry picked from FBD3600854)	2016-07-21 12:45:35 -07:00
Maksim Panchenko	f2d82919d0	Move debug-handling code into DWARFRewriter (NFC). Summary: RewriteInstance.cpp is getting too big. Split the code. (cherry picked from FBD3596103)	2016-05-31 19:12:26 -07:00
Maksim Panchenko	bf46263eed	Shorten instructions if possible. Summary: Generate short versions of branch instructions by default and rely on relaxation to produce longer versions when needed. Also produce short versions of arithmetic instructions if immediate fits into one byte. This was only triggered once on HHVM binary. (cherry picked from FBD3591466)	2016-07-19 11:19:18 -07:00
Bill Nell	674dbcc0de	Fix crash in patchELFPHDRTable when no functions are modified. Summary: patchELFPHDRTable was asserting that it could not find an entry for .eh_frame_hdr in SectionMapInfo when no functions were modified by BOLT. This just changes code to skip modifying GNU_EH_FRAME program headers when SectionMapInfo is empty. The existing header is copied and written instead. (cherry picked from FBD3557481)	2016-07-12 16:43:53 -07:00
Maksim Panchenko	84b5b9e462	Create alternative name for local symbols. Summary: If a profile data was collected on a stripped binary but an input to BOLT is unstripped, we would use a different mangling scheme for local functions and ignore their profiles. To solve the issue this diff adds alternative name for all local functions such that one of the names would match the name in the profile. If the input binary was stripped, we reject it, unless "-allow-stripped" option was passed. It's more complicated to do a matching in this case since we have less information than at the time of profile collection. It's also not that simple to tell if the profile was gathered on a stripped binary (in which case we would have no issue matching data). (cherry picked from FBD3548012)	2016-07-11 18:51:13 -07:00
Bill Nell	bdd4af2134	Store index inside BinaryBasicBlock instead of in map on BinaryFunction. Summary: Store the basic block index inside the BinaryBasicBlock instead of a map in BinaryFunction. This cut another 15-20 sec. from the processing time for hhvm. (cherry picked from FBD3533606)	2016-07-07 21:43:43 -07:00
Bill Nell	90c9323511	Use unordered_map instead of map in ReorderAlgorithm and BinaryFunction::BasicBlockIndices. Summary: Use unordered_map instead of map in ReorderAlgorithm and BinaryFunction::BasicBlockIndices. Cuts about 30sec off the processing time for the hhvm binary. (~8.5 min to ~8min) (cherry picked from FBD3530910)	2016-07-07 11:48:50 -07:00
Theodoros Kasampalis	c20506c570	Fix in inferFallthroughCounts Summary: This fixes the initialization of basic block execution counts, where we should skip edges to the first basic block but we were not skipping the corresponding profile info. Also, I removed a check that was done twice. (cherry picked from FBD3519265)	2016-07-03 21:30:35 -07:00
Bill Nell	260f6fbdb6	Add option to dump CFGs in (simple) graphviz format during all passes. Summary: I noticed the BinaryFunction::viewGraph() method that hadn't been implemented and decided I could use a simple DOT dumper for CFGs while working on the indirect call optimization. I've implemented the bare minimum for the dumper. It's just nodes+BB labels with edges. We can add more detailed information as needed/desired. (cherry picked from FBD3509326)	2016-07-01 08:40:56 -07:00
Theodoros Kasampalis	6eb4e5b687	perf2bolt can extract branch records with histories Summary: Added perf2bolt functionality for extracting branch records with histories of previous branches. The length of the histories is user defined, and the default is 0 (previous functionality). Also, DataReader can parse perf2bolt output with histories. Note: creating profile data with long histories can increase their size significantly (2x for history of length 1, 3x for length 2 etc). (cherry picked from FBD3473983)	2016-06-21 18:44:42 -07:00
Theodoros Kasampalis	287fa51324	Fix for ignoring fall-through profile data when jump is followed by no-op Summary: When a conditional jump is followed by one or more no-ops, the destination of fall-through branch was recorded as the first no-op in FuncBranchInfo. However the fall-through basic block after the jump starts after the no-ops, so the profile data could not match the CFG and was ignored. (cherry picked from FBD3496084)	2016-06-27 14:51:38 -07:00
Theodoros Kasampalis	d09b00ebff	Refactoring of the reordering algorithms Summary: The various reorder and clustering algorithms have been refactored into separate classes, so that it is easier to add new algorithms and/or change the logic of algorithm selection. (cherry picked from FBD3473656)	2016-06-16 18:47:57 -07:00
Maksim Panchenko	f1192a7118	Support for multiple function names. Summary: With ICF optimization in the linker we were getting mismatches of function names in .fdata and BinaryFunction name. This diff adds support for multiple function names for BinaryFunction and does a match against all possible names for the profile. (cherry picked from FBD3466215)	2016-06-10 17:13:05 -07:00
Maksim Panchenko	70f82d9371	Reject profile data for functions that do not match. Summary: Verify profile data for a function and reject if there are branches that don't correspond to any branches in the function CFG. Note that we have to ignore branches resulting from recursive calls. Fix printing instruction offsets in disassembled state. Allow function to have non-zero execution count even if we don't have branch information. (cherry picked from FBD3451596)	2016-06-15 18:36:16 -07:00
Maksim Panchenko	88ac5d9d0e	[merge-fdata] Add option to print function list. Summary: Print total number of functions/objects that have profile and add new options: -print - print the list of objects with count to stderr =none - do not print objects/functions =exec - print functions sorted by execution count =branches - print functions sorted by total branch count -q - do not print merged data to stdout (cherry picked from FBD3442288)	2016-06-09 17:45:15 -07:00
Bill Nell	980a06265a	Revert "Indirect call optimization." This reverts commit 33966090e18545b64013614e7929ff1bdcdf10d5. (cherry picked from FBD28110782)	2016-06-08 17:38:13 -07:00
Bill Nell	8bcfd9a392	Indirect call optimization. (cherry picked from FBD28110629)	2016-06-07 16:27:52 -07:00
Bill Nell	45e2219ae4	Allocate BinaryBasicBlocks with new rather than storing them in the BasicBlocks vector. Summary: This will help optimization passes that need to modify the CFG after it is constructed. Otherwise, the BinaryBasicBlock pointers stored in the layout, successors and predecessors would need to be modified every time a new basic block is created. (cherry picked from FBD3403372)	2016-06-07 16:27:52 -07:00
Maksim Panchenko	6da0d95326	Fix large functions debug info by default. Summary: Turn on -fix-debuginfo-large-functions by default. In the process of testing I've discovered that we output cold code for functions that were too large to be emitted. Fixed that. (cherry picked from FBD3372697)	2016-05-31 19:29:34 -07:00
Maksim Panchenko	4460da0d81	Improvements for debug info. Summary: Assembly functions could have no corresponding DW_AT_subprogram entries, yet they are represented in module ranges (and .debug_aranges) and will have line number information. Make sure we update those. Eliminated unnecessary data structures and optimized some passes. For .debug_loc unused location entries are no longer processed resulting in smaller output files. Overall it's a small processing time improvement and memory improvement. (cherry picked from FBD3362540)	2016-05-27 20:19:19 -07:00
Theodoros Kasampalis	65ac8bbdf2	Better edge counts for fall through blocks in presence of C++ exceptions. Summary: The inference algorithm for counts of fall through edges takes possible jumps to landing pad blocks into account. Also, the landing pad block execution counts are updated using profile data. (cherry picked from FBD3350727)	2016-05-26 15:10:09 -07:00
Theodoros Kasampalis	485f9220b7	Taking LP counts into account for FT count inference (cherry picked from FBD28110493)	2016-05-24 09:26:25 -07:00
Theodoros Kasampalis	fb5f18b2dc	Correctly updating landing pad exec counts. (cherry picked from FBD28110316)	2016-05-23 16:16:25 -07:00
Maksim Panchenko	06b9c5b342	Better .debug_line for non-simple functions. Summary: Generate .debug_line info for non-simple functions in a way that is preferable for 'objdump -S'. (cherry picked from FBD3345485)	2016-05-24 20:50:36 -07:00
Maksim Panchenko	7b97793b94	Fix for clang .debug_info. Summary: Clang uses different attribute for high_pc which was incompatible with the way we were updating ranges. This diff fixes it. (cherry picked from FBD3345537)	2016-05-24 14:54:23 -07:00
Maksim Panchenko	cfa5d753eb	Miscellaneous fixes for debug info. Summary: * Fix several cases for handling debug info: - properly update CU DW_AT_ranges for function with folded body due to ICF optimization - convert ranges to DW_AT_ranges from hi/low PC for all DIEs - add support for [a, a) range - update CU ranges even when there are no functions registered * Overwrite .debug_ranges section instead of appending. * Convert assertions in debug info handling part into warnings. (cherry picked from FBD3339383)	2016-05-23 19:36:38 -07:00
Maksim Panchenko	7ab3db129b	Create DW_AT_ranges for compile units. Summary: Some compile unit DIEs might be missing DW_AT_ranges because they were compiled without "-ffunction-sections" option. This diff adds the attribute to all compile units. If the section is not present, we need to create it. Will do it in a separate diff. (cherry picked from FBD3314984)	2016-05-17 18:10:14 -07:00
Maksim Panchenko	f047b9d43a	Overwrite contents of .debug_line section. Summary: Overwrite contents of .debug_line section since we don't reference the original contents anymore. This saves ~100MB of HHVM binary. (cherry picked from FBD3314917)	2016-05-16 17:02:17 -07:00
Bill Nell	e63984f325	Patch forward jumping tail calls to prevent branch mispredictions. Summary: A simple optimization to prevent branch misprediction for tail calls. Convert the sequence: j<cc> L1 ... L1: jmp foo # tail call into: j<cc> foo but only if 'j<cc> foo' turns out to be a forward branch. (cherry picked from FBD3234207)	2016-05-02 12:47:18 -07:00
Maksim Panchenko	b445f5eb7b	Fix issue with garbage address in .debug_line. Summary: While emitting debug lines for a function we don't overwrite, we don't have a code section context that is needed by default writing routine. Hence we have to emit end_sequence after the last address, not at the end of section. (cherry picked from FBD3291533)	2016-05-11 19:13:38 -07:00
Bill Nell	f7e7e25b88	Put all optimization passes under the pass manager. Summary: Move eliminate unreachable code, block reordering, and CFI/exception fixup into official optimization passes. (cherry picked from FBD3248991)	2016-05-02 12:47:18 -07:00
Gabriel Poesia	5fa128e748	Inlining of small functions. Summary: Added an optimization pass of inlining calls to small functions (with only one basic block). Inlining is done in a very simple way, inserting instructions to simulate the changes to the stack pointer that call/ret would make before/after the inlined function executes. Also, the heuristic prefers to inline calls that happen in the hottest blocks (by looking at their execution count). Calls in cold blocks are ignored. (cherry picked from FBD3233516)	2016-04-25 14:25:58 -07:00
Gabriel Poesia	d1f525499e	Optimize calls to functions that are a single unconditional jump Summary: Many functions (around 600) in the HHVM binary are simply a single unconditional jump instruction to another function. These can be trivially optimized by modifying the call sites to directly call the branch target instead (because it also happens with more than one jump in sequence, we do it iteratively). This diff also adds a very simple analysis/optimization pass system in which this pass is the first one to be implemented. A follow-up to this could be to move the current optimizations to other passes. (cherry picked from FBD3211138)	2016-04-15 15:59:52 -07:00
Gabriel Poesia	e6acc7bb53	Optimize calls to functions that are a single unconditional jump Summary: Many functions (around 600) in the HHVM binary are simply a single unconditional jump instruction to another function. These can be trivially optimized by modifying the call sites to directly call the branch target instead (because it also happens with more than one jump in sequence, we do it iteratively). This diff also adds a very simple analysis/optimization pass system in which this pass is the first one to be implemented. A follow-up to this could be to move the current optimizations to other passes. (cherry picked from FBD3211138)	2016-04-15 15:59:52 -07:00
Gabriel Poesia	459eb8c230	Fix "Cannot update ranges for DIE at offset" error messages. Summary: Fix the error message by not printing it :) Explanation: a previous diff accidentally removed this error message from within the DEBUG macro, and it's expected that we'll have a bunch of them since a lot of the DIEs we try to update are empty or meaningless. For instance (and mainly), there is a huge number of lexical block DIEs with no attributes in .debug_info. In the first phase of collecting debugging info, we store the offsets of all these DIEs, only later to realize that we cannot update their address ranges because they have none. A better fix would be to check this earlier and not store offsets of DIEs we cannot update to begin with. (cherry picked from FBD3236923)	2016-04-28 12:55:35 -07:00
Maksim Panchenko	de95a5b6a4	Make merge-fdata generate smaller .fdata files. Summary: A lot of the space in the merged .fdata is taken by branches to and from [heap], which is jitted code. On different machines, or during different runs, jitted addresses are all different. We don't use these addresses, but we need branch info to get accurate function call counts. This diff treats all [heap] addresses the same, resulting in a simplified merged file. The size of the compressed file decreased from 70MB to 8MB. (cherry picked from FBD3233943)	2016-04-27 18:06:18 -07:00
Maksim Panchenko	1258903b54	Fix for functions in different segments. Summary: In a test binary some functions are placed in a segment preceding the segment containing .text section. As a result, we were miscalculating maximum function size as the calculation was based on addresses only. This diff fixes the calculation by checking if symbol after function belongs to the same section. If it does not, then we set the maximum function size based on the size of the containing section and not on the address distance to the next symbol. (cherry picked from FBD3229205)	2016-04-26 23:42:39 -07:00
Maksim Panchenko	3811673a0c	Option to break in given functions. Summary: Added option "-break-funcs=func1,func2,...." to coredump in any given function by introducing ud2 sequence at the beginning of the function. Useful for debugging and validating stack traces. Also renamed options containing "_" to use "-" instead. Also run hhvm test with "-update-debug-sections". (cherry picked from FBD3210248)	2016-04-21 09:54:33 -07:00
Maksim Panchenko	87a90ae133	Fix ninja install-* for BOLT utilities. Summary: Make sure we can install all tools needed for processing BOLT .fdata files such as perf2bolt, merge-fdata, etc. (cherry picked from FBD3223477)	2016-04-25 22:13:12 -07:00
Maksim Panchenko	ff68b34553	Tool to merge .fdata files. Summary: merge-fdata tool takes multiple .fdata files and outputs to stdout combined fdata. Takes about 2 seconds per each additional .fdata file with hhvm production data. (cherry picked from FBD3216430)	2016-04-08 12:18:06 -07:00
Maksim Panchenko	43bc4a09ad	Changed splitting options and fixed sorting. Summary: Splitting option now has different meanings/values. Since landing pads are mostly always cold/frozen, we should split them before anything else (we still check the execution count is 0). That's value '1'. Everything else goes on top of that and has increased value (2 - large functions, 3 - everything). Sorting was non-deterministic and somewhat broken for functions with EH ranges. Fixed that and added '-split-all-cold' option to outline all 0-count blocks. Fixed compilation of test cases. After my last commit the binaries were linked to wrong source files (i.e. debug info). Had to rebuild the binaries from updated sources. (cherry picked from FBD3209369)	2016-04-20 15:31:11 -07:00
Maksim Panchenko	4f44d60947	Special handling for GNU_args_size call frame instruction. Summary: GNU_args_size is a special kind of CFI that tells runtime to adjust %rsp when control is passed to a landing pad. It is used for annotating call instructions that pass (extra) parameters on the stack and there's a corresponding landing pad. It is also special in a way that its value is not handled by DW_CFA_remember_state/DW_CFA_restore_state instruction sequence that we utilize to restore the state after block re-ordering. This diff adds association of call instructions with GNU_args_size value when it's used. If the function does not use GNU_args_size, there is no overhead. Otherwise, we regenerate GNU_args_size instruction during code emission, i.e. after all optimizations and block-reordering. (cherry picked from FBD3201322)	2016-04-19 22:00:29 -07:00
Gabriel Poesia	ad344c4387	Group debugging info representation and serialization code. Summary: Moved the classes related to representing and serializing DWARF entities into a single header, DebugData.h. (cherry picked from FBD3153279)	2016-04-07 15:06:43 -07:00
Gabriel Poesia	f6c8929799	Fix debugging info for simple functions that we fail to rewrite. Summary: Simple functions which we fail to rewrite after optimizations were having wrong debugging information because the latter would reflect the optimized version of the function. There are only 48 functions (at this time) in this situation in the HHVM binary. The simple fix is to add another full pass. Another more complicated path, which will be more efficient, is to reset only the BinaryContext and emit again, but then we need to recreate all symbols in the new MCContext and update the pointers. I started taking this path but it started getting too complicated for only those 48 functions (needed to create a new map of global symbols, recreate landing pads - which needed to have the internal intermediate labels in the functions kept to be updated too, etc). Because the overhead is quite large (another full emission pass - around 4m30s here) and the impact is small I put this behind a new command-line flag which is off by default: -fix-debuginfo-large-functions. (cherry picked from FBD3166576)	2016-04-11 17:46:18 -07:00
Gabriel Poesia	0e77c53b89	Update address ranges of inlined functions and try/catch blocks. Summary: Update address ranges of inlined functions and try/catch blocks. This was missing and led gdb to show weird information in a core dump we inspected because of the several nestings of inline in the call stack. This is very similar to Lexical Blocks, so the change is to basically generalize that code to do the same for DW_AT_try_block, DW_AT_catch_block and DW_AT_inlined_subroutine. (cherry picked from FBD3169417)	2016-04-12 11:41:03 -07:00
Maksim Panchenko	e16b5d8b78	Option to pass a file with list of functions to skip. Summary: Take "-skip_funcs_file=<file>" option and don't process any function listed in the <file>. (cherry picked from FBD3160226)	2016-04-08 19:30:27 -07:00
Gabriel Poesia	2694e58fa2	Update unmatched and nested subprogram DIEs. Summary: readelf was showing some errors because we weren't updating DIEs that were not shallow in the DIE tree, or DIEs of functions with addresses we don't recognize (mostly functions with address 0, which could have been removed by the Linker Script but still have debugging information there). These DIEs need to be updated because their abbreviations are patched. (cherry picked from FBD3159335)	2016-04-08 16:24:38 -07:00
Gabriel Poesia	665b03a464	Fix behavior with multiple functions with same address. Summary: We were updating only one DIE per function, but because the Linker Script may map multiple functions to the same address this would cause us to generate invalid debug info (as some DIEs weren't updated but their abbreviations were changed). (cherry picked from FBD3157263)	2016-04-08 11:55:42 -07:00
Gabriel Poesia	784f6a8773	Emit debug line information for non-simple functions. Summary: Non-simple functions aren't emitted, and thus didn't have line number information emitted. This diff emits it for those functions by extending LLVM's generation of the line number program to allow for absolute addresses (it is wholly symbolic), then iterating over the relevant line tables from the input and appending entries with absolute addresses to the line tables to be emitted. This still leaves the simple but not overwritten functions unhandled (there were 48 in HHVM in my last run). However, I think that to fix them we'd need another pass, since by the time we realize a simple function won't fit, debug line info was already written to the output. (cherry picked from FBD3148468)	2016-04-05 19:35:45 -07:00
Maksim Panchenko	e513bfd86d	Only set output ranges when updating dbg info. Summary: Save processing time by setting output ranges when needed. (cherry picked from FBD3148791)	2016-04-06 18:03:44 -07:00
Gabriel Poesia	4b4db40174	Update DWARF location lists after optimization. Summary: Update DWARF location lists in .debug_loc and pointers to them in .debug_info so that gdb can print variables which change location during their lifetime. The following changes were made: - Refactored BasicBlockOffsetRanges to allow ranges to be tied to binary information (so that we can reuse it for location lists) - Implemented range compression optimization in BasicBlockOffsetRanges (needed otherwise too much data was being generated). - Added representation for location lists (LocationList.h, BinaryContext.h) - Implemented .debug_loc serializer that keeps the updated offsets (DebugLocWriter.{h,cpp}) - After disassembly, traverse entries in .debug_loc and save them in context (BinaryContext.cpp) - After optimizations, serialize .debug_loc and update pointers in .debug_info (RewriteInstance.cpp) (cherry picked from FBD3130682)	2016-04-01 11:37:28 -07:00
Maksim Panchenko	4349b63144	Re-enable conditional function splitting under an option. Summary: Add a parameter value to "-split-functions=" option to allow splitting only when the function is too large to fit: 0 - never split 1 - split if too large to fit 2 - always split We may use this option when the profile data is not very precise. In that case excessive splitting may increase iTLB misses. (cherry picked from FBD3137700)	2016-03-31 16:38:49 -07:00
Gabriel Poesia	0a07d9bf88	Don't skip non-simple functions on function address ranges update. Summary: This fixes a problem in which bolt was generating a malformed .debug_info section on the bzip2 binary. The bug was the following: - A simple and a non-simple function shared an abbreviation - The abbreviation was patched to contain DW_AT_ranges because of the simple function - The non-simple function's data was not updated, but then it didn't match the layout expected by the abbreviation anymore And because we were already creating an address ranges list in .debug_ranges even for non-simple functions, it doesn't make sense not to use it anyway. (cherry picked from FBD3129219)	2016-04-01 15:09:34 -07:00
Gabriel Poesia	ffa9641e16	Update DWARF lexical blocks address ranges. Summary: Updates DWARF lexical blocks address ranges in the output binary after optimizations. This is similar to updating function address ranges except that the ranges representation needs to be more general, since address ranges can begin or end in the middle of a basic block. The following changes were made: - Added a data structure for iterating over the basic blocks that intersect an address range: BasicBlockTable.h - Added some more bookkeeping in BinaryBasicBlock. Basically, I needed to keep track of the block's size in the input binary as well as its address in the output binary. This information is mostly set by BinaryFunction after disassembly. - Added a representation for address ranges relative to basic blocks (BasicBlockOffsetRanges.h). Will also serve for location lists. - Added a representation for Lexical Blocks (LexicalBlock.h) - Small refactorings in DebugArangesWriter: -- Renamed to DebugRangesSectionsWriter since it also writes .debug_ranges -- Refactored it not to depend on BinaryFunction but instead on anything that can be assigned an offset in .debug_ranges (added an interface for that) - Iterate over the DIE tree during initialization to find lexical blocks in .debug_info (BinaryContext.cpp) - Added patches to .debug_abbrev and .debug_info in RewriteInstance to update lexical blocks attributes (in fact, this part is very similar to what was done to function address ranges and I just refactored/reused that code) - Added small test case (lexical_blocks_address_ranges_debug.test) (cherry picked from FBD3113181)	2016-03-28 17:45:22 -07:00
Maksim Panchenko	e8ef8a5619	Speedup section remapping. Summary: Before this diff LLVM used to iterate over all sections to find the one with an address we want to remap. Since we have an extremely large number of sections, this process is highly inefficient. Instead we add a new interface to remap a section with a given ID (which effectively is an index into an array of sections), and pass the ID instead of the address. This cuts down the processing time of hhvm binary by 10 seconds, and brings the total processing time to a little under 2 minutes. (cherry picked from FBD3110015)	2016-03-28 22:39:48 -07:00
Maksim Panchenko	595d0885d9	Populate function execution count while parsing fdata. Summary: Populate function execution count while parsing fdata. Before we used a quadratic algorithm to populate the execution count (had to iterate over all branches for every single function). Ignore non-symbol to non-symbol branches while parsing fdata. These changes combined drop HHVM processing time from 4 minutes 53 seconds down to 2 minutes 9 seconds on my devserver. Test case had to be modified since it contained irrelevant branches from PLT to libc. (cherry picked from FBD3106263)	2016-03-28 11:06:28 -07:00
Gabriel Poesia	466cbae866	Update subroutine address ranges in binary. Summary: [WIP] Update DWARF info for function address ranges. This diff currently does not work for unknown reasons, but I'm describing here what's the current state. According to both llvm-dwarfdump and readelf our output seems correct, but GDB does not interpret it as expected. All details go below in hope I missed something. I couldn't actually track the whole change that introduced support for what we need in gdb yet, but I think I can get to it (2007-12-04: Support lexical blocks and function bodies that occupy non-contiguous address ranges). I have reasons to believe gdb at least at some nges). The set of introduced changes was basically this: - After disassembly, iterate over the DIEs in .debug_info and find the ones that correspond to each BinaryFunction. - Refactor DebugArangesWriter to also write addresses of functions to .debug_ranges and track the offsets of function address ranges there - Add some infrastructure to facilitate patching the binary in simple ways (BinaryPatcher.h) - In RewriteInstance, after writing .debug_ranges already with function address ranges, for each function do: -- Find the abbreviation corresponding to the function -- Patch .debug_abbrev to replace DW_AT_low_pc with DW_AT_ranges and DW_AT_high_pc with DW_AT_producer (I'll explain this hack below). Also patch the corresponding forms to DW_FORM_sec_offset and DW_FORM_string (null-terminated in-place string). -- Patch .debug_info with the .debug_ranges offset in place of the first 4 bytes of DW_AT_low_pc (DW_AT_ranges only occupies 4 bytes whereas low_pc occupies 8), and write an arbitrary string in-place in the other 12 bytes that were the 4 MSB of low_pc and the 8 bytes of high_pc before the patch. This depends on low_pc and high_pc being put consecutively by the compiler, but it serves to validate the idea. 
I tried another way of doing it that does not rely on this but it didn't work either and I believe the reason for either not working is the same (and still unknown, but unrelated to them. I might be wrong though, and if I find yet another way of doing it I may try it). The other way was to use a form of DW_FORM_data8 for the section offset. This is disallowed by the specification, but I doubt gdb validates this, as it's just easier to store it as 64-bit anyway as this is even necessary to support 64-bit DWARF (which is not what gcc generates by default apparently). I still need to make changes to the diff to make it production-ready, but first I want to figure out why it doesn't work as expected. By looking at the output of llvm-dwarfdump or readelf, all of .debug_ranges, .debug_abbrev and .debug_info seem to have been correctly updated. However, gdb seems to have serious problems with what we write. (In fact, readelf --debug-dump=Ranges shows some funny warning messages of the form ("Warning: There is a hole [0x100 - 0x120] in .debug_ranges"), but I played around with this and it seems it's just because no compile unit was using these ranges. Changing .debug_info apparently changes these warnings, so they seem to be unrelated to the section itself. Also looking at the hex dump of the section doesn't help, as everything seems fine. llvm-dwarfdump doesn't say anything. So I think .debug_ranges is fine.) The result is that gdb not only doesn't show the function name as we wanted, but it also stops showing line number information. Apparently it's not reading/interpreting the address ranges at all, and so the functions now have no associated address ranges, only the symbol value which allows one to put a breakpoint in the function, but not to show source code. As this left me without more ideas of what to try to feed gdb with, I believe the most promising next trial is to try to debug gdb itself, unless someone spots anything I missed. 
I found where the interesting part of the code lies for this case (gdb/dwarf2read.c and some other related files, but mainly that one). It seems in some parts gdb uses DW_AT_ranges for only getting its lowest and highest addresses and setting that as low_pc and high_pc (see dwarf2_get_pc_bounds in gdb's code and where it's called). I really hope this is not actually the case for function address ranges. I'll investigate this further. Otherwise I don't think any changes we make will make it work as initially intended, as we'll simply need gdb to support it and in that case it doesn't. (cherry picked from FBD3073641)	2016-03-16 18:08:29 -07:00
Gabriel Poesia	9cdb7bdb55	Write only minimal .debug_line information. Summary: We used to output .debug_line information for every instruction, but because of the way gdb (and probably lldb as of llvm::DWARFDebugLine::LineTable::findAddress) queries the line table it's not necessary to output information for two instructions if they follow each other and map to the same source line. By not repeating this information we generate a bit less .debug_line data. (cherry picked from FBD3056402)	2016-03-15 16:22:04 -07:00
Maksim Panchenko	a60914427c	Update DW_AT_ranges for CU when it exists. Summary: If CU has DW_AT_ranges update the value. Note that it does not create DW_AT_ranges attribute. (cherry picked from FBD3051904)	2016-03-14 19:04:23 -07:00
Maksim Panchenko	d01172ffa8	Refactor existing debugging code. Summary: Almost NFC. Isolate code for updating debug info. (cherry picked from FBD3051536)	2016-03-14 18:48:05 -07:00
Gabriel Poesia	dc7cc1fb18	Fix default line number information for instructions. Summary: The line number information generated from a null pointer was actually valid, which caused new instructions without the line number information set to have a valid and wrong line number reference. This diff fixes this by making the null pointer be assigned to an invalid line number row. (cherry picked from FBD3048453)	2016-03-14 11:40:52 -07:00
Gabriel Poesia	80ea31b24e	Write updated .debug_aranges section after optimizations. Summary: Write the .debug_aranges section after optimizations to the output binary. Each function generates at least one range and at most two (one extra for its cold part). The writing is done manually because LLVM's implementation is tied to the output of .debug_info (see EmitGenDwarfInfo and EmitGenDwarfARanges in lib/MC/MCDwarf.cpp), which we don't want to trigger right now. (cherry picked from FBD3043108)	2016-03-11 11:30:30 -08:00
Maksim Panchenko	e7e9e15b90	Check function data in symbol table against data in .eh_frame. Summary: At the moment we rely solely on the symbol table information to discover function boundaries. However, similar information is contained in .eh_frame. Verify that the information from these two sources is consistent, and if it's not, then skip processing the functions with conflicting information. (cherry picked from FBD3043800)	2016-03-11 11:09:34 -08:00
Maksim Panchenko	f2df1a8d97	Update stmt_list value to point to new .debug_line offset. Summary: After we add new line number information we have to update stmt_list offsets in .debug_info. For this I had to add a primitive relocations support for non-allocatable sections we are copying from input file. Also enabled functionality to process relocations in non-allocatable sections that LLVM is generating, such as .debug_line. I thought we already had it, but apparently it didn't work, at least not for ELF binaries. (cherry picked from FBD3037903)	2016-03-09 16:06:41 -08:00
Maksim Panchenko	9212a9ad69	Proper skipping of unsupported CFI instructions. Summary: Skip DW_CFA_expression and DW_CFA_val_expression instructions properly, according to DWARF spec. If CFI range does not match function range skip that function. (cherry picked from FBD3040502)	2016-03-10 23:03:17 -08:00
Gabriel Poesia	73c9f0abe3	Write updated .debug_line information to temp file Summary: Writes .debug_line section by setting the state in MCContext that LLVM needs to produce and output the line tables. This basically consists of setting the current location and compile unit offset. This makes LLVM output .debug_line in the temporary file, but not yet in the generated ELF file. Also computes the line table offsets for each compile unit and saves them into BinaryContext. Added an option to print these offsets. (cherry picked from FBD3004554)	2016-03-02 18:40:10 -08:00
Maksim Panchenko	d68b1c7b16	Extending support for non-allocatable sections. Summary: This is a set of changes that allow modification of non-allocatable sections in ELF binary. Primarily for the purpose of updating debug info. Extend LLVM interface to allow processing relocations in non-allocatable sections. This allows producing .debug* sections with resolved relocations against generated code. Extend BOLT rewriting framework to allow appending contents to non-allocatable sections in the binary. Re-worked ELF binary rewriting to support the above and to allow future extensions (e.g. new section names). (cherry picked from FBD3023403)	2016-03-03 10:13:11 -08:00
Gabriel Poesia	77a6b72842	BOLT: Read and tie .debug_line info to IR. Summary: Reads information in the DWARF .debug_line section using LLVM and tie every MCInst to one line of a line table from the input binary. Subsequent diffs will update this information to match the final binary layout and output updated line tables. (cherry picked from FBD2989813)	2016-02-25 16:57:07 -08:00
Maksim Panchenko	62da18d32a	Always split functions under '-split-functions=1' option. Summary: Force the splitting of the function into hot/cold even when the function fits into original slot. This reduces BOLT optimization time by 50% without affecting hhvm performance. (cherry picked from FBD2973773)	2016-02-22 16:49:26 -08:00
Maksim Panchenko	73e9afe99c	Don't abort on unknown CFI instructions. Summary: If we see an unknown CFI instruction, skip processing the function containing it instead of aborting execution. (cherry picked from FBD2964557)	2016-02-22 18:25:43 -08:00
Maksim Panchenko	7f7d4af7e0	Add an option to use PT_GNU_STACK for new segment. Summary: Added an option to reuse existing program header entry. This option allows for bfd tools like strip and objcopy to operate on the optimized binary without destroying it. Also, all new sections are now properly marked in ELF. (cherry picked from FBD2943339)	2016-02-12 19:01:53 -08:00
Maksim Panchenko	50c895ad0c	Drop requirement for __flo_storage in the input binary. Summary: We used to require pre-allocated space in the input binary so that we can write extra sections in there (.eh_frame, .eh_frame_hdr, .gcc_except_table, etc.). With this diff there's no further need for pre-allocated storage as we create a new segment and can use as much space as needed. There are certain limitations on where the new segment could be allocated, and as a result the size of the file may increase. There's currently a limitation: if the binary size is close to 4GB, we cannot allocate a new segment prior to that, and as a result we require debug info to be stripped to reduce the file size. The fix is in progress. (cherry picked from FBD2916029)	2016-02-08 10:02:48 -08:00
Maksim Panchenko	e1a61e1eed	Keep intermediate .o file only under -keep-tmp option. Summary: We use intermediate .o file for debugging purposes, but there's no reason to generate it by default. Only do it if "-keep-tmp" is specified. (cherry picked from FBD2912098)	2016-02-08 10:08:28 -08:00
Maksim Panchenko	d1526083fc	Rename binary optimizer to BOLT. Summary: BOLT - Binary Optimization and Layout Tool replaces FLO. I'm keeping .fdata extension for "feedback data". (cherry picked from FBD2908028)	2016-02-05 14:42:04 -08:00
Maksim Panchenko	628d06b1e5	Preserve layout of basic blocks with 0 profile counts. Summary: Preserve original layout for basic blocks that have 0 execution count. Since we don't optimize for size, it's better to rely on the original input order. (cherry picked from FBD2875335)	2016-01-21 14:18:30 -08:00
Maksim Panchenko	b91d1f1299	Enable REPNZ prefix support. Summary: I didn't see a case where REPNZ was not disassembled/reassembled properly. (cherry picked from FBD2869229)	2016-01-26 17:53:08 -08:00
Maksim Panchenko	218c5f0916	Fix a bug with outlining first basic block. Summary: We should never outline the first basic block. Also add an option to accept a file with the list of functions to optimize. (cherry picked from FBD2868184)	2016-01-26 16:03:58 -08:00
Maksim Panchenko	89578e2314	Allow to partially split functions with exceptions. Summary: We could split functions with exceptions even without creating a new exception handling table. This limits us to only move basic blocks that never throw, and are not a start of a landing pad. (cherry picked from FBD2862937)	2016-01-22 16:45:39 -08:00
Maksim Panchenko	bbb745efa9	Don't create empty basic blocks. Fix CFI bug. Summary: Some basic blocks were created empty because they only contained alignment nop's. Ignore such nop's before basic block gets created. Fixed intermittent aborts related to CFI update. (cherry picked from FBD2844465)	2016-01-19 00:20:06 -08:00
Maksim Panchenko	4a44d187c6	Handle more CFI cases and some. Summary: * Update CFI state for larger range of functions to increase coverage. * Issue more warnings indicating reasons for skipping functions. * Print top called functions in the binary. (cherry picked from FBD2839734)	2016-01-16 14:58:22 -08:00
Maksim Panchenko	d9536e6092	Added an option to reverse original basic blocks order. Summary: Modified processing of "-reorder-blocks=" option and added an option to reverse original basic blocks order for testing purposes. (cherry picked from FBD2829862)	2016-01-13 17:19:40 -08:00
Maksim Panchenko	c9b7e3e09e	Write updated LSDA's. Summary: Write new exception ranges tables (LSDA's) into the output file. (cherry picked from FBD2828312)	2015-12-18 17:00:46 -08:00
Maksim Panchenko	b42c72cbf6	Fix issues with some CFI instructions with gcc 4.9. Summary: Fixes some issues discovered after hhvm switched to gcc 4.9. Add support for DW_CFA_GNU_args_size instruction. Allow CFI instruction after the last instruction in a function. Reverse conditions of assert for DW_CFA_set_loc. (cherry picked from FBD28110096)	2015-12-18 20:26:44 -08:00
Maksim Panchenko	a6efd11c05	Code/comments cleanup. Summary: Consolidate cold function info under cold FragmentInfo. Minor code and comment mods to LSDA handling. (cherry picked from FBD28109981)	2015-12-17 12:59:15 -08:00
Maksim Panchenko	e2fcb371a8	Ignore functions referencing symbol at 0x0. Summary: Binary code could be weird. It could include calls to address 0 and reference data at 0 (e.g. with lea on x86). LLVM JIT fatals while resolving relocations against symbols at address 0x0. For now we will stop emitting such code, i.e. we'll skip functions. (cherry picked from FBD28109837)	2015-12-16 17:56:49 -08:00
Maksim Panchenko	f7d7a85a24	Turn EH ranges support back on. Summary: Changed the way EH info is stored/extracted from call instruction. Make sure indirect calls work. (cherry picked from FBD28109629)	2015-12-15 17:06:27 -08:00
Rafael Auler	fb6e8c5d0b	Don't touch functions whose internal BBs are targets of interprocedural branches Summary: In a test binary, we found 8 cases where code in a function A would jump to the middle of another function B. In this case, we cannot reorder function B because this would change instruction offsets and break the program. This is pretty rare but can happen in code written in assembly. (cherry picked from FBD2719850)	2015-12-03 13:29:52 -08:00
Rafael Auler	9a73a8c446	Turns off basic block alignment by default Summary: We found out that the insertion of extra nops to preserve alignment of some loop bodies does not pay off the increased function size, since this extra size may inhibit us from rewriting a reordered version of this function. (cherry picked from FBD2718466)	2015-12-03 09:45:18 -08:00
Rafael Auler	04c80af012	Don't choke on DW_CFA_def_cfa_expression and friends Summary: Our CFI parser in the LLVM library was giving up on parsing all CFI instructions when finding a single instruction with expression operands. Yet, all gcc-4.9 binaries seem to have at least one CFI instruction with expression operands (DW_CFA_def_cfa_expression). This patch fixes this and makes DebugInfo continue to parse other instructions, even though it does not completely parse DWARF expressions yet. However, this seems to be enough to allow llvm-flo to process gcc-4.9 binaries because the FDEs with DWARF expressions are linked to the PLT region, and not to functions that we process. If we ever try to read a function whose CFI depends on DWARF expression, which is unlikely, llvm-flo will assert. (cherry picked from FBD2693088)	2015-11-24 13:55:44 -08:00
Rafael Auler	d6f01452d1	Change function splitting to be a two-pass process Summary: This patch builds upon the previous patch to create a two-pass process to function splitting. We first perform the full rewriting pipeline to discover which functions need splitting. Afterwards, we restart the pipeline with those functions annotated to be split. (cherry picked from FBD2691709)	2015-11-24 09:29:41 -08:00
Rafael Auler	c67a753e3c	Refactoring llvm-flo.cpp into a new class RewriteInstance, NFC. Summary: Previously, llvm-flo.cpp contained a long function doing lots of different tasks. This patch refactors this logic into a separate class with different member functions, exposing the relationship between each step of the rewriting process and making it easier to coordinate/change it. (cherry picked from FBD2691674)	2015-11-23 17:54:18 -08:00
Rafael Auler	ccbbb8f8b9	Teach llvm-flo how to split functions into hot and cold regions Summary: After basic block reordering, it may be possible that the reordered function is now larger than the original because of the following reasons: - jump offsets may change, forcing some jump instructions to use 4-byte immediate operand instead of the 1-byte, shorter version. - fall-throughs change, forcing us to emit an extra jump instruction to jump to the original fall-through at the end of a basic block. Since we currently do not change function addresses, we need to rewrite the function back in the binary in the original location. If it doesn't fit, we were dropping the function. This patch adds a flag -split-functions that tells llvm-flo to split hot functions into hot and cold separate regions. The hot region is written back in the original function location, while the cold region is written in a separate, far-away region reserved to flo via a linker script. This patch also adds the logic to create an extra FDE to supply unwinding information to the cold part of the function. Owing to this, we now need to rewrite .eh_frame_hdr to another location and patch the EH_FRAME ELF segment to point to this new .eh_frame_hdr. (cherry picked from FBD2677996)	2015-11-19 17:59:41 -08:00
Rafael Auler	38dac03e6b	Make llvm-flo print dynamic coverage of rewritten functions Summary: This is an attempt at determining the hotness of functions we are rewriting and help detect if we are discarding hot functions. This patch introduces logic to estimate the number of instructions executed in each function by using the profile data for branches. It sums the products of BB frequency and size. Since we can only do this for functions we have successfully disassembled, created the CFG and annotated with profiling data, all complex functions that were not disassembled are left out from this analysis. (cherry picked from FBD2654985)	2015-11-13 15:27:59 -08:00
Rafael Auler	75798a891b	Do not bail on functions with indirect calls Summary: Previously, we were marking functions with indirect calls as too complex to be disassembled, but this was unnecessarily conservative. This patch removes this restriction. (cherry picked from FBD2669627)	2015-11-02 09:46:50 -08:00
Rafael Auler	7886f4e81a	Ignore LSDA information for now Summary: Teach llvm-flo to drop functions with LSDA information until we know how to update them after block reordering. (cherry picked from FBD2640806)	2015-11-10 17:21:42 -08:00
Rafael Auler	1d248ec51b	Write .eh_frame and .eh_frame_hdr after reordering BBs Summary: This patch adds logic to detect when the binary has extra space reserved for us via the __flo_storage symbol. If this symbol is present, it means we have extra space in the binary to write extraneous information. When we write a new .eh_frame, we cannot discard the old .eh_frame because it may still contain relevant information for functions we do not reorder. Thus, we write the new .eh_frame into __flo_storage and patch the current .eh_frame_hdr to point to the new .eh_frame only for the functions we touched, generating a binary that works with a bi-.eh_frame model. (cherry picked from FBD2639326)	2015-11-10 15:20:50 -08:00
Rafael Auler	70db5677fb	Write updated CFI to temporary object file Summary: This patch is an intermediary step towards updating the CFI in the optimized binary. It adds the logic necessary to output our CFI annotations to a new .eh_frame in the temporary object file we create to hold rewritten functions. The next step will be to fully integrate this new .eh_frame into the optimized binary. (cherry picked from FBD2633728)	2015-11-09 11:08:02 -08:00
Rafael Auler	6c851dc2e3	Attempts to fix CFI state after reordering Summary: This patch introduces logic to check how the CFI instructions define a table to help during stack unwinding at exception run time and attempts to fix any problem in this table that may have been introduced by reordering the basic blocks. If it fails to fix this problem, the function is marked as not simple and not eligible for rewriting. (cherry picked from FBD2633696)	2015-11-08 12:23:54 -08:00
Maksim Panchenko	bc9d6e3b6c	Regenerate exception handling information after optimizations. Summary: Regenerate exception handling information after optimizations. Use '-print-eh-ranges' to see CFG with updated ranges. (cherry picked from FBD2660982)	2015-11-13 14:18:45 -08:00
Maksim Panchenko	56cca2fb5b	Fix LSDA reading issues. Summary: There were two issues: we were trying to process non-simple functions, i.e. function that we don't fully understand, and then we failed to stop iterating if EH closing label was after the last instruction in a function. (cherry picked from FBD2664460)	2015-11-17 11:02:04 -08:00
Maksim Panchenko	be2a19523c	Add exception handling information to CFG. Summary: Read .gcc_except_table and add information to CFG. Calls have extra operands indicating there's a possible handler for exceptions and an action. Landing pad information is recorded in BinaryFunction. Also convert JMP instructions that are calls into tail calls pseudo instructions so that they don't miss call instruction analysis. (cherry picked from FBD2652775)	2015-11-12 18:56:58 -08:00
Rafael Auler	2117362a09	Revert 45fc13b as it breaks HHVM rewriting Summary: Reverting this commit until we better investigate why it is necessary to change local symbol names with a prefix. (cherry picked from FBD28109521)	2015-11-12 10:41:46 -08:00
Rafael Auler	1df130ae17	Remove add PG prefix from symbols that are already local Summary: After discussion with Maksim, we decided to drop the lines that add the PG prefix if the symbol is already local, since they wouldn't be impacted by the way LLVM handles these symbols. (cherry picked from FBD28109400)	2015-11-12 10:02:12 -08:00
Rafael Auler	e80d11f27a	Fix bug in local symbol name disambiguation algorithm Summary: This bug would cause llvm-flo to fail to disambiguate two local symbols with the same file name, causing two different addresses to compete in the symbol table for the resolution of a given name, causing unpredicted behavior in the linker. (cherry picked from FBD2646626)	2015-11-11 23:56:24 -08:00
Rafael Auler	a30d04c3e2	Annotate BinaryFunctions with MCCFIInstructions encoding CFI Summary: In order to represent CFI information in our BinaryFunction class, this patch adds a map of Offsets to CFI instructions. In this way, we make it easy to check exactly where DWARF CFI information is annotated in the disassembled function. (cherry picked from FBD2619216)	2015-11-04 16:48:47 -08:00
Maksim Panchenko	de46e6fc07	Parse whole contents of .gcc_except_table even if we are not printing. Summary: We need to parse the whole contents of .gcc_except_table even if we are not printing exceptions. Otherwise we are missing type index table and miscalculate the size of the current table. (cherry picked from FBD2632965)	2015-11-09 12:27:13 -08:00
Rafael Auler	2088875656	Teach llvm-flo how to read .eh_frame information from binaries Summary: In order to reorder binaries with C++ exceptions, we first need to read DWARF CFI (call frame info) from binaries in a table in the .eh_frame ELF section. This table contains unwinding information we need to be aware of when reordering basic blocks, so as to avoid corrupting it. This patch also cleans up some code from Exceptions.cpp due to a refactoring where we moved some functions to the LLVM's libSupport. (cherry picked from FBD2614464)	2015-11-05 13:37:30 -08:00
Maksim Panchenko	7d592d0975	Verbose printing of actions from .gcc_except_table Summary: Print actions for exception ranges from .gcc_except_table. Types are printed as names if the name is available from symbol table. (cherry picked from FBD2612631)	2015-11-03 14:26:33 -08:00
Maksim Panchenko	21cc191ea8	Added function to parse and dump .gcc_except_table Summary: Use '-print-exceptions' option to dump contents of .gcc_except_table. (cherry picked from FBD2609925)	2015-11-02 11:50:53 -07:00
Rafael Auler	0e8998713c	Extract non-taken branch frequencies from LBR Summary: Previously, we inferred all non-taken branch frequencies with the information we had for taken branches. This patch teaches perf2flo and llvm-flo how to read and incorporate non-taken branch frequencies directly from the traces available in LBR data and by disassembling the binary. It still leaves the inference engine untouched in case we need it to fill out other fall-throughs. (cherry picked from FBD2589212)	2015-10-26 15:00:56 -07:00
Rafael Auler	13a520ab30	Implement two cluster layout heuristics Summary: Pettis' paper on block layout (PLDI'90) suggests we should order clusters (or chains, using the paper terminology) using a specific criterion. This patch implements two distinct ideas for cluster layout that can be activated using different command-line flags. The first one reflects Pettis' ideas on minimizing branch mispredictions and the second one is targeted at reducing I-cache misses, described in the Ispike paper (CGO'04). (cherry picked from FBD2588693)	2015-10-23 09:38:26 -07:00
Rafael Auler	2539539bde	Fixes priority queue ordering in llvm-flo block reordering Summary: Fixes a bug which caused the block reordering heuristic to put in the same cluster hot basic blocks and cold basic blocks, increasing I-cache misses. (cherry picked from FBD2588203)	2015-10-27 03:04:58 -07:00
Maksim Panchenko	d4d773458c	More control over function printing. Summary: Can use '-print-*' option to print function at specific stage. Use '-print-all' to print at every stage. (cherry picked from FBD2578196)	2015-10-23 15:52:59 -07:00
Maksim Panchenko	7f44331773	Issue warning when relaxed tail call is seen on input. Summary: Issue warning when we see a 2-byte tail call. Currently we will increase the size of these instructions. (cherry picked from FBD2575520)	2015-10-20 10:51:17 -07:00
Rafael Auler	546c4e6e84	Fix bug in BinaryFunction::fixBranches() in llvm-flo Summary: When the ignore-nops patch landed, it exposed a bug in fixBranches() where it ignored empty BBs. However, we cannot ignore empty BBs when it is reordered and its fall-through changes. We must update it with a jump to the original fall-through. This patch fixes this. (cherry picked from FBD2568244)	2015-10-21 16:25:16 -07:00
Rafael Auler	dc848b5376	Fix entry BB execution count in llvm-flo Summary: When we have tailcalls, the execution count for the entry point is wrongly computed. Fix this. (cherry picked from FBD2563112)	2015-10-20 16:48:54 -07:00
Rafael Auler	ab63ca9afb	Implement unreachable BB elimination in llvm-flo Summary: It is important to remove dead blocks to free up space in functions and allow us to reorder blocks or align branch targets with more freedom. This patch implements a simple algorithm to delete all basic blocks that are not reachable from the entry point. Note that C++ exceptions may create "unreachable" blocks, so this option must be used with care. (cherry picked from FBD2562637)	2015-10-20 12:47:37 -07:00
Rafael Auler	9f41a0d263	Do not schedule BBs before the entry point Summary: SPEC CPU2006 perlbench triggered a bug in our heuristic block reordering algorithm where a hot edge that targets the entry point (as in a recursive tail call) would make us try to allocate the call site before the function entry point. Since we don't update function addresses yet, moving the entry point will corrupt the program. This patch fixes this. (cherry picked from FBD2562528)	2015-10-20 12:30:22 -07:00
Rafael Auler	b0115a4536	Teach llvm-flo how to handle two back-to-back JMPs Summary: If we have two consecutive JMP instructions and no branches to the second one, the second one is dead code, but llvm-flo does not handle these cases properly and puts two JMPs in the same BB. This patch fixes this, putting the extraneous JMP in a separate block, making it easy for us to detect it is dead code and remove it later in a separate step. (cherry picked from FBD2562465)	2015-10-20 10:17:38 -07:00
Maksim Panchenko	85b99eb7b7	Eliminate nop instruction in input and derive alignment. Summary: Nop instructions are primarily used for alignment purposes on the input. We remove all nops when we build CFG and derive alignment of basic blocks based on existing alignment and a presence of nops before it. This will not always work as some basic blocks will be naturally aligned without necessity for nops. However, it's better than random alignment. We would also add heuristics for BB alignment based on execution profile. (cherry picked from FBD2561740)	2015-10-20 10:51:17 -07:00
Rafael Auler	cd6250d1e3	Fixes branches after reordering basic blocks in a binary function Summary: Adds logic in BinaryFunction to be able to fix branches (invert its condition, delete or add a branch), making the new function work with the new layout proposed by the layout pass. All the architecture-specific content was designed to live in the LLVM Target library, in the MCInstrAnalysis pass. For now, we only introduce such logic to the X86 backend. (cherry picked from FBD2551479)	2015-10-16 09:49:04 -07:00
Rafael Auler	ef059af3d1	Fix bug in block reorder heuristic Summary: Tests with SPEC CPU2006 400.perlbench exposed a bug in the block reordering heuristic that happened when two blocks are both successor and predecessor of each other. This patch fixes this. (cherry picked from FBD2555835)	2015-10-19 10:43:54 -07:00
Rafael Auler	31e6bd1226	Fix missing sanity check in BinaryFunction::optimizeLayout() Summary: SPEC CPU2006 perlbench exposed a bug in BinaryFunction::optimizeLayout() where it would try to optimize the layout even though the function had zero basic blocks. This patch simply checks if the function has zero basic blocks and bails out. (cherry picked from FBD2556831)	2015-10-19 13:23:03 -07:00
Maksim Panchenko	b4ed5cc942	Make FLO work on hhvm binary. Summary: Fixes several issues that prevented us from running hhvm binary. (cherry picked from FBD2543057)	2015-10-14 15:35:14 -07:00
Rafael Auler	ec22caff1e	Fix comments. NFC. Summary: Updated comments in BinaryFunction class. (cherry picked from FBD28108888)	2015-10-16 17:15:00 -07:00
Rafael Auler	9a8d357d0b	Fix DataReader to work with new local sym perf2flo format Summary: In a recent commit, we changed local symbols to be specially tagged with the number 2 (local sym) instead of 1 (sym). This patch modifies the reader so that it does not choke when seeing a 2 in the symbol id field. (cherry picked from FBD2552776)	2015-10-16 17:00:36 -07:00
Rafael Auler	f9ed45893b	Teach llvm-flo how to reorder blocks in an optimal way Summary: This patch implements a dynamic programming approach to reorder basic blocks with profiling information in an optimal way. Since this is analogous to TSP, it is NP-hard and the algorithm is exponential in time and memory consumption. Therefore, we only use the optimal algorithm to decide the layout of small functions (with fewer than 11 basic blocks). (cherry picked from FBD2544124)	2015-10-14 16:58:55 -07:00
Rafael Auler	34f7085503	Teach llvm-flo how to reorder basic blocks with a heuristic Summary: This patch introduces a first approach to reorder basic blocks based on profiling data that gives us the execution frequency for each edge. Our strategy is to lay out basic blocks in an order that maximizes the weight (hotness) of branches that will be deleted. We can delete branches when src comes right before dst in the new layout order. This can be reduced to the TSP problem. This patch uses a greedy heuristic to solve the problem: we start with a graph with no edges and progressively add edges by choosing the hottest edges first, building a layout order that attempts to put BBs with hot edges together. (cherry picked from FBD2544076)	2015-10-13 12:18:54 -07:00
Rafael Auler	9b58b2e64b	Make llvm-flo infer branch count data for fall-through edges Summary: The LBR only has information about taken branches and does not record information when a branch is not taken. In our CFG, we call these edges "fall-through" edges. This patch teaches llvm-flo how to infer fall-through edge frequencies. (cherry picked from FBD2536633)	2015-10-13 10:25:45 -07:00
Maksim Panchenko	f79f6302c1	Converted local offsets from uint64_t to uint32_t. Refactoring. (cherry picked from FBD2543557)	2015-10-14 16:46:59 -07:00
Rafael Auler	4c1da22ae9	Add branch count information to binary CFG Summary: Changes DataReader to organize branch perf data per function name and sets up logistics to bring this data to BinaryFunction::buildCFG(). To do this, we expand BinaryContext with a const reference to DataReader. This patch also adds the "-dump-functions" flag to force llvm-flo to dump the current state of BinaryFunctions once they are disassembled and their CFG built, allowing us to test whether the builder is sane with LLVM LIT tests. (cherry picked from FBD2534675)	2015-10-12 12:30:47 -07:00
Maksim Panchenko	d30423f872	Don't bail out if there's no input data file specified. Summary: Don't attempt to read data file if it was not specified by the user. (cherry picked from FBD2533440)	2015-10-12 14:46:18 -07:00
Maksim Panchenko	ffcc2be7fa	FLO: added support for rip-relative operands. Summary: Detect and replace rip-relative operands with relocations. (cherry picked from FBD2529818)	2015-10-09 21:47:18 -07:00
Maksim Panchenko	f166c4ab2b	Fix CFG building issue. Summary: Fixed getBasicBlockContainingOffset() to return correct basic block. (cherry picked from FBD2532514)	2015-10-12 12:12:16 -07:00
Rafael Auler	e1a539b0ec	Add initial implementation of DataReader Summary: This patch introduces DataReader, a module responsible for parsing llvm flo data files into in-memory data structures. (cherry picked from FBD2515754)	2015-10-05 18:31:25 -07:00
Maksim Panchenko	9a2fe7ebe4	Commit FLO with control flow graph. Summary: llvm-flo disassembles, builds control flow graph, and re-writes simple functions. (cherry picked from FBD2524024)	2015-10-09 17:21:14 -07:00
Maksim Panchenko	7927c14ff5	Fixed cmake. (cherry picked from FBD28108725)	2015-10-02 12:38:07 -07:00
Maksim Panchenko	a89c417357	Removed remote .arcconfig + comment change. (cherry picked from FBD2503821)	2015-10-02 12:06:31 -07:00
Maksim Panchenko	575b24d719	Initial FLO commit. Summary: Directory created. (cherry picked from FBD28105260)	2015-10-02 11:55:15 -07:00
Maksim Panchenko	25b976aa12	BOLT root commit	2022-01-10 17:58:05 -08:00

1191 Commits