llvm-project

Commit Graph

Author	SHA1	Message	Date
Bill Nell	5feee9f1d8	[BOLT] More CG refactoring Summary: Do some additional refactoring of the CallGraph class. Add a BinaryFunctionCallGraph class that has the BOLT specific bits. This is in preparation to moving the generic CallGraph class into a library that both BOLT and HHVM can use. Make data members of CallGraph private and add the appropriate accessor methods. (cherry picked from FBD5143468)	2017-05-26 15:46:46 -07:00
Maksim Panchenko	95ab659fe4	[BOLT] Do not assert on an empty location list. Summary: Clang generates an empty debug location list, which doesn't make sense, but we probably shouldn't assert on it and instead issue a warning in verbosity mode. There is only a single empty location list in the whole llvm binary. (cherry picked from FBD5166666)	2017-06-01 12:30:52 -07:00
Bill Nell	733e8c464f	HFSort/call graph refactoring Summary: I've factored out the call graph code from dataflow and function reordering code and done a few small renames/cleanups. I've also moved the function reordering pass into a separate file because it was starting to get big. I've got more refactoring planned for hfsort/call graph but this is a start. (cherry picked from FBD5140771)	2017-05-26 12:53:21 -07:00
Bill Nell	9b190cc74b	[BOLT] Fix SCTC again again. Summary: I put the const_cast<BinaryFunction *>(this) on the wrong version of getBasicBlockAfter(). It's on the right one now. (cherry picked from FBD5159127)	2017-05-31 14:23:37 -07:00
Maksim Panchenko	6c32079d57	[BOLT] Update addresses for DW_TAG_GNU_call_site and DW_TAG_label. Summary: Some DWARF tags (such as GNU_call_site and label) reference instruction addresses in the input binary. When we update debug info we need to update these tags too with new addresses. Also fix base address used for calculation of output addresses in relocation mode. (cherry picked from FBD5155814)	2017-05-31 09:36:49 -07:00
Bill Nell	35d2530a40	[BOLT] Fix SCTC again. Summary: Respect hot/cold boundaries when using BinaryFunction::getBasicBlockAfter(). (cherry picked from FBD5153379)	2017-05-30 19:06:22 -07:00
Maksim Panchenko	2e744e6867	[BOLT] Emit sorted DWARF ranges and location lists. Summary: When producing address ranges and location lists for debug info add a post-processing step that sorts them and merges adjacent entries. Fix a memory allocation/free issue for .debug_ranges section. (cherry picked from FBD5130583)	2017-05-24 15:20:27 -07:00
Bill Nell	96943d2f4b	Add option to generate function order file. Summary: Add -generate-function-order=<filename> option to write the computed function order to a file. We can read this order in later rather than recomputing each time we process a binary with BOLT. (cherry picked from FBD5127915)	2017-05-24 18:40:29 -07:00
Maksim Panchenko	2428567f7d	[BOLT] Fix no-assertions build. (cherry picked from FBD5130285)	2017-05-25 10:29:38 -07:00
Maksim Panchenko	174e3a825b	[BOLT] Fix C++ ABI function alignment. Summary: C++ functions have to be aligned at 2-bytes minimum on x86-64. (cherry picked from FBD5128185)	2017-05-24 21:59:01 -07:00
Bill Nell	5cd58961a9	Add .bolt_info notes section containing BOLT revision and command line args. Summary: Optinally add a .bolt_info notes section containing BOLT revision and command line args. The new section is controlled by the -add-bolt-info flag which is on by default. (cherry picked from FBD5125890)	2017-05-24 14:14:16 -07:00
Rafael Auler	2ee4bbd3c1	[BOLT] Optimize jump tables with hot entries Summary: This diff is similar to Bill's diff for optimizing jump tables (and is built on top of it), but it differs in the strategy used to optimize the jump table. The previous approach loads the target address from the jump table and compare it to check if it is a hot target. This accomplishes branch misprediction reduction by promote the indirect jmp to a (more predictable) direct jmp. load %r10, JMPTABLE cmp %r10, HOTTARGET je HOTTARGET ijmp [JMPTABLE + %index * scale] The idea in this diff is instead to make dcache better by avoiding the load of the jump table, leaving branch mispredictions as a secondary target. To do this we compare the index used in the indirect jmp and if it matches a known hot entry, it performs a direct jump to the target. cmp %index, HOTINDEX je CORRESPONDING_TARGET ijmp [JMPTABLE + %index * scale] The downside of this approach is that we may have multiple indices associated with a single target, but we only have profiling to show which targets are hot and we have no clue about which indices are hot. INDEX TARGET 0 4004f8 8 4004f8 10 4003d0 18 4004f8 Profiling data: TARGET COUNT 4004f8 10020 4003d0 17 In this example, we know 4004f8 is hot, but to make a direct call to it we need to check for indices 0, 8 and 18 -- 3 comparisons instead of 1. Therefore, once we know a target is hot, we must generate code to compare against all possible indices associated with this target because we don't know which index is the hot one (IF there's a hotter index). cmp %index, 0 je 4004f8 cmp %index, 8 je 4004f8 cmp %index, 18 je 4004f8 (... up to N comparisons as in --indirect-call-promotion-topn=N ) ijmp [JMPTABLE + %index * scale] (cherry picked from FBD5005620)	2017-05-01 14:04:40 -07:00
Bill Nell	3a3bcd767e	Don't add useless uncond branch to fallthroughs when running SCTC. Summary: SCTC was sometimes adding unconditional branches to fallthrough blocks. This diff checks to see if the unconditional branch is really necessary, e.g. it's not to a fallthrough block. (cherry picked from FBD5098493)	2017-05-19 14:45:46 -07:00
Maksim Panchenko	96adec51eb	[BOLT] Rework debug info processing. Summary: Multiple improvements to debug info handling: * Add support for relocation mode. * Speed-up processing. * Reduce memory consumption. * Bug fixes. The high-level idea behind the new debug handling is that we don't save intermediate state for ranges and location lists. Instead we depend on function and basic block address transformations to update the info as a final post-processing step. For HHVM in non-relocation mode the peak memory went down from 55GB to 35GB. Processing time went from over 6 minutes to under 5 minutes. (cherry picked from FBD5113431)	2017-05-16 09:27:34 -07:00
Rafael Auler	511a1c78b2	[BOLT] Add dataflow infrastructure Summary: This diff introduces a common infrastructure for performing dataflow analyses in BinaryFunctions as well as a few analyses that are useful in a variety of scenarios. The largest user of this infrastructure so far is shrink wrapping, which will be added in a separate diff. (cherry picked from FBD4983671)	2017-05-01 16:51:27 -07:00
Maksim Panchenko	457b7f14b9	[BOLT] Fix debug info for input with continuous range. Summary: When we see a compilation unit with continuous range on input, it has two attributes: DW_AT_low_pc and DW_AT_high_pc. We convert the range to a non-continuous one and change the attributes to DW_AT_ranges and DW_AT_producer. However, gdb seems to expect every compilation unit to have a base address specified via DW_AT_low_pc, even when its value is always 0. Otherwise gdb will not show proper debug info for such modules. With this diff we produce DW_AT_ranges followed by DW_AT_low_pc. The problem is that the first attribute takes DW_FORM_sec_offset which is exactly 4 bytes, and in many cases we are left with 12 bytes to fill in. We used to fill this space with DW_AT_producer, which took an arbitrary-length field. For DW_AT_low_pc we can use a trick of using DW_FORM_udata (unsigned ULEB128 encoded integer) which can take up to 12 bytes, even when the value is 0. (cherry picked from FBD5109798)	2017-05-22 17:17:04 -07:00
Bill Nell	4806b13835	[BOLT] Add jump table support to ICP Summary: Add jump table support to ICP. The optimization is basically the same as ICP for tail calls. The big difference is that the profiling data comes from the jump table and the targets are local symbols rather than global. I've removed an instruction from ICP for tail calls. The code used to have a conditional jump to a block with a direct jump to the target, i.e. B1: cmp foo,(%rax) jne B3 B2: jmp foo B3: ... this code is now: B1: cmp foo,(%rax) je foo B2: ... The other changes in this diff: - Move ICP + new jump table support to separate file in Passes. - Improve the CFG validation to handle jump tables. - Fix the double jump peephole so that the successor of the modified block is updated properly. Also make sure that any existing branches in the block are modified to properly reflect the new CFG. - Add an invocation of the double jump peephole to SCTC. This allows us to remove a call to peepholes/UCE occurring after fixBranches() in the pass manager. - Miscellaneous cleanups to BOLT output. (cherry picked from FBD4727757)	2017-03-08 19:58:33 -08:00
Maksim Panchenko	c789d5137b	[BOLT] Add option to keep/generate .debug_aranges. Summary: GOLD linker removes .debug_aranges while generating .gdb_index. Some tools however rely on the presence of this section. Add an option to generate .debug_aranges if it was removed, or keep it in the file if it was present. Generally speaking .debug_aranges duplicates information present in .gdb_index addresses table. (cherry picked from FBD5084808)	2017-05-17 18:35:00 -07:00
Maksim Panchenko	69b586326c	[BOLT] Support adding new non-allocatable sections. Summary: We had the ability to add allocatable sections before. This diff expands this capability to non-allocatable sections. (cherry picked from FBD5082018)	2017-05-16 17:29:31 -07:00
Maksim Panchenko	3adb52d80e	[BOLT] Update .gdb_index section. Summary: Update address table in .gdb_index section. (cherry picked from FBD5068255)	2017-05-15 15:21:59 -07:00
Maksim Panchenko	3f42fdf7da	[BOLT] Update function address and size in relocation mode. Summary: Set function addresses after code emission but before we update debug info and symbol table entries. (cherry picked from FBD5029609)	2017-05-08 22:51:36 -07:00
Maksim Panchenko	13c89e6ef1	[BOLT] Fix branch data for __builtin_unreachable(). Summary: When we have a conditional branch past the end of function (a result of a call to__builtin_unreachable()), we replace the branch with nop, but keep branch information for validation purposes. If that branch has a recorded profile we mistakenly create an additional successor to a containing basic block (a 3rd successor). Instead of adding the branch to FTBranches list we should be adding to IgnoredBranches. (cherry picked from FBD4912840)	2017-04-18 23:32:11 -07:00
Maksim Panchenko	075f076503	[BOLT] Don't abort on processing binaries with .gdb_index section Summary: While writing non-allocatable sections we had an assumption that the size of such section is congruent to the alignment, as typically such sections are a collections of fixed-sized elements. .gdb_index breaks this assumption. This diff removes the assertion that was triggered by a presence of .gdb_index section, and makes sure that we insert a padding if we are appending to a section with a size not congruent to section alignment. (cherry picked from FBD4844553)	2017-04-06 10:49:59 -07:00
Bill Nell	c7cccacc4f	[BOLT] Enable SCTC by default. (cherry picked from FBD4837849)	2017-04-05 13:23:58 -07:00
Maksim Panchenko	34c8a7c21b	[BOLT] Relocation support for non-allocatable sections. Summary: Relocations can be created for non-allocatable (aka Note) sections. To start using this for debug info, the emission has to be moved earlier in the pipeline for relocation processing to kick in. (cherry picked from FBD4835204)	2017-04-05 09:29:24 -07:00
Maksim Panchenko	a99005397f	[BOLT] Fix branch count in removeDuplicateConditionalSuccessor(). Summary: When we merge the original branch counts we have to make sure both of them have a profile. Otherwise set the count to COUNT_NO_PROFILE. The misprediction count should be 0. (cherry picked from FBD4837774)	2017-04-05 13:00:20 -07:00
Bill Nell	6c5c65e3a3	[BOLT] Fix double jump peephole, remove useless conditional branches. Summary: I split some of this out from the jumptable diff since it fixes the double jump peephole. I've changed the pass manager so that UCE and peepholes are not called after SCTC. I've incorporated a call to the double jump fixer to SCTC since it is needed to fix things up afterwards. While working on fixing the double jump peephole I discovered a few useless conditional branches that could be removed as well. I highly doubt that removing them will improve perf at all but it does seem odd to leave in useless conditional branches. There are also some minor logging improvements. (cherry picked from FBD4751875)	2017-03-20 22:44:25 -07:00
Maksim Panchenko	f7d32f7e7d	[BOLT] Detect and reject binaries built for coverage. Summary: Don't attempt to optimize binaries built with coverage support. (cherry picked from FBD4810330)	2017-03-31 07:51:30 -07:00
Maksim Panchenko	c166a8c1a7	[BOLT] Fix debug info update for inlining. Summary: When inlining, if a callee has debug info and a caller does not (i.e. a containing compilation unit was compiled without "-g"), we try to update a nonexistent compilation unit. Instead we should skip updating debug info in such cases. Minor refactoring of line number emitting code. (cherry picked from FBD4823982)	2017-04-03 16:24:26 -07:00
Maksim Panchenko	0bde796e50	[BOLT] Organize options in categories for pretty printing (near NFC). Summary: Each BOLT-specific option now belongs to BoltCategory or BoltOptCategory. Use alphabetical order for options in source code (does not affect output). The result is a cleaner output of "llvm-bolt -help" which does not include any unrelated llvm options and is close to the following: ..... BOLT generic options: -data=<string> - <data file> -dyno-stats - print execution info based on profile -hot-text - hot text symbols support (relocation mode) -o=<string> - <output file> -relocs - relocation mode - use relocations to move functions in the binary -update-debug-sections - update DWARF debug sections of the executable -use-gnu-stack - use GNU_STACK program header for new segment (workaround for issues with strip/objcopy) -use-old-text - re-use space in old .text if possible (relocation mode) -v=<uint> - set verbosity level for diagnostic output BOLT optimization options: -align-blocks - try to align BBs inserting nops -align-functions=<uint> - align functions at a given value (relocation mode) -align-functions-max-bytes=<uint> - maximum number of bytes to use to align functions -boost-macroops - try to boost macro-op fusions by avoiding the cache-line boundary -eliminate-unreachable - eliminate unreachable code -frame-opt - optimize stack frame accesses ...... (cherry picked from FBD4793684)	2017-03-28 14:40:20 -07:00
Maksim Panchenko	d5a0264a9e	[BOLT] Issue error in relocs mode if input is lacking relocations. Summary: If we specify "-relocs" flag and an input has no relocations we proceed with assumptions that relocations were there and break the binary. Detect the condition above, and reject the input. (cherry picked from FBD4761239)	2017-03-22 22:05:50 -07:00
Rafael Auler	ad81bd6779	Change dynostats dynamic instruction count policy Summary: Also add LOAD/STORE counters. (cherry picked from FBD4732284)	2017-03-17 10:32:56 -07:00
Bill Nell	b1ef186ca9	[BOLT] Don't allow non-symbol targets in ICP Summary: ICP was letting through call targets that weren't symbols. This diff filters out the non-symbol targets before running ICP. (cherry picked from FBD4735358)	2017-03-18 11:55:45 -07:00
Maksim Panchenko	e6f96de4d0	[BOLT] Add option to print only specific functions. Summary: Add option '-print-only=func1,func2,...' to print only functions of interest. The rest of the functions are still processed and optimized (e.g. inlined), but only the ones on the list are printed. (cherry picked from FBD4734610)	2017-03-17 19:05:11 -07:00
Maksim Panchenko	6cfd7ac2d5	[BOLT] Do not overwrite starting address in non-relocation mode. Summary: In non-relocation mode we shouldn't attemtp to change ELF entry point. What made matters worse - it broke '-max-funcs=' and '-funcs=' options since an entry function more often than not was excluded from the list of processed functions, and we were setting entry point to 0. (cherry picked from FBD4720044)	2017-03-15 19:31:20 -07:00
Maksim Panchenko	559a57a181	[BOLT] Improve dynostats output. Summary: Reduce verbosity of dynostats to make them more readable. * Don't print "before" dynostats twice. * Detect if dynostats have changed after optimization and print before/after only if at least one metric have changed. Otherwise just print dynostats once and indicate "no change". * If any given metric hasn't changed, then print the difference as "(=)" as opposed to (+0.0%). (cherry picked from FBD4705920)	2017-03-14 09:03:23 -07:00
Maksim Panchenko	351af0c895	[BOLT] Do not process empty functions. Summary: While running on a recent test binary BOLT failed with an error. We were trying to process '__hot_end' (which is not really a function), and asserted that it had no basic blocks. This diff marks functions with empty basic blocks list as non-simple since there's no need to process them. (cherry picked from FBD4696517)	2017-03-12 11:30:05 -07:00
Bill Nell	2e5c2e689f	Fix hfsort callgraph stats, add hfsort test. Summary: The stats for call sites that are not included in the call graph were broken. The intention is to count the total number of call sites vs. the number of call sites that are ignored because they have targets that are not BinaryFunctions. Also add a new test for hfsort. (cherry picked from FBD4668631)	2017-03-07 11:45:07 -08:00
Maksim Panchenko	f4825ea417	[BOLT] Fix gcc5 build. Summary: A <numeric> include is required for gcc5 build. (cherry picked from FBD4671953)	2017-03-07 18:09:09 -08:00
Maksim Panchenko	98737b34bb	[BOLT] Fix verbose output. Summary: Inadvertently, output of BOLT became way too verbose. Discovered while building HHVM on master. (cherry picked from FBD4669881)	2017-03-07 14:22:15 -08:00
Bill Nell	fed0980139	[BOLT] Update tests Summary: Fix validateCFG to handle BBs that were generated from code that used _builtin_unreachable(). Add -verify-cfg option to run CFG validation after every optimization pass. (cherry picked from FBD4641174)	2017-02-27 21:44:38 -08:00
Maksim Panchenko	0acba2bcf0	[BOLT] Detect unmarked data in text. Summary: Sometimes a code written in assembly will have unmarked data (such as constants) embedded into text. Typically such data falls into a "padding" address space of a function. This diffs detects such references, and adjusts the padding space to prevent overwriting of code in data. Note that in relocation mode we prefer to overwrite the original code (-use-old-text) and thus cannot simply ignore data in text. (cherry picked from FBD4662780)	2017-02-21 14:18:09 -08:00
Maksim Panchenko	f241e252fc	[BOLT] Detect and handle __builtin_unreachable(). Summary: Calls to __builtin_unreachable() can result in a inconsistent CFG. It was possible for basic block to end with a conditional branche and have a single successor. Or there could exist non-terminated basic block without successors. We also often treated conditional jumps with destination past the end of a function as conditional tail calls. This can be prevented reliably at least when the byte past the end of the function does not belong to the next function. This diff includes several changes: * At disassembly stage jumps past the end of a function are converted into 'nops'. This is done only for cases when we can guarantee that the jump is not a tail call. Conversion to nop is required since the instruction could be referenced either by exception handling tables and/or debug info. Nops are later removed. * In CFG insert 'ret' into non-terminated basic blocks without successors (this almost never happens). * Conditional jumps at the end of the function are removed from CFG. The block will still have a single successor. * Cases where a destination of a jump instruction is the start of the next function, are still conservatively handled as (conditional) tail calls. (cherry picked from FBD4655046)	2017-03-03 11:35:41 -08:00
Maksim Panchenko	6dc2351505	[BOLT] New CFI handling policy. Summary: The new interface for handling Call Frame Information: * CFI state at any point in a function (in CFG state) is defined by CFI state at basic block entry and CFI instructions inside the block. The state is independent of basic blocks layout order (this is implied by CFG state but wasn't always true in the past). * Use BinaryBasicBlock::getCFIStateAtInstr(const MCInst Inst) to get CFI state at any given instruction in the program. No need to call fixCFIState() after any given pass. fixCFIState() is called only once during function finalization, and any function transformations after that point are prohibited. * When introducing new basic blocks, make sure CFI state at entry is set correctly and matches CFI instructions in the basic block (if any). * When splitting basic blocks, use getCFIStateAtInstr() to get a state at the split point, and set the new basic block's CFI state to this value. Introduce CFG_Finalized state to indicate that no further optimizations are allowed on the function. This state is reached after we have synced CFI instructions and updated EH info. Rename "-print-after-fixup" option to "-print-finalized". This diffs fixes CFI for cases when we split conditional tail calls, and for indirect call promotion optimization. (cherry picked from FBD4629307)	2017-02-24 21:59:33 -08:00
Rafael Auler	965a373dc4	Fix warnings when compiling with clang (NFC) Summary: Fix inconsistent override keyword usages and initializes a missing field of a Relocation object when using braced initializers. (cherry picked from FBD4622856)	2017-02-27 13:09:27 -08:00
Maksim Panchenko	2029458f34	[BOLT] Strip 'repz' prefix from 'repz retq'. Summary: Add pass to strip 'repz' prefix from 'repz retq' sequence. The prefix is not used in Intel CPUs afaik. The pass is on by default. (cherry picked from FBD4610329)	2017-02-23 18:09:10 -08:00
Maksim Panchenko	88a461014b	[BOLT] Don't set code skew in relocations mode. Summary: We use code skew in non-relocation mode since functions have fixed addresses, and internal alignment has to be adjusted wrt the skew. However in relocation mode it interferes with effective code alignment, and has to be disabled. I missed it when was re-basing the relocation diff. (cherry picked from FBD4599670)	2017-02-22 11:29:52 -08:00
Maksim Panchenko	d3e33b6edc	[BOLT] Fix -jump-tables=basic in relocation mode. Summary: In a prev diff I added an option to update jump tables in-place (on by default) and accidentally broke the default handling of jump tables in relocation mode. The update should be happening semi-automatically, but because we ignore relocations for jump tables it wasn't happening (derp). Since we mostly use '-jump-tables=move' this hasn't been noticed for some time. This diff gets rid of IgnoredRelocations and removes relocations from a relocation set when they are no longer needed. If relocations are created later for jump tables they are no longer ignored. (cherry picked from FBD4595159)	2017-02-21 16:15:15 -08:00
Maksim Panchenko	88244a10bb	[BOLT] Move BOLT passes under Passes subdirectory (NFC). Summary: Move passes under Passes subdirectory. Move inlining passes under Passes/Inliner.* (cherry picked from FBD4575832)	2017-02-16 14:57:57 -08:00
Maksim Panchenko	f06a1455ea	[BOLT] Add support for *GOTPCRELX relocation type. Summary: gcc5 can generate new types of relocations that give linker a freedom to substitute instructions. These relocations are PC-relative, and since we manually process such relocations they don't present much of a problem. Additionally, detect non-pc-relative access from code into a middle of a function. Occasionally I've seen such code, but don't know exactly how to trigger its generation. Just issue a warning for now. (cherry picked from FBD4566473)	2017-02-14 22:55:10 -08:00

... 6 7 8 9 10 ...

616 Commits All Branches Search

616 Commits

All Branches