llvm-project

Commit Graph

Author	SHA1	Message	Date
Theodoros Kasampalis	32739247eb	More aggressive inlining pass Summary: This adds functionality for a more aggressive inlining pass, that can inline tail calls and functions with more than one basic block. (cherry picked from FBD3677856)	2016-07-29 14:17:06 -07:00
Bill Nell	82d76ae18b	Add MCInst annotation mechanism to MCInstrAnalysis class. Summary: Add three new MCOperand types: Annotation, LandingPad and GnuArgsSize. Annotation is used for associating random data with MCInsts. Clients can construct their own annotation types (subclassed from MCAnnotation) and associate them with instructions. Annotations are looked up by string keys. Annotations can be added, removed and queried using an instance of the MCInstrAnalysis class. The LandingPad operand is a MCSymbol, uint64_t pair used to encode exception handling information for call instructions. GnuArgsSize is used to annotate calls with the DW_CFA_GNU_args_size attribute. (cherry picked from FBD3597877)	2016-07-28 10:34:50 -07:00
Theodoros Kasampalis	713e361f36	Fix for correct disassembling of conditional tail calls. Summary: BOLT attempts to convert jumps that serve as tail calls to dedicated tail call instructions, but this is impossible when the jump is conditional because there is no corresponding tail call instruction. This was causing the creation of a duplicate fall-through edge for basic blocks terminated with a conditional jump serving as a tail call when there is profile data available for the non-taken branch. In this case, the first fall-through edge had a count taken from the profile data, while the second has a count computed (incorrectly) by BinaryFunction::inferFallThroughCounts. (cherry picked from FBD3560504)	2016-07-13 18:57:40 -07:00
Maksim Panchenko	486ab273c7	Add printing support for indirect tail calls. Summary: LLVM was missing assembler print string for indirect tail calls which are synthetic instructions created by us. (cherry picked from FBD3640197)	2016-07-28 18:49:48 -07:00
Bill Nell	50e011f4e5	CFG editing functions Summary: This diff adds a number of methods to BinaryFunction that can be used to edit the CFG after it is created. The basic public functions are: - createBasicBlock - create a new block that is not inserted into the CFG. - insertBasicBlocks - insert a range of blocks (made with createBasicBlock) into the CFG. - updateLayout - update the CFG layout (either by inserting new blocks at a certain point or recomputing the entire layout). - fixFallthroughBranch - add a direct jump to the fallthrough successor for a given block. There are a number of private helper functions used to implement the above. This was split off the ICP diff to simplify it a bit. (cherry picked from FBD3611313)	2016-07-23 12:50:34 -07:00
Theodoros Kasampalis	ab599fe71a	Basic block clustering algorithm for minimizing branches. Summary: This algorithm is similar to our main clustering algorithm but uses a different heuristic for selecting edges to become fall-throughs. The weight of an edge is calculated as the win in branches if we choose to layout this edge as a fall-through. For example, the edges A -> B with execution count 100 and A -> C with execution count 500 (where B and C are the only successors of A) have weights -400 and +400 respectively. (cherry picked from FBD3606591)	2016-07-15 16:11:30 -07:00
Theodoros Kasampalis	a9bb3320ad	Identical Code Folding (ICF) pass Summary: Added an ICF pass to BOLT, that can recognize identical functions and replace references to these functions with references to just one representative. (cherry picked from FBD3460297)	2016-06-09 11:36:55 -07:00
Bill Nell	82401630a2	Factor out instruction printing and size computation. Summary: I've factored out the instruction printing and size computation routines to methods on BinaryContext. I've also added some more debug print functions. This was split off the ICP diff to simplify it a bit. (cherry picked from FBD3610690)	2016-07-23 08:01:53 -07:00
Theodoros Kasampalis	156a55209c	Simplification of loads from read-only data sections. Summary: Instructions that load data from a read-only data section and whose target address can be computed statically (e.g. RIP-relative addressing) are modified to corresponding instructions that use immediate operands. We apply the transformation only when the resulting instruction will have smaller or equal size. (cherry picked from FBD3397112)	2016-06-03 00:58:11 -07:00
Theodoros Kasampalis	17b846586c	Loop detection for BOLT's CFG. Summary: Loop detection for the CFG data structure. Added a GraphTraits specialization for BOLT's CFG that allows us to use LLVM's loop detection interface. (cherry picked from FBD3604837)	2016-05-26 10:58:01 -07:00
Bill Nell	ea53cffb2d	Add movabs -> mov shortening optimization. Add peephole optimization pass that does instruction shortening. Summary: When a mov instruction has a 64-bit immediate that can be represented as a sign-extended 32-bit number, use the smaller mov instruction (MOV64ri -> MOV64ri32). Add peephole optimization pass that does instruction shortening. (cherry picked from FBD3603099)	2016-07-21 16:40:06 -07:00
Maksim Panchenko	c6d0c568d4	Add BinaryContext::getSectionForAddress() Summary: Interface for accessing section from BinaryContext. (cherry picked from FBD3600854)	2016-07-21 12:45:35 -07:00
Maksim Panchenko	f2d82919d0	Move debug-handling code into DWARFRewriter (NFC). Summary: RewriteInstance.cpp is getting too big. Split the code. (cherry picked from FBD3596103)	2016-05-31 19:12:26 -07:00
Maksim Panchenko	bf46263eed	Shorten instructions if possible. Summary: Generate short versions of branch instructions by default and rely on relaxation to produce longer versions when needed. Also produce short versions of arithmetic instructions if immediate fits into one byte. This was only triggered once on HHVM binary. (cherry picked from FBD3591466)	2016-07-19 11:19:18 -07:00
Bill Nell	674dbcc0de	Fix crash in patchELFPHDRTable when no functions are modified. Summary: patchELFPHDRTable was asserting that it could not find an entry for .eh_frame_hdr in SectionMapInfo when no functions were modified by BOLT. This just changes code to skip modifying GNU_EH_FRAME program headers when SectionMapInfo is empty. The existing header is copied and written instead. (cherry picked from FBD3557481)	2016-07-12 16:43:53 -07:00
Maksim Panchenko	84b5b9e462	Create alternative name for local symbols. Summary: If profile data was collected on a stripped binary but the input to BOLT is unstripped, we would use a different mangling scheme for local functions and ignore their profiles. To solve the issue this diff adds an alternative name for all local functions such that one of the names would match the name in the profile. If the input binary was stripped, we reject it, unless the "-allow-stripped" option was passed. It's more complicated to do a matching in this case since we have less information than at the time of profile collection. It's also not that simple to tell if the profile was gathered on a stripped binary (in which case we would have no issue matching data). (cherry picked from FBD3548012)	2016-07-11 18:51:13 -07:00
Bill Nell	bdd4af2134	Store index inside BinaryBasicBlock instead of in map on BinaryFunction. Summary: Store the basic block index inside the BinaryBasicBlock instead of a map in BinaryFunction. This cut another 15-20 sec. from the processing time for hhvm. (cherry picked from FBD3533606)	2016-07-07 21:43:43 -07:00
Bill Nell	90c9323511	Use unordered_map instead of map in ReorderAlgorithm and BinaryFunction::BasicBlockIndices. Summary: Use unordered_map instead of map in ReorderAlgorithm and BinaryFunction::BasicBlockIndices. Cuts about 30sec off the processing time for the hhvm binary. (~8.5 min to ~8min) (cherry picked from FBD3530910)	2016-07-07 11:48:50 -07:00
Theodoros Kasampalis	c20506c570	Fix in inferFallthroughCounts Summary: This fixes the initialization of basic block execution counts, where we should skip edges to the first basic block but we were not skipping the corresponding profile info. Also, I removed a check that was done twice. (cherry picked from FBD3519265)	2016-07-03 21:30:35 -07:00
Bill Nell	260f6fbdb6	Add option to dump CFGs in (simple) graphviz format during all passes. Summary: I noticed the BinaryFunction::viewGraph() method that hadn't been implemented and decided I could use a simple DOT dumper for CFGs while working on the indirect call optimization. I've implemented the bare minimum for the dumper. It's just nodes+BB labels with edges. We can add more detailed information as needed/desired. (cherry picked from FBD3509326)	2016-07-01 08:40:56 -07:00
Theodoros Kasampalis	6eb4e5b687	perf2bolt can extract branch records with histories Summary: Added perf2bolt functionality for extracting branch records with histories of previous branches. The length of the histories is user defined, and the default is 0 (previous functionality). Also, DataReader can parse perf2bolt output with histories. Note: creating profile data with long histories can increase their size significantly (2x for history of length 1, 3x for length 2 etc). (cherry picked from FBD3473983)	2016-06-21 18:44:42 -07:00
Theodoros Kasampalis	287fa51324	Fix for ignoring fall-through profile data when jump is followed by no-op Summary: When a conditional jump is followed by one or more no-ops, the destination of fall-through branch was recorded as the first no-op in FuncBranchInfo. However the fall-through basic block after the jump starts after the no-ops, so the profile data could not match the CFG and was ignored. (cherry picked from FBD3496084)	2016-06-27 14:51:38 -07:00
Theodoros Kasampalis	d09b00ebff	Refactoring of the reordering algorithms Summary: The various reorder and clustering algorithms have been refactored into separate classes, so that it is easier to add new algorithms and/or change the logic of algorithm selection. (cherry picked from FBD3473656)	2016-06-16 18:47:57 -07:00
Maksim Panchenko	f1192a7118	Support for multiple function names. Summary: With ICF optimization in the linker we were getting mismatches of function names in .fdata and BinaryFunction name. This diff adds support for multiple function names for BinaryFunction and does a match against all possible names for the profile. (cherry picked from FBD3466215)	2016-06-10 17:13:05 -07:00
Maksim Panchenko	70f82d9371	Reject profile data for functions that do not match. Summary: Verify profile data for a function and reject if there are branches that don't correspond to any branches in the function CFG. Note that we have to ignore branches resulting from recursive calls. Fix printing instruction offsets in disassembled state. Allow function to have non-zero execution count even if we don't have branch information. (cherry picked from FBD3451596)	2016-06-15 18:36:16 -07:00
Maksim Panchenko	88ac5d9d0e	[merge-fdata] Add option to print function list. Summary: Print total number of functions/objects that have profile and add new options: -print - print the list of objects with count to stderr =none - do not print objects/functions =exec - print functions sorted by execution count =branches - print functions sorted by total branch count -q - do not print merged data to stdout (cherry picked from FBD3442288)	2016-06-09 17:45:15 -07:00
Bill Nell	980a06265a	Revert "Indirect call optimization." This reverts commit 33966090e18545b64013614e7929ff1bdcdf10d5. (cherry picked from FBD28110782)	2016-06-08 17:38:13 -07:00
Bill Nell	8bcfd9a392	Indirect call optimization. (cherry picked from FBD28110629)	2016-06-07 16:27:52 -07:00
Bill Nell	45e2219ae4	Allocate BinaryBasicBlocks with new rather than storing them in the BasicBlocks vector. Summary: This will help optimization passes that need to modify the CFG after it is constructed. Otherwise, the BinaryBasicBlock pointers stored in the layout, successors and predecessors would need to be modified every time a new basic block is created. (cherry picked from FBD3403372)	2016-06-07 16:27:52 -07:00
Maksim Panchenko	6da0d95326	Fix large functions debug info by default. Summary: Turn on -fix-debuginfo-large-functions by default. In the process of testing I've discovered that we output cold code for functions that were too large to be emitted. Fixed that. (cherry picked from FBD3372697)	2016-05-31 19:29:34 -07:00
Maksim Panchenko	4460da0d81	Improvements for debug info. Summary: Assembly functions could have no corresponding DW_AT_subprogram entries, yet they are represented in module ranges (and .debug_aranges) and will have line number information. Make sure we update those. Eliminated unnecessary data structures and optimized some passes. For .debug_loc unused location entries are no longer processed resulting in smaller output files. Overall it's a small processing time improvement and memory improvement. (cherry picked from FBD3362540)	2016-05-27 20:19:19 -07:00
Theodoros Kasampalis	65ac8bbdf2	Better edge counts for fall through blocks in presence of C++ exceptions. Summary: The inference algorithm for counts of fall through edges takes possible jumps to landing pad blocks into account. Also, the landing pad block execution counts are updated using profile data. (cherry picked from FBD3350727)	2016-05-26 15:10:09 -07:00
Theodoros Kasampalis	485f9220b7	Taking LP counts into account for FT count inference (cherry picked from FBD28110493)	2016-05-24 09:26:25 -07:00
Theodoros Kasampalis	fb5f18b2dc	Correctly updating landing pad exec counts. (cherry picked from FBD28110316)	2016-05-23 16:16:25 -07:00
Maksim Panchenko	06b9c5b342	Better .debug_line for non-simple functions. Summary: Generate .debug_line info for non-simple functions in a way that is preferable for 'objdump -S'. (cherry picked from FBD3345485)	2016-05-24 20:50:36 -07:00
Maksim Panchenko	7b97793b94	Fix for clang .debug_info. Summary: Clang uses different attribute for high_pc which was incompatible with the way we were updating ranges. This diff fixes it. (cherry picked from FBD3345537)	2016-05-24 14:54:23 -07:00
Maksim Panchenko	cfa5d753eb	Miscellaneous fixes for debug info. Summary: * Fix several cases for handling debug info: - properly update CU DW_AT_ranges for function with folded body due to ICF optimization - convert ranges to DW_AT_ranges from hi/low PC for all DIEs - add support for [a, a) range - update CU ranges even when there are no functions registered * Overwrite .debug_ranges section instead of appending. * Convert assertions in debug info handling part into warnings. (cherry picked from FBD3339383)	2016-05-23 19:36:38 -07:00
Maksim Panchenko	7ab3db129b	Create DW_AT_ranges for compile units. Summary: Some compile unit DIEs might be missing DW_AT_ranges because they were compiled without "-ffunction-sections" option. This diff adds the attribute to all compile units. If the section is not present, we need to create it. Will do it in a separate diff. (cherry picked from FBD3314984)	2016-05-17 18:10:14 -07:00
Maksim Panchenko	f047b9d43a	Overwrite contents of .debug_line section. Summary: Overwrite contents of .debug_line section since we don't reference the original contents anymore. This saves ~100MB of HHVM binary. (cherry picked from FBD3314917)	2016-05-16 17:02:17 -07:00
Bill Nell	e63984f325	Patch forward jumping tail calls to prevent branch mispredictions. Summary: A simple optimization to prevent branch misprediction for tail calls. Convert the sequence: j<cc> L1 ... L1: jmp foo # tail call into: j<cc> foo but only if 'j<cc> foo' turns out to be a forward branch. (cherry picked from FBD3234207)	2016-05-02 12:47:18 -07:00
Maksim Panchenko	b445f5eb7b	Fix issue with garbage address in .debug_line. Summary: While emitting debug lines for a function we don't overwrite, we don't have a code section context that is needed by default writing routine. Hence we have to emit end_sequence after the last address, not at the end of section. (cherry picked from FBD3291533)	2016-05-11 19:13:38 -07:00
Bill Nell	f7e7e25b88	Put all optimization passes under the pass manager. Summary: Move eliminate unreachable code, block reordering, and CFI/exception fixup into official optimization passes. (cherry picked from FBD3248991)	2016-05-02 12:47:18 -07:00
Gabriel Poesia	5fa128e748	Inlining of small functions. Summary: Added an optimization pass of inlining calls to small functions (with only one basic block). Inlining is done in a very simple way, inserting instructions to simulate the changes to the stack pointer that call/ret would make before/after the inlined function executes. Also, the heuristic prefers to inline calls that happen in the hottest blocks (by looking at their execution count). Calls in cold blocks are ignored. (cherry picked from FBD3233516)	2016-04-25 14:25:58 -07:00
Gabriel Poesia	d1f525499e	Optimize calls to functions that are a single unconditional jump Summary: Many functions (around 600) in the HHVM binary are simply a single unconditional jump instruction to another function. These can be trivially optimized by modifying the call sites to directly call the branch target instead (because it also happens with more than one jump in sequence, we do it iteratively). This diff also adds a very simple analysis/optimization pass system in which this pass is the first one to be implemented. A follow-up to this could be to move the current optimizations to other passes. (cherry picked from FBD3211138)	2016-04-15 15:59:52 -07:00
Gabriel Poesia	e6acc7bb53	Optimize calls to functions that are a single unconditional jump Summary: Many functions (around 600) in the HHVM binary are simply a single unconditional jump instruction to another function. These can be trivially optimized by modifying the call sites to directly call the branch target instead (because it also happens with more than one jump in sequence, we do it iteratively). This diff also adds a very simple analysis/optimization pass system in which this pass is the first one to be implemented. A follow-up to this could be to move the current optimizations to other passes. (cherry picked from FBD3211138)	2016-04-15 15:59:52 -07:00
Gabriel Poesia	459eb8c230	Fix "Cannot update ranges for DIE at offset" error messages. Summary: Fix the error message by not printing it :) Explanation: a previous diff accidentally removed this error message from within the DEBUG macro, and it's expected that we'll have a bunch of them since a lot of the DIEs we try to update are empty or meaningless. For instance (and mainly), there is a huge number of lexical block DIEs with no attributes in .debug_info. In the first phase of collecting debugging info, we store the offsets of all these DIEs, only later to realize that we cannot update their address ranges because they have none. A better fix would be to check this earlier and not store offsets of DIEs we cannot update to begin with. (cherry picked from FBD3236923)	2016-04-28 12:55:35 -07:00
Maksim Panchenko	de95a5b6a4	Make merge-fdata generate smaller .fdata files. Summary: A lot of the space in the merged .fdata is taken by branches to and from [heap], which is jitted code. On different machines, or during different runs, jitted addresses are all different. We don't use these addresses, but we need branch info to get accurate function call counts. This diff treats all [heap] addresses the same, resulting in a simplified merged file. The size of the compressed file decreased from 70MB to 8MB. (cherry picked from FBD3233943)	2016-04-27 18:06:18 -07:00
Maksim Panchenko	1258903b54	Fix for functions in different segments. Summary: In a test binary some functions are placed in a segment preceding the segment containing .text section. As a result, we were miscalculating maximum function size as the calculation was based on addresses only. This diff fixes the calculation by checking if symbol after function belongs to the same section. If it does not, then we set the maximum function size based on the size of the containing section and not on the address distance to the next symbol. (cherry picked from FBD3229205)	2016-04-26 23:42:39 -07:00
Maksim Panchenko	3811673a0c	Option to break in given functions. Summary: Added option "-break-funcs=func1,func2,...." to coredump in any given function by introducing ud2 sequence at the beginning of the function. Useful for debugging and validating stack traces. Also renamed options containing "_" to use "-" instead. Also run hhvm test with "-update-debug-sections". (cherry picked from FBD3210248)	2016-04-21 09:54:33 -07:00
Maksim Panchenko	87a90ae133	Fix ninja install-* for BOLT utilities. Summary: Make sure we can install all tools needed for processing BOLT .fdata files such as perf2bolt, merge-fdata, etc. (cherry picked from FBD3223477)	2016-04-25 22:13:12 -07:00


151 Commits