llvm-project

Commit Graph

Author	SHA1	Message	Date
Theodoros Kasampalis	d09b00ebff	Refactoring of the reordering algorithms Summary: The various reorder and clustering algorithms have been refactored into separate classes, so that it is easier to add new algorithms and/or change the logic of algorithm selection. (cherry picked from FBD3473656)	2016-06-16 18:47:57 -07:00
Maksim Panchenko	f1192a7118	Support for multiple function names. Summary: With ICF optimization in the linker we were getting mismatches of function names in .fdata and BinaryFunction name. This diff adds support for multiple function names for BinaryFunction and does a match against all possible names for the profile. (cherry picked from FBD3466215)	2016-06-10 17:13:05 -07:00
Maksim Panchenko	70f82d9371	Reject profile data for functions that do not match. Summary: Verify profile data for a function and reject if there are branches that don't correspond to any branches in the function CFG. Note that we have to ignore branches resulting from recursive calls. Fix printing instruction offsets in disassembled state. Allow function to have non-zero execution count even if we don't have branch information. (cherry picked from FBD3451596)	2016-06-15 18:36:16 -07:00
Maksim Panchenko	88ac5d9d0e	[merge-fdata] Add option to print function list. Summary: Print total number of functions/objects that have profile and add new options: -print - print the list of objects with count to stderr =none - do not print objects/functions =exec - print functions sorted by execution count =branches - print functions sorted by total branch count -q - do not print merged data to stdout (cherry picked from FBD3442288)	2016-06-09 17:45:15 -07:00
Bill Nell	980a06265a	Revert "Indirect call optimization." This reverts commit 33966090e18545b64013614e7929ff1bdcdf10d5. (cherry picked from FBD28110782)	2016-06-08 17:38:13 -07:00
Bill Nell	8bcfd9a392	Indirect call optimization. (cherry picked from FBD28110629)	2016-06-07 16:27:52 -07:00
Bill Nell	45e2219ae4	Allocate BinaryBasicBlocks with new rather than storing them in the BasicBlocks vector. Summary: This will help optimization passes that need to modify the CFG after it is constructed. Otherwise, the BinaryBasicBlock pointers stored in the layout, successors and predecessors would need to be modified every time a new basic block is created. (cherry picked from FBD3403372)	2016-06-07 16:27:52 -07:00
Maksim Panchenko	6da0d95326	Fix large functions debug info by default. Summary: Turn on -fix-debuginfo-large-functions by default. In the process of testing I've discovered that we output cold code for functions that were too large to be emitted. Fixed that. (cherry picked from FBD3372697)	2016-05-31 19:29:34 -07:00
Maksim Panchenko	4460da0d81	Improvements for debug info. Summary: Assembly functions could have no corresponding DW_AT_subprogram entries, yet they are represented in module ranges (and .debug_aranges) and will have line number information. Make sure we update those. Eliminated unnecessary data structures and optimized some passes. For .debug_loc unused location entries are no longer processed resulting in smaller output files. Overall it's a small processing time improvement and memory imporement. (cherry picked from FBD3362540)	2016-05-27 20:19:19 -07:00
Theodoros Kasampalis	65ac8bbdf2	Better edge counts for fall through blocks in presence of C++ exceptions. Summary: The inference algorithm for counts of fall through edges takes possible jumps to landing pad blocks into account. Also, the landing pad block execution counts are updated using profile data. (cherry picked from FBD3350727)	2016-05-26 15:10:09 -07:00
Theodoros Kasampalis	485f9220b7	Taking LP counts into account for FT count inference (cherry picked from FBD28110493)	2016-05-24 09:26:25 -07:00
Theodoros Kasampalis	fb5f18b2dc	Correctly updating landing pad exec counts. (cherry picked from FBD28110316)	2016-05-23 16:16:25 -07:00
Maksim Panchenko	06b9c5b342	Better .debug_line for non-simple functions. Summary: Generate .debug_line info for non-simple functions in a way that if preferrable by 'objdump -S'. (cherry picked from FBD3345485)	2016-05-24 20:50:36 -07:00
Maksim Panchenko	7b97793b94	Fix for clang .debug_info. Summary: Clang uses different attribute for high_pc which was incompatible with the way we were updating ranges. This diff fixes it. (cherry picked from FBD3345537)	2016-05-24 14:54:23 -07:00
Maksim Panchenko	cfa5d753eb	Miscellaneous fixes for debug info. Summary: * Fix several cases for handling debug info: - properly update CU DW_AT_ranges for function with folded body due to ICF optimization - convert ranges to DW_AT_ranges from hi/low PC for all DIEs - add support for [a, a) range - update CU ranges even when there are no functions registered * Overwrite .debug_ranges section instead of appending. * Convert assertions in debug info handling part into warnings. (cherry picked from FBD3339383)	2016-05-23 19:36:38 -07:00
Maksim Panchenko	7ab3db129b	Create DW_AT_ranges for compile units. Summary: Some compile unit DIEs might be missing DW_AT_ranges because they were compiled without "-ffunction-sections" option. This diff adds the attribute to all compile units. If the section is not present, we need to create it. Will do it in a separate diff. (cherry picked from FBD3314984)	2016-05-17 18:10:14 -07:00
Maksim Panchenko	f047b9d43a	Overwrite contents of .debug_line section. Summary: Overwrite contents of .debug_line section since we don't reference the original contents anymore. This saves ~100MB of HHVM binary. (cherry picked from FBD3314917)	2016-05-16 17:02:17 -07:00
Bill Nell	e63984f325	Patch forward jumping tail calls to prevent branch mispredictions. Summary: A simple optimization to prevent branch misprediction for tail calls. Convert the sequence: j<cc> L1 ... L1: jmp foo # tail call into: j<cc> foo but only if 'j<cc> foo' turns out to be a forward branch. (cherry picked from FBD3234207)	2016-05-02 12:47:18 -07:00
Maksim Panchenko	b445f5eb7b	Fix issue with garbage address in .debug_line. Summary: While emitting debug lines for a function we don't overwrite, we don't have a code section context that is needed by default writing routine. Hence we have to emit end_sequence after the last address, not at the end of section. (cherry picked from FBD3291533)	2016-05-11 19:13:38 -07:00
Bill Nell	f7e7e25b88	Put all optimization passes under the pass manager. Summary: Move eliminate unreachable code, block reordering, and CFI/exception fixup into official optimization passes. (cherry picked from FBD3248991)	2016-05-02 12:47:18 -07:00
Gabriel Poesia	5fa128e748	Inlining of small functions. Summary: Added an optimization pass of inlining calls to small functions (with only one basic block). Inlining is done in a very simple way, inserting instructions to simulate the changes to the stack pointer that call/ret would make before/after the inlined function executes. Also, the heuristic prefers to inline calls that happen in the hottest blocks (by looking at their execution count). Calls in cold blocks are ignored. (cherry picked from FBD3233516)	2016-04-25 14:25:58 -07:00
Gabriel Poesia	d1f525499e	Optimize calls to functions that are a single unconditional jump Summary: Many functions (around 600) in the HHVM binary are simply a single unconditional jump instruction to another function. These can be trivially optimized by modifying the call sites to directly call the branch target instead (because it also happens with more than one jump in sequence, we do it iteratively). This diff also adds a very simple analysis/optimization pass system in which this pass is the first one to be implemented. A follow-up to this could be to move the current optimizations to other passes. (cherry picked from FBD3211138)	2016-04-15 15:59:52 -07:00
Gabriel Poesia	e6acc7bb53	Optimize calls to functions that are a single unconditional jump Summary: Many functions (around 600) in the HHVM binary are simply a single unconditional jump instruction to another function. These can be trivially optimized by modifying the call sites to directly call the branch target instead (because it also happens with more than one jump in sequence, we do it iteratively). This diff also adds a very simple analysis/optimization pass system in which this pass is the first one to be implemented. A follow-up to this could be to move the current optimizations to other passes. (cherry picked from FBD3211138)	2016-04-15 15:59:52 -07:00
Gabriel Poesia	459eb8c230	Fix "Cannot update ranges for DIE at offset" error messages. Summary: Fix the error message by not printing it :) Explanation: a previous diff accidentally removed this error message from within the DEBUG macro, and it's expected that we'll have a bunch of them since a lot of the DIEs we try to update are empty or meaningless. For instance (and mainly), there is a huge number of lexical block DIEs with no attributes in .debug_info. In the first phase of collecting debugging info, we store the offsets of all these DIEs, only later to realize that we cannot update their address ranges because they have none. A better fix would be to check this earlier and not store offsets of DIEs we cannot update to begin with. (cherry picked from FBD3236923)	2016-04-28 12:55:35 -07:00
Maksim Panchenko	de95a5b6a4	Make merge-fdata generate smaller .fdata files. Summary: A lot of the space in the merged .fdata is taken by branches to and from [heap], which is jitted code. On different machines, or during different runs, jitted addresses are all different. We don't use these addresses, but we need branch info to get accurate function call counts. This diff treats all [heap] addresses the same, resulting in a simplified merged file. The size of the compressed file decreased from 70MB to 8MB. (cherry picked from FBD3233943)	2016-04-27 18:06:18 -07:00
Maksim Panchenko	1258903b54	Fix for functions in different segments. Summary: In a test binary some functions are placed in a segment preceding the segment containing .text section. As a result, we were miscalculating maximum function size as the calculation was based on addresses only. This diff fixes the calculation by checking if symbol after function belongs to the same section. If it does not, then we set the maximum function size based on the size of the containing section and not on the address distance to the next symbol. (cherry picked from FBD3229205)	2016-04-26 23:42:39 -07:00
Maksim Panchenko	3811673a0c	Option to break in given functions. Summary: Added option "-break-funcs=func1,func2,...." to coredump in any given function by introducing ud2 sequence at the beginning of the function. Useful for debugging and validating stack traces. Also renamed options containing "_" to use "-" instead. Also run hhvm test with "-update-debug-sections". (cherry picked from FBD3210248)	2016-04-21 09:54:33 -07:00
Maksim Panchenko	87a90ae133	Fix ninja install-* for BOLT utilities. Summary: Make sure we can install all tools needed for processing BOLT .fdata files such as perf2bolt, merge-fdata, etc. (cherry picked from FBD3223477)	2016-04-25 22:13:12 -07:00
Maksim Panchenko	ff68b34553	Tool to merge .fdata files. Summary: merge-fdata tool takes multiple .fdata files and outputs to stdout combined fdata. Takes about 2 seconds per each additional .fdata file with hhvm production data. (cherry picked from FBD3216430)	2016-04-08 12:18:06 -07:00
Maksim Panchenko	43bc4a09ad	Changed splitting options and fixed sorting. Summary: Splitting option now has different meanings/values. Since landing pads are mostly always cold/frozen, we should split them before anything else (we still check the execution count is 0). That's value '1'. Everything else goes on top of that and has increased value (2 - large functions, 3 - everything). Sorting was non-deterministic and somewhat broken for functions with EH ranges. Fixed that and added '-split-all-cold' option to outline all 0-count blocks. Fixed compilation of test cases. After my last commit the binaries were linked to wrong source files (i.e. debug info). Had to rebuild the binaries from updated sources. (cherry picked from FBD3209369)	2016-04-20 15:31:11 -07:00
Maksim Panchenko	4f44d60947	Special handling for GNU_args_size call frame instruction. Summary: GNU_args_size is a special kind of CFI that tells runtime to adjust %rsp when control is passed to a landing pad. It is used for annotating call instructions that pass (extra) parameters on the stack and there's a corresponding landing pad. It is also special in a way that its value is not handled by DW_CFA_remember_state/DW_CFA_restore_state instruction sequence that we utilize to restore the state after block re-ordering. This diff adds association of call instructions with GNU_args_size value when it's used. If the function does not use GNU_args_size, there is no overhead. Otherwise, we regenerate GNU_args_size instruction during code emission, i.e. after all optimizations and block-reordering. (cherry picked from FBD3201322)	2016-04-19 22:00:29 -07:00
Gabriel Poesia	ad344c4387	Group debugging info representation and serialization code. Summary: Moved the classes related to representing and serializing DWARF entities into a single header, DebugData.h. (cherry picked from FBD3153279)	2016-04-07 15:06:43 -07:00
Gabriel Poesia	f6c8929799	Fix debugging info for simple functions that we fail to rewrite. Summary: Simple functions which we fail to rewrite after optimizations were having wrong debugging information because the latter would reflect the optimized version of the function. There are only 48 functions (at this time) in this situation in the HHVM binary. The simple fix is to add another full pass. Another more complicated path, which will be more efficient, is to reset only the BinaryContext and emit again, but then we need to recreate all symbols in the new MCContext and update the pointers. I started taking this path but it started getting too complicated for only those 48 functions (needed to create a new map of global symbols, recreate landing pads - which needed to have the internal intermediate labels in the functions kept to be updated too, etc). Because the overhead is quite large (another full emission pass - around 4m30s here) and the impact is small I put this behind a new command-line flag which is off by default: -fix-debuginfo-large-functions. (cherry picked from FBD3166576)	2016-04-11 17:46:18 -07:00
Gabriel Poesia	0e77c53b89	Update address ranges of inlined functions and try/catch blocks. Summary: Update address ranges of inlined functions and try/catch blocks. This was missing and lead gdb to show weird information in a core dump we inspected because of the several nestings of inline in the call stack. This is very similar to Lexical Blocks, so the change is to basically generalize that code to do the same for DW_AT_try_block, DW_AT_catch_block and DW_AT_inlined_subroutine. (cherry picked from FBD3169417)	2016-04-12 11:41:03 -07:00
Maksim Panchenko	e16b5d8b78	Option to pass a file with list of functions to skip. Summary: Take "-skip_funcs_file=<file>" option and don't process any function listed in the <file>. (cherry picked from FBD3160226)	2016-04-08 19:30:27 -07:00
Gabriel Poesia	2694e58fa2	Update unmatched and nested subprogram DIEs. Summary: readelf was showing some errors because we weren't updating DIEs that were not shallow in the DIE tree, or DIEs of functions with addresses we don't recognize (mostly functions with address 0, which could have been removed by the Linker Script but still have debugging information there). These DIEs need to be updated because their abbreviations are patched. (cherry picked from FBD3159335)	2016-04-08 16:24:38 -07:00
Gabriel Poesia	665b03a464	Fix behavior with multiple functions with same address. Summary: We were updating only one DIE per function, but because the Linker Script may map multiple functions to the same address this would cause us to generate invalid debug info (as some DIEs weren't updated but their abbreviations were changed). (cherry picked from FBD3157263)	2016-04-08 11:55:42 -07:00
Gabriel Poesia	784f6a8773	Emit debug line information for non-simple functions. Summary: Non-simple functions aren't emitted, and thus didn't have line number information emitted. This diff emits it for those functions by extending LLVM's generation of the line number program to allow for absolute addresses (it is wholly symbolic), then iterating over the relevant line tables from the input and appending entries with absolute addresses to the line tables to be emited. This still leaves the simple but not overwritten functions unhandled (there were 48 in HHVM in my last run). However, I think that to fix them we'd need another pass, since by the time we realize a simple function wont't fit, debug line info was already written to the output. (cherry picked from FBD3148468)	2016-04-05 19:35:45 -07:00
Maksim Panchenko	e513bfd86d	Only set output ranges when updating dbg info. Summary: Save processing time by setting output ranges when needed. (cherry picked from FBD3148791)	2016-04-06 18:03:44 -07:00
Gabriel Poesia	4b4db40174	Update DWARF location lists after optimization. Summary: Summary: Update DWARF location lists in .debug_loc and pointers to them in .debug_info so that gdb can print variables which change location during their lifetime. The following changes were made: - Refactored BasicBlockOffsetRanges to allow ranges to be tied to binary information (so that we can reuse it for location lists) - Implemented range compression optimization in BasicBlockOffsetRanges (needed otherwise too much data was being generated). - Added representation for location lists (LocationList.h, BinaryContext.h) - Implemented .debug_loc serializer that keeps the updated offsets (DebugLocWriter.{h,cpp}) - After disassembly, traverse entries in .debug_loc and save them in context (BinaryContext.cpp) - After optimizations, serialize .debug_loc and update pointers in .debug_info (RewriteInstance.cpp) (cherry picked from FBD3130682)	2016-04-01 11:37:28 -07:00
Maksim Panchenko	4349b63144	Re-enable conditional function spitting under an option. Summary: Add a parameter value to "-split-functions=" option to allow splitting only when the function is too large to fit: 0 - never split 1 - split if too large to fit 2 - always split We may use this option when the profile data is not very precise. In that case excessive splitting may increase iTLB misses. (cherry picked from FBD3137700)	2016-03-31 16:38:49 -07:00
Gabriel Poesia	0a07d9bf88	Don't skip non-simple functions on function address ranges update. Summary: This fixes a problem in which bolt was generating a malformed .debug_info section on the bzip2 binary. The bug was the following: - A simple and a non-simple function shared an abbreviation - The abbreviation was patched to contain DW_AT_ranges because of the simple function - The non-simple function's data was not updated, but then it didn't match the layout expected by the abbreviation anymore And because we were already creating an address ranges list in .debug_ranges even for non-simple functions, it doesn't make sense not to use it anyway. (cherry picked from FBD3129219)	2016-04-01 15:09:34 -07:00
Gabriel Poesia	ffa9641e16	Update DWARF lexical blocks address ranges. Summary: Updates DWARF lexical blocks address ranges in the output binary after optimizations. This is similar to updating function address ranges except that the ranges representation needs to be more general, since address ranges can begin or end in the middle of a basic block. The following changes were made: - Added a data structure for iterating over the basic blocks that intersect an address range: BasicBlockTable.h - Added some more bookkeeping in BinaryBasicBlock. Basically, I needed to keep track of the block's size in the input binary as well as its address in the output binary. This information is mostly set by BinaryFunction after disassembly. - Added a representation for address ranges relative to basic blocks (BasicBlockOffsetRanges.h). Will also serve for location lists. - Added a representation for Lexical Blocks (LexicalBlock.h) - Small refactorings in DebugArangesWriter: -- Renamed to DebugRangesSectionsWriter since it also writes .debug_ranges -- Refactored it not to depend on BinaryFunction but instead on anything that can be assined an aoffset in .debug_ranges (added an interface for that) - Iterate over the DIE tree during initialization to find lexical blocks in .debug_info (BinaryContext.cpp) - Added patches to .debug_abbrev and .debug_info in RewriteInstance to update lexical blocks attributes (in fact, this part is very similar to what was done to function address ranges and I just refactored/reused that code) - Added small test case (lexical_blocks_address_ranges_debug.test) (cherry picked from FBD3113181)	2016-03-28 17:45:22 -07:00
Maksim Panchenko	e8ef8a5619	Speedup section remapping. Summary: Before this diff LLVM used to iterate over all sections to find the one with an address we want to remap. Since we have extremely large number of section this process is highly inefficient. Instead we add a new interface to remap a section with a given ID (which effectively is an index into an array of sections), and pass the ID instead of the address. This cuts down the processing time of hhvm binary by 10 seconds, and brings the total processing time to a little under 2 minutes. (cherry picked from FBD3110015)	2016-03-28 22:39:48 -07:00
Maksim Panchenko	595d0885d9	Populate function execution count while parsing fdata. Summary: Populate function execution count while parsing fdata. Before we used a quadratic algorithm to populate the execution count (had to iterate over all branches for every single function). Ignore non-symbol to non-symbol branches while parsing fdata. These changes combined drop HHVM processing time from 4 minutes 53 seconds down to 2 minutes 9 seconds on my devserver. Test case had to be modified since it contained irrelevant branches from PLT to libc. (cherry picked from FBD3106263)	2016-03-28 11:06:28 -07:00
Gabriel Poesia	466cbae866	Update subroutine address ranges in binary. Summary: [WIP] Update DWARF info for function address ranges. This diff currently does not work for unknown reasons, but I'm describing here what's the current state. According to both llvm-dwarf and readelf our output seems correct, but GDB does not interpret it as expected. All details go below in hope I missed something. I couldn't actually track the whole change that introduced support for what we need in gdb yet, but I think I can get to it (2007-12-04: Support lexical bocks and function bodies that occupy non-contiguous address ranges). I have reasons to believe gdb at least at some nges). The set of introduced changes was basically this: - After disassembly, iterate over the DIEs in .debug_info and find the ones that correspond to each BinaryFunction. - Refactor DebugArangesWriter to also write addresses of functions to .debug_ranges and track the offsets of function address ranges there - Add some infrastructure to facilitate patching the binary in simple ways (BinaryPatcher.h) - In RewriteInstance, after writing .debug_ranges already with function address ranges, for each function do: -- Find the abbreviation corresponding to the function -- Patch .debug_abbrev to replace DW_AT_low_pc with DW_AT_ranges and DW_AT_high_pc with DW_AT_producer (I'll explain this hack below). Also patch the corresponding forms to DW_FORM_sec_offset and DW_FORM_string (null-terminated in-place string). -- Patch debug_info with the .debug_ranges offset in place of the first 4 bytes of DW_AT_low_pc (DW_AT_ranges only occupies 4 bytes whereas low_pc occupies 8), and write an arbitrary string in-place in the other 12 bytes that were the 4 MSB of low_pc and the 8 bytes of high_pc before the patch. This depends on low_pc and high_pc being put consecutively by the compiler, but it serves to validate the idea. I tried another way of doing it that does not rely on this but it didn't work either and I believe the reason for either not working is the same (and still unknown, but unrelated to them. I might be wrong though, and if I find yet another way of doing it I may try it). The other way was to use a form of DW_FORM_data8 for the section offset. This is disallowed by the specification, but I doubt gdb validates this, as it's just easier to store it as 64-bit anyway as this is even necessary to support 64-bit DWARF (which is not what gcc generates by default apparently). I still need to make changes to the diff to make it production-ready, but first I want to figure out why it doesn't work as expected. By looking at the output of llvm-dwarfdump or readelf, all of .debug_ranges, .debug_abbrev and .debug_info seem to have been correctly updated. However, gdb seems to have serious problems with what we write. (In fact, readelf --debug-dump=Ranges shows some funny warning messages of the form ("Warning: There is a hole [0x100 - 0x120] in .debug_ranges"), but I played around with this and it seems it's just because no compile unit was using these ranges. Changing .debug_info apparently changes these warnings, so they seem to be unrelated to the section itself. Also looking at the hex dump of the section doesn't help, as everything seems fine. llvm-dwarfdump doesn't say anything. So I think .debug_ranges is fine.) The result is that gdb not only doesn't show the function name as we wanted, but it also stops showing line number information. Apparently it's not reading/interpreting the address ranges at all, and so the functions now have no associated address ranges, only the symbol value which allows one to put a breakpoint in the function, but not to show source code. As this left me without more ideas of what to try to feed gdb with, I believe the most promising next trial is to try to debug gdb itself, unless someone spots anything I missed. I found where the interesting part of the code lies for this case (gdb/dwarf2read.c and some other related files, but mainly that one). It seems in some parts gdb uses DW_AT_ranges for only getting its lowest and highest addresses and setting that as low_pc and high_pc (see dwarf2_get_pc_bounds in gdb's code and where it's called). I really hope this is not actually the case for function address ranges. I'll investigate this further. Otherwise I don't think any changes we make will make it work as initially intended, as we'll simply need gdb to support it and in that case it doesn't. (cherry picked from FBD3073641)	2016-03-16 18:08:29 -07:00
Gabriel Poesia	9cdb7bdb55	Write only minimal .debug_line information. Summary: We used to output .debug_line information for every instruction, but because of the way gdb (and probably lldb as of llvm::DWARFDebugLine::LineTable::findAddress) queries the line table it's not necessary to output information for two instructions if they follow each other and map to the same source line. By not repeating this information we generate a bit less .debug_line data. (cherry picked from FBD3056402)	2016-03-15 16:22:04 -07:00
Maksim Panchenko	a60914427c	Update DW_AT_ranges for CU when it exists. Summary: If CU has DW_AT_ranges update the value. Note that it does not create DW_AT_ranges attribute. (cherry picked from FBD3051904)	2016-03-14 19:04:23 -07:00
Maksim Panchenko	d01172ffa8	Refactor existing debugging code. Summary: Almost NFC. Isolate code for updating debug info. (cherry picked from FBD3051536)	2016-03-14 18:48:05 -07:00
Gabriel Poesia	dc7cc1fb18	Fix default line number information for instructions. Summary: The line number information generated from a null pointer was actually valid, which caused new instructions without the line number information set to have a valid and wrong line number reference. This diff fixes this by making the null pointer be assigned to an invalid line number row. (cherry picked from FBD3048453)	2016-03-14 11:40:52 -07:00

1 2 3

129 Commits