llvm-project

Commit Graph

Author	SHA1	Message	Date
Maksim Panchenko	628d06b1e5	Preserve layout of basic blocks with 0 profile counts. Summary: Preserve original layout for basic blocks that have 0 execution count. Since we don't optimize for size, it's better to rely on the original input order. (cherry picked from FBD2875335)	2016-01-21 14:18:30 -08:00
Maksim Panchenko	b91d1f1299	Enable REPNZ prefix support. Summary: I didn't see a case where REPNZ were not disassembled/reassembled properly. (cherry picked from FBD2869229)	2016-01-26 17:53:08 -08:00
Maksim Panchenko	218c5f0916	Fix a bug with outlining first basic block. Summary: We should never outline the first basic block. Also add an option to accept a file with the list of functions to optimize. (cherry picked from FBD2868184)	2016-01-26 16:03:58 -08:00
Maksim Panchenko	89578e2314	Allow to partially split functions with exceptions. Summary: We could split functions with exceptions even without creating a new exception handling table. This limits us to only move basic blocks that never throw, and are not a start of a landing pad. (cherry picked from FBD2862937)	2016-01-22 16:45:39 -08:00
Maksim Panchenko	bbb745efa9	Don't create empty basic blocks. Fix CFI bug. Summary: Some basic blocks were created empty because they only contained alignment nop's. Ignore such nop's before basic block gets created. Fixed intermittent aborts related to CFI update. (cherry picked from FBD2844465)	2016-01-19 00:20:06 -08:00
Maksim Panchenko	4a44d187c6	Handle more CFI cases and some. Summary: * Update CFI state for larger range of functions to increase coverage. * Issue more warnings indicating reasons for skipping functions. * Print top called functions in the binary. (cherry picked from FBD2839734)	2016-01-16 14:58:22 -08:00
Maksim Panchenko	d9536e6092	Added an option to reverse original basic blocks order. Summary: Modified processing of "-reorder-blocks=" option and added an option to reverse original basic blocks order for testing purposes. (cherry picked from FBD2829862)	2016-01-13 17:19:40 -08:00
Maksim Panchenko	c9b7e3e09e	Write updated LSDA's. Summary: Write new exception ranges tables (LSDA's) into the output file. (cherry picked from FBD2828312)	2015-12-18 17:00:46 -08:00
Maksim Panchenko	b42c72cbf6	Fix issues with some CFI instructions with gcc 4.9. Summary: Fixes some issues discovered after hhvm switched to gcc 4.9. Add support for DW_CFA_GNU_args_size instruction. Allow CFI instruction after the last instruction in a function. Reverse conditions of assert for DW_CFA_set_loc. (cherry picked from FBD28110096)	2015-12-18 20:26:44 -08:00
Maksim Panchenko	a6efd11c05	Code/comments cleanup. Summary: Consolidate cold function info under cold FragmentInfo. Minor code and comment mods to LSDA handling. (cherry picked from FBD28109981)	2015-12-17 12:59:15 -08:00
Maksim Panchenko	e2fcb371a8	Ignore functions referencing symbol at 0x0. Summary: Binary code could be weird. It could include calls to address 0 and reference data at 0 (e.g. with lea on x86). LLVM JIT fatals while resolving relocations against symbols at address 0x0. For now we will stop emitting such code, i.e. we'll skip functions. (cherry picked from FBD28109837)	2015-12-16 17:56:49 -08:00
Maksim Panchenko	f7d7a85a24	Turn EH ranges support back on. Summary: Changed the way EH info is stored/extracted from call instruction. Make sure indirect calls work. (cherry picked from FBD28109629)	2015-12-15 17:06:27 -08:00
Rafael Auler	fb6e8c5d0b	Don't touch functions whose internal BBs are targets of interprocedural branches Summary: In a test binary, we found 8 cases where code in a function A would jump to the middle of another function B. In this case, we cannot reorder function B because this would change instruction offsets and break the program. This is pretty rare but can happen in code written in assembly. (cherry picked from FBD2719850)	2015-12-03 13:29:52 -08:00
Rafael Auler	9a73a8c446	Turns off basic block alignment by default Summary: We found out that the insertion of extra nops to preserve alignment of some loop bodies does not pay off the increased function size, since this extra size may inhibit us from rewriting a reordered version of this function. (cherry picked from FBD2718466)	2015-12-03 09:45:18 -08:00
Rafael Auler	04c80af012	Don't choke on DW_CFA_def_cfa_expression and friends Summary: Our CFI parser in the LLVM library was giving up on parsing all CFI instructions when finding a single instruction with expression operands. Yet, all gcc-4.9 binaries seem to have at least one CFI instruction with expression operands (DW_CFA_def_cfa_expression). This patch fixes this and makes DebugInfo continue to parse other instructions, even though it does not completely parse DWARF expressions yet. However, this seems to be enough to allow llvm-flo to process gcc-4.9 binaries because the FDEs with DWARF expressions are linked to the PLT region, and not to functions that we process. If we ever try to read a function whose CFI depends on DWARF expression, which is unlikely, llvm-flo will assert. (cherry picked from FBD2693088)	2015-11-24 13:55:44 -08:00
Rafael Auler	d6f01452d1	Change function splitting to be a two-pass process Summary: This patch builds upon the previous patch to create a two-pass process to function splitting. We first perform the full rewriting pipeline to discover which functions need splitting. Afterwards, we restart the pipeline with those functions annotated to be split. (cherry picked from FBD2691709)	2015-11-24 09:29:41 -08:00
Rafael Auler	c67a753e3c	Refactoring llvm-flo.cpp into a new class RewriteInstance, NFC. Summary: Previously, llvm-flo.cpp contained a long function doing lots of different tasks. This patch refactors this logic into a separate class with different member functions, exposing the relationship between each step of the rewriting process and making it easier to coordinate/change it. (cherry picked from FBD2691674)	2015-11-23 17:54:18 -08:00
Rafael Auler	ccbbb8f8b9	Teach llvm-flo how to split functions into hot and cold regions Summary: After basic block reordering, it may be possible that the reordered function is now larger than the original because of the following reasons: - jump offsets may change, forcing some jump instructions to use 4-byte immediate operand instead of the 1-byte, shorter version. - fall-throughs change, forcing us to emit an extra jump instruction to jump to the original fall-through at the end of a basic block. Since we currently do not change function addresses, we need to rewrite the function back in the binary in the original location. If it doesn't fit, we were dropping the function. This patch adds a flag -split-functions that tells llvm-flo to split hot functions into hot and cold separate regions. The hot region is written back in the original function location, while the cold region is written in a separate, far-away region reserved to flo via a linker script. This patch also adds the logic to create an extra FDE to supply unwinding information to the cold part of the function. Owing to this, we now need to rewrite .eh_frame_hdr to another location and patch the EH_FRAME ELF segment to point to this new .eh_frame_hdr. (cherry picked from FBD2677996)	2015-11-19 17:59:41 -08:00
Rafael Auler	38dac03e6b	Make llvm-flo print dynamic coverage of rewritten functions Summary: This is an attempt at determining the hotness of functions we are rewriting and help detect if we are discarding hot functions. This patch introduces logic to estimate the number of instructions executed in each function by using the profile data for branches. It sums the products of BB frequency and size. Since we can only do this for functions we have successfully disassembled, created the CFG and annotated with profiling data, all complex functions that were not disassembled are left out from this analysis. (cherry picked from FBD2654985)	2015-11-13 15:27:59 -08:00
Rafael Auler	75798a891b	Do not bail on functions with indirect calls Summary: Previously, we were marking functions with indirect calls as too complex to be disassembled, but this was unnecessarily conservative. This patch removes this restriction. (cherry picked from FBD2669627)	2015-11-02 09:46:50 -08:00
Rafael Auler	7886f4e81a	Ignore LSDA information for now Summary: Teach llvm-flo to drop on function with LSDA information until we know how to update them after block reordering. (cherry picked from FBD2640806)	2015-11-10 17:21:42 -08:00
Rafael Auler	1d248ec51b	Write .eh_frame and .eh_frame_hdr after reordering BBs Summary: This patch adds logic to detect when the binary has extra space reserved for us via the __flo_storage symbol. If this symbol is present, it means we have extra space in the binary to write extraneous information. When we write a new .eh_frame, we cannot discard the old .eh_frame because it may still contain relevant information for functions we do not reorder. Thus, we write the new .eh_frame into __flo_storage and patch the current .eh_frame_hdr to point to the new .eh_frame only for the functions we touched, generating a binary that works with a bi-.eh_frame model. (cherry picked from FBD2639326)	2015-11-10 15:20:50 -08:00
Rafael Auler	70db5677fb	Write updated CFI to temporary object file Summary: This patch is an intermediary step towards updating the CFI in the optimized binary. It adds the logic necessary to output our CFI annotations to a new .eh_frame in the temporary object file we create to hold rewritten functions. The next step will be to fully integrate this new .eh_frame into the optimized binary. (cherry picked from FBD2633728)	2015-11-09 11:08:02 -08:00
Rafael Auler	6c851dc2e3	Attempts to fix CFI state after reordering Summary: This patch introduces logic to check how the CFI instructions define a table to help during stack unwinding at exception run time and attempts to fix any problem in this table that may have been introduced by reordering the basic blocks. If it fails to fix this problem, the function is marked as not simple and not eligible for rewriting. (cherry picked from FBD2633696)	2015-11-08 12:23:54 -08:00
Maksim Panchenko	bc9d6e3b6c	Regenerate exception handling information after optimizations. Summary: Regenerate exception handling information after optimizations. Use '-print-eh-ranges' to see CFG with updated ranges. (cherry picked from FBD2660982)	2015-11-13 14:18:45 -08:00
Maksim Panchenko	56cca2fb5b	Fix LSDA reading issues. Summary: There were two issues: we were trying to process non-simple functions, i.e. function that we don't fully understand, and then we failed to stop iterating if EH closing label was after the last instruction in a function. (cherry picked from FBD2664460)	2015-11-17 11:02:04 -08:00
Maksim Panchenko	be2a19523c	Add exception handling information to CFG. Summary: Read .gcc_except_table and add information to CFG. Calls have extra operands indicating there's a possible handler for exceptions and an action. Landing pad information is recorded in BinaryFunction. Also convert JMP instructions that are calls into tail calls pseudo instructions so that they don't miss call instruction analysis. (cherry picked from FBD2652775)	2015-11-12 18:56:58 -08:00
Rafael Auler	2117362a09	Revert 45fc13b as it breaks HHVM rewriting Summary: Reverting this commit until we better investigate why it is necessary to change local symbol names with a prefix. (cherry picked from FBD28109521)	2015-11-12 10:41:46 -08:00
Rafael Auler	1df130ae17	Remove add PG prefix from symbols that are already local Summary: After discussion with Maksim, we decided to drop the lines that add the PG prefix if the symbol is already local, since they wouldn't be impacted by the way LLVM handles these symbols. (cherry picked from FBD28109400)	2015-11-12 10:02:12 -08:00
Rafael Auler	e80d11f27a	Fix bug in local symbol name disambiguation algorithm Summary: This bug would cause llvm-flo to fail to disambiguate two local symbols with the same file name, causing two different addresses to compete in the symbol table for the resolution of a given name, causing unpredicted behavior in the linker. (cherry picked from FBD2646626)	2015-11-11 23:56:24 -08:00
Rafael Auler	a30d04c3e2	Annotate BinaryFunctions with MCCFIInstructions encoding CFI Summary: In order to represent CFI information in our BinaryFunction class, this patch adds a map of Offsets to CFI instructions. In this way, we make it easy to check exactly where DWARF CFI information is annotated in the disassembled function. (cherry picked from FBD2619216)	2015-11-04 16:48:47 -08:00
Maksim Panchenko	de46e6fc07	Parse whole contents of .gcc_except_table even if we are not printing. Summary: We need to parse the whole contents of .gcc_except_table even if we are not printing exceptions. Otherwise we are missing type index table and miscalculate the size of the current table. (cherry picked from FBD2632965)	2015-11-09 12:27:13 -08:00
Rafael Auler	2088875656	Teach llvm-flo how to read .eh_frame information from binaries Summary: In order to reorder binaries with C++ exceptions, we first need to read DWARF CFI (call frame info) from binaries in a table in the .eh_frame ELF section. This table contains unwinding information we need to be aware of when reordering basic blocks, so as to avoid corrupting it. This patch also cleans up some code from Exceptions.cpp due to a refactoring where we moved some functions to the LLVM's libSupport. (cherry picked from FBD2614464)	2015-11-05 13:37:30 -08:00
Maksim Panchenko	7d592d0975	Verbose printing of actions from .gcc_except_table Summary: Print actions for exception ranges from .gcc_except_table. Types are printed as names if the name is available from symbol table. (cherry picked from FBD2612631)	2015-11-03 14:26:33 -08:00
Maksim Panchenko	21cc191ea8	Added function to parse and dump .gcc_except_table Summary: Use '-print-exceptions' option to dump contents of .gcc_except_table. (cherry picked from FBD2609925)	2015-11-02 11:50:53 -07:00
Rafael Auler	0e8998713c	Extract non-taken branch frequencies from LBR Summary: Previously, we inferred all non-taken branch frequencies with the information we had for taken branches. This patch teaches perf2flo and llvm-flo how to read and incorporate non-taken branch frequencies directly from the traces available in LBR data and by disassembling the binary. It still leaves the inference engine untouched in case we need it to fill out other fall-throughs. (cherry picked from FBD2589212)	2015-10-26 15:00:56 -07:00
Rafael Auler	13a520ab30	Implement two cluster layout heuristics Summary: Pettis' paper on block layout (PLDI'90) suggests we should order clusters (or chains, using the paper terminology) using a specific criterion. This patch implements two distinct ideas for cluster layout that can be activated using different command-line flags. The first one reflects Pettis' ideas on minimizing branch mispredictions and the second one is targeted at reducing I-cache misses, described in the Ispike paper (CGO'04). (cherry picked from FBD2588693)	2015-10-23 09:38:26 -07:00
Rafael Auler	2539539bde	Fixes priority queue ordering in llvm-flo block reordering Summary: Fixes a bug which caused the block reordering heuristic to put in the same cluster hot basic blocks and cold basic blocks, increasing I-cache misses. (cherry picked from FBD2588203)	2015-10-27 03:04:58 -07:00
Maksim Panchenko	d4d773458c	More control over function printing. Summary: Can use '-print-*' option to print function at specific stage. Use '-print-all' to print at every stage. (cherry picked from FBD2578196)	2015-10-23 15:52:59 -07:00
Maksim Panchenko	7f44331773	Issue warning when relaxed tail call is seen on input. Summary: Issue warning when we see a 2-byte tail call. Currently we will increase the size of these instructions. (cherry picked from FBD2575520)	2015-10-20 10:51:17 -07:00
Rafael Auler	546c4e6e84	Fix bug in BinaryFunction::fixBranches() in llvm-flo Summary: When the ignore-nops patch landed, it exposed a bug in fixBranches() where it ignored empty BBs. However, we cannot ignore empty BBs when it is reordered and its fall-through changes. We must update it with a jump to the original fall-through. This patch fixes this. (cherry picked from FBD2568244)	2015-10-21 16:25:16 -07:00
Rafael Auler	dc848b5376	Fix entry BB execution count in llvm-flo Summary: When we have tailcalls, the execution count for the entry point is wrongly computed. Fix this. (cherry picked from FBD2563112)	2015-10-20 16:48:54 -07:00
Rafael Auler	ab63ca9afb	Implement unreachable BB elimination in llvm-flo Summary: It is important to remove dead blocks to free up space in functions and allow us to reorder blocks or align branch targets with more freedom. This patch implements a simple algorithm to delete all basic blocks that are not reachable from the entry point. Note that C++ exceptions may create "unreachable" blocks, so this option must be used with care. (cherry picked from FBD2562637)	2015-10-20 12:47:37 -07:00
Rafael Auler	9f41a0d263	Do not schedule BBs before the entry point Summary: SPEC CPU2006 perlbench triggered a bug in our heuristic block reordering algorithm where a hot edge that targets the entry point (as in a recursive tail call) would make us try to allocate the call site before the function entry point. Since we don't update function addresses yet, moving the entry point will corrupt the program. This patch fixes this. (cherry picked from FBD2562528)	2015-10-20 12:30:22 -07:00
Rafael Auler	b0115a4536	Teach llvm-flo how to handle two back-to-back JMPs Summary: If we have two consecutive JMP instructions and no branches to the second one, the second one is dead code, but llvm-flo does not handle these cases properly and puts two JMPs in the same BB. This patch fixes this, putting the extraneous JMP in a separate block, making it easy for us to detect it is dead code and remove it later in a separate step. (cherry picked from FBD2562465)	2015-10-20 10:17:38 -07:00
Maksim Panchenko	85b99eb7b7	Eliminate nop instruction in input and derive alignment. Summary: Nop instructions are primarily used for alignment purposes on the input. We remove all nops when we build CFG and derive alignment of basic blocks based on existing alignment and a presence of nops before it. This will not always work as some basic blocks will be naturally aligned without necessity for nops. However, it's better than random alignment. We would also add heuristics for BB alignment based on execution profile. (cherry picked from FBD2561740)	2015-10-20 10:51:17 -07:00
Rafael Auler	cd6250d1e3	Fixes branches after reordering basic blocks in a binary function Summary: Adds logic in BinaryFunction to be able to fix branches (invert its condition, delete or add a branch), making the new function work with the new layout proposed by the layout pass. All the architecture-specific content was designed to live in the LLVM Target library, in the MCInstrAnalysis pass. For now, we only introduce such logic to the X86 backend. (cherry picked from FBD2551479)	2015-10-16 09:49:04 -07:00
Rafael Auler	ef059af3d1	Fix bug in block reorder heuristic Summary: Tests with SPEC CPU2006 400.perlbench exposed a bug in the block reordering heuristic that happened when two blocks are both successor and predecessor of each other. This patch fixes this. (cherry picked from FBD2555835)	2015-10-19 10:43:54 -07:00
Rafael Auler	31e6bd1226	Fix missing sanity check in BinaryFunction::optimizeLayout() Summary: SPEC CPU2006 perlbench exposed a bug in BinaryFunction::optimizeLayout() where it would try to optimize the layout even though the function had zero basic blocks. This patch simply checks if the function has zero basic blocks and bails out. (cherry picked from FBD2556831)	2015-10-19 13:23:03 -07:00
Maksim Panchenko	b4ed5cc942	Make FLO work on hhvm binary. Summary: Fixes several issues that prevented us from running hhvm binary. (cherry picked from FBD2543057)	2015-10-14 15:35:14 -07:00
Rafael Auler	ec22caff1e	Fix comments. NFC. Summary: Updated comments in BinaryFunction class. (cherry picked from FBD28108888)	2015-10-16 17:15:00 -07:00
Rafael Auler	9a8d357d0b	Fix DataReader to work with new local sym perf2flo format Summary: In a recent commit, we changed local symbols to be specially tagged with the number 2 (local sym) instead of 1 (sym). This patch modifies the reader so that it doesn't choke when seeing a 2 in the symbol id field. (cherry picked from FBD2552776)	2015-10-16 17:00:36 -07:00
Rafael Auler	f9ed45893b	Teach llvm-flo how to reorder blocks in an optimal way Summary: This patch implements a dynamic programming approach to reorder basic blocks with profiling information in an optimal way. Since this is analogous to TSP, it is NP-hard and the algorithm is exponential in time and memory consumption. Therefore, we only use the optimal algorithm to decide the layout of small functions (with fewer than 11 basic blocks). (cherry picked from FBD2544124)	2015-10-14 16:58:55 -07:00
Rafael Auler	34f7085503	Teach llvm-flo how to reorder basic blocks with a heuristic Summary: This patch introduces a first approach to reorder basic blocks based on profiling data that gives us the execution frequency for each edge. Our strategy is to lay out basic blocks in an order that maximizes the weight (hotness) of branches that will be deleted. We can delete branches when src comes right before dst in the new layout order. This can be reduced to the TSP problem. This patch uses a greedy heuristic to solve the problem: we start with a graph with no edges and progressively add edges by choosing the hottest edges first, building a layout order that attempts to put BBs with hot edges together. (cherry picked from FBD2544076)	2015-10-13 12:18:54 -07:00
Rafael Auler	9b58b2e64b	Make llvm-flo infer branch count data for fall-through edges Summary: The LBR only has information about taken branches and does not record information when a branch is not taken. In our CFG, we call these edges "fall-through" edges. This patch teaches llvm-flo how to infer fall-through edge frequencies. (cherry picked from FBD2536633)	2015-10-13 10:25:45 -07:00
Maksim Panchenko	f79f6302c1	Converted local offsets from uint64_t to uint32_t. Refactoring. (cherry picked from FBD2543557)	2015-10-14 16:46:59 -07:00
Rafael Auler	4c1da22ae9	Add branch count information to binary CFG Summary: Changes DataReader to organize branch perf data per function name and sets up logistics to bring this data to BinaryFunction::buildCFG(). To do this, we expand BinaryContext with a const reference to DataReader. This patch also adds the "-dump-functions" flag to force llvm-flo to dump the current state of BinaryFunctions once they are disassembled and their CFG built, allowing us to test whether the builder is sane with LLVM LIT tests. (cherry picked from FBD2534675)	2015-10-12 12:30:47 -07:00
Maksim Panchenko	d30423f872	Don't bail out if there's no input data file specified. Summary: Don't attempt to read data file if it was not specified by the user. (cherry picked from FBD2533440)	2015-10-12 14:46:18 -07:00
Maksim Panchenko	ffcc2be7fa	FLO: added support for rip-relative operands. Summary: Detect and replace rip-relative operands with relocations. (cherry picked from FBD2529818)	2015-10-09 21:47:18 -07:00
Maksim Panchenko	f166c4ab2b	Fix CFG building issue. Summary: Fixed getBasicBlockContainingOffset() to return correct basic block. (cherry picked from FBD2532514)	2015-10-12 12:12:16 -07:00
Rafael Auler	e1a539b0ec	Add initial implementation of DataReader Summary: This patch introduces DataReader, a module responsible for parsing llvm flo data files into in-memory data structures. (cherry picked from FBD2515754)	2015-10-05 18:31:25 -07:00
Maksim Panchenko	9a2fe7ebe4	Commit FLO with control flow graph. Summary: llvm-flo disassembles, builds control flow graph, and re-writes simple functions. (cherry picked from FBD2524024)	2015-10-09 17:21:14 -07:00
Maksim Panchenko	7927c14ff5	Fixed cmake. (cherry picked from FBD28108725)	2015-10-02 12:38:07 -07:00
Maksim Panchenko	a89c417357	Removed remote .arcconfig + comment change. (cherry picked from FBD2503821)	2015-10-02 12:06:31 -07:00
Maksim Panchenko	575b24d719	Initial FLO commit. Summary: Directory created. (cherry picked from FBD28105260)	2015-10-02 11:55:15 -07:00
Maksim Panchenko	25b976aa12	BOLT root commit	2022-01-10 17:58:05 -08:00
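
The greedy block-reordering heuristic summarized in commit 34f7085503 above can be illustrated with a small sketch. This is not code from llvm-flo (which is written in C++); the function name `greedy_layout`, the chain-of-blocks representation, and the input shapes are assumptions made for illustration. It picks the hottest edges first and joins two chains only when the edge source ends one chain and the edge target starts another, so the branch between them becomes a fall-through. It also applies the rule from commit 9f41a0d263 that nothing may be placed before the entry block.

```python
def greedy_layout(blocks, edges, entry):
    """Return a block order that tries to turn hot edges into fall-throughs.

    blocks: block names in their original order (hypothetical input shape).
    edges:  dict mapping (src, dst) -> execution count from the profile.
    entry:  name of the function's entry block; it must stay first.
    """
    # Every block starts in its own chain; chain_of maps a block to its chain.
    chain_of = {b: [b] for b in blocks}

    # Visit edges from hottest to coldest; ties are broken by edge name
    # so the result is deterministic.
    for (src, dst), _count in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
        head_chain, tail_chain = chain_of[src], chain_of[dst]
        # Skip edges inside one chain, and never put anything before the entry.
        if head_chain is tail_chain or dst == entry:
            continue
        # src must end its chain and dst must start its chain; otherwise
        # placing dst right after src would break a hotter edge chosen earlier.
        if head_chain[-1] != src or tail_chain[0] != dst:
            continue
        head_chain.extend(tail_chain)
        for blk in tail_chain:
            chain_of[blk] = head_chain

    # Emit the entry chain first, then the remaining chains in original order.
    layout, seen = [], set()
    for start in [entry] + list(blocks):
        chain = chain_of[start]
        if id(chain) not in seen:
            seen.add(id(chain))
            layout.extend(chain)
    return layout
```

For a function with blocks `A`, `B`, `C`, `D` where the path `A -> C -> D` is hot and `B` is cold, calling `greedy_layout(["A", "B", "C", "D"], edges, "A")` places `C` and `D` directly after `A` and pushes `B` to the end. The order in which leftover chains are emitted is the simplest possible choice here; the cluster layout heuristics from commit 13a520ab30 would replace that final loop.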


1216 Commits