llvm-project

Commit Graph

Author	SHA1	Message	Date
Maksim Panchenko	30fd960951	[BOLT] Update local symbol count in symbol table Summary: Fix sh_info entry for symbol table section to reflect updated number of local symbols. (cherry picked from FBD10503216)	2018-10-22 18:48:12 -07:00
Maksim Panchenko	a76b13d48e	[perf2bolt] Pre-aggregate LBR samples Summary: Pre-aggregating LBR data cuts perf2bolt processing times in half. (cherry picked from FBD10420286)	2018-10-02 17:16:26 -07:00
Rafael Auler	74a71c6812	Fix bug in analyzeRelocation for GOT entries Summary: Special case GOT relocs to ignore addend subtracting logic in analyzeRelocation, since the addend does not refer to the target of the instruction being analyzed. Also make the code honor the comments in the special case about zeroed out ExtractValue but non-zero addend. Fix facebookincubator/BOLT#40 (cherry picked from FBD10355019)	2018-10-11 18:12:09 -07:00
Facebook Github Bot	b166ccbea8	[BOLT][PR] Fix compiler warnings in BinaryContext and RegAnalysis Summary: This pull request fixes two compiler warnings: - missing `break;` in a switch-case statement in RegAnalysis.cpp (-Wimplicit-fallthrough warning) - misleading indentation in BinaryContext.cpp (-Wmisleading-indentation warning) Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/39 GitHub Author: Andreas Ziegler <andreas.ziegler@fau.de> (cherry picked from FBD10202092)	2018-10-04 10:46:16 -07:00
Igor Sugak	c3c80822a3	[BOLT] Capitalize i Summary: as titled (cherry picked from FBD10136655)	2018-10-01 16:22:46 -07:00
Igor Sugak	cc2276d3f1	[BOLT] fix build with gcc-4.8.5 Summary: These are two minor changes to make it compatible with gcc-4.8.5 (cherry picked from FBD9884971)	2018-09-17 12:17:33 -07:00
Maksim Panchenko	ce508b58c6	[BOLT] Support relocations without symbols Summary: lld may generate relocations without associated symbols. Instead of rejecting binaries with such relocations, we can re-create the symbol the relocation is against based on the extracted value. (cherry picked from FBD10054576)	2018-09-21 12:00:20 -07:00
Rafael Auler	bd0b99c45d	[BOLT] Change stub-insertion pass for AArch64 Summary: Previously, we were expanding eligible branches with stubs. After expansion, we were computing which stubs were unnecessary and removing them, assuming ranges were shortening as code is removed. The problem with this approach is that for branches that refer to code that is not managed by BOLT, the distance to that location can increase and we can end up with an out-of-range branch. This rewrites the pass to be simpler, only increasing size and expanding code with stubs as needed after each iteration, stopping when code stops increasing. Besides this rewrite, the stub-insertion pass now supports stubs grouping similar to what the linker does, allowing different functions to share the same veneer that jumps to a common callee. It also fixes a bug in the previous implementation that, in very large functions that use TBZ/TBNZ (+-32KB range), it would mistakenly try to reuse a local stub BB that is out of range. This includes a change to allow hot functions to be put at the end of the .text section, closer to the heap, requiring no veneers to jump to JITted code. And finally it enables eliminate veneers pass by default. (cherry picked from FBD10023158)	2018-09-17 13:36:59 -07:00
Maksim Panchenko	1387a9d761	[BOLT] Keep .text section in file when using old text Summary: If we reuse text section under `-use-old-text` option, then there's no need to rename it. Tools, such as perf, seem to not like binaries without `.text`. Additionally, check if the code fits into `.text` using the page alignment, otherwise we were skipping the alignment relying on the user detecting the warning message. This could have resulted in unexpected performance drops. Also add `-no-huge-pages` option to use regular page size for code alignment purposes (i.e. 4KiB instead of 2MiB). (cherry picked from FBD10024670)	2018-09-24 20:58:31 -07:00
Maksim Panchenko	53b72d0f2e	[BOLT] Ignore symbols from non-allocatable sections Summary: While creating BinaryData objects we used to process all symbol table entries. However, some symbols could belong to non-allocatable sections, and thus we have to ignore them for the purpose of analyzing in-memory data. (cherry picked from FBD9666511)	2018-09-05 14:36:52 -07:00
Maksim Panchenko	8026760ac0	[BOLT] Fix another issue with profile after ICP Summary: For jump tables ICP was using profile from the jump table itself, which doesn't work correctly if the jump table is re-used at different code locations. (cherry picked from FBD9618774)	2018-08-30 13:21:50 -07:00
spupyrev	41ed5431a0	[BOLT] turning on the compact aligner by default Summary: Making UseCompactAligner true by default (cherry picked from FBD9325158)	2018-08-14 14:49:10 -07:00
Maksim Panchenko	cd19f718b4	[BOLT] Merge jump table profile data Summary: While running ICF pass we have skipped merging profile data for jump tables. We were only updating profile in the CFG. Fix that. (cherry picked from FBD9595523)	2018-08-30 13:21:29 -07:00
Maksim Panchenko	69e6004a42	[perf2bolt] Fix processing of binaries with names over 15 chars long Summary: Do not truncate the binary name for comparison purposes as the binary name we are getting from "perf script" is no longer truncated. (cherry picked from FBD9596409)	2018-08-30 14:51:10 -07:00
Rafael Auler	d0a80b0870	[BOLT] Change ForceRelocation behavior Summary: Only record address as addend if the target of the relocation is the pseudo-symbol Zero. (cherry picked from FBD9551543)	2018-08-28 18:15:13 -07:00
Maksim Panchenko	708a550084	[BOLT] Fix profile after ICP Summary: After optimizing a target of a jump table, ICP was not updating edge counts corresponding to that target. As a result the edge could be left hot and negatively influence the code layout. (cherry picked from FBD9524396)	2018-08-23 22:47:46 -07:00
Maksim Panchenko	2511b09985	[BOLT][DWARF] Fix line info for empty CU DIEs Summary: In some rare cases a compiler may generate DWARF that contains an empty CU DIE that references a debug line fragment. That fragment will contain no file name information, and we fail to register it. Then, as a result, DW_AT_stmt_list is not updated for the CU. This may cause some DWARF-processing tools to segfault. As a solution/workaround, we register "<unknown>" file name for such debug line tables. (cherry picked from FBD9526705)	2018-08-27 20:12:59 -07:00
Rafael Auler	a7e0704be6	[BOLT] Reduce AArch64 target feature flags Summary: Eliminate some flags that are not recognized and are currently printing warnings when BOLT runs on AArch64. (cherry picked from FBD9499971)	2018-08-24 10:42:00 -07:00
Rafael Auler	af1177d99f	[BOLT] Add mattr options to AArch64 target Summary: Make the AArch64 subtarget enable all features, so the disassembler won't choke on extension instructions. (cherry picked from FBD9477066)	2018-08-22 18:47:39 -07:00
Rafael Auler	9c4fcafa37	[BOLT] Add update-build-id option, on by default Summary: The build-id is used by tools to uniquely identify binaries. Update the output binary build-id with a different number to make it distinguishable from the input binary. This implementation just flips the last build-id bit. (cherry picked from FBD9235336)	2018-08-08 17:55:24 -07:00
Rafael Auler	510a8c4bbe	[BOLT] Fix shrink-wrapping CFI update Summary: When updating CFI for a function that was optimized by shrink-wrapping, if the function had no frame pointers, the CFI update algorithm was incorrect. (cherry picked from FBD9328658)	2018-08-14 17:32:06 -07:00
Maksim Panchenko	88bb145164	[BOLT] Update allocatable relocation sections Summary: Position-independent binaries may have runtime relocations of type R_X86_64_RELATIVE that need an update if they were pointing to one of the functions that we have relocated. (cherry picked from FBD9374164)	2018-08-16 16:53:14 -07:00
Maksim Panchenko	87788ca926	[perf2bolt] Support profiling of PIEs and .so's Summary: Processing profile data for binaries with flexible load address (such as position-independent executables and shared objects) requires adjusting binary addresses depending on the base load address. For every PID the mapping will be more or less unique when executing with ASLR enabled, thus we have to keep the mapping record for all PIDs associated with the binary. Then we adjust the addresses based on those mappings. (cherry picked from FBD9368566)	2018-08-14 13:24:44 -07:00
Maksim Panchenko	560c23411a	[perf2bolt] Use mmap events for PID collection Summary: Switch from using `perf script --show-task-events` to `perf script --show-mmap-events` for associating a binary with PIDs in perf.data. The output of the former command does not provide enough information for PIE/.so processing. (cherry picked from FBD9346586)	2018-08-14 13:24:44 -07:00
Rafael Auler	b10d4724c3	[BOLT] Fix pseudo calculation in BinaryBasicBlock Summary: A recent commit broke our tests because it was depending on getNumNonPseudos() at a very late stage of our optimization pipeline. The problem was in an instruction deletion member function in BinaryBasicBlock that was not updating the number of pseudos after deletion. Fix this. (cherry picked from FBD9305972)	2018-08-13 14:36:38 -07:00
Laith Saed Sakka	b2382dc552	retpoline insertion : further updates. Summary: Couple of updates: 1) Handle address pattern with segment register. 2) Assume R11 available for PLT calls always. 3) Add CFI state to each BB. 4) early exit getMacroOpFusionPair if Instruction.size() <2. (cherry picked from FBD9172426)	2018-08-03 16:36:06 -07:00
Maksim Panchenko	c35dc2a386	[BOLT] Detect and handle fixed indirect branches Summary: Sometimes GCC can generate code where one of jump table entries is being used by an indirect branch with a fixed memory reference, such as "jmp *(JT+8)". If we don't convert such branches to direct ones and move jump tables, then the indirect branch will reference the old table value and will end up at the non-updated destination, possibly causing a runtime crash. This fix converts such indirect branches into direct ones. For now we mark functions containing indirect branches with fixed destination as non-simple to prevent unreachable code elimination problem triggered by related dead/unreachable jump table. (cherry picked from FBD9192363)	2018-08-06 11:22:45 -07:00
Laith Saed Sakka	06e1554158	Retpoline Insertion Pass Summary: retpoline insertion implemented for reloc mode. (cherry picked from FBD8832838)	2018-07-25 19:07:41 -07:00
Maksim Panchenko	39f6fcc947	[BOLT] Add support for IFUNC Summary: Relocation value verification was failing for IFUNC as the real value used for relocation wasn't the symbol value, but a corresponding PLT entry. Relax the verification and skip any symbols of ST_Other type. (cherry picked from FBD9123741)	2018-07-30 10:29:47 -07:00
Maksim Panchenko	df94786119	[BOLT] Fix range checks Summary: containsRange() functions were incorrectly checking for an empty range at the end of containing object. I.e. [a,b) was reporting true for containing [b,b). (cherry picked from FBD9074643)	2018-07-30 16:30:18 -07:00
Maksim Panchenko	fe9f8219fa	[BOLT] Fix TBSS-related issue Summary: TLS segment provides a template for initializing thread-local storage for every new thread. It consists of initialized and uninitialized parts. The uninitialized part of TLS, .tbss, is completely meaningless from a binary analysis perspective. It doesn't take any space in the file, or in memory. Note that this is different from a regular .bss section that takes space in memory. We should not place .tbss into a list of allocatable sections, otherwise it may cause conflicts with objects contained in the next section. (cherry picked from FBD9074056)	2018-07-30 16:30:18 -07:00
Maksim Panchenko	771d976543	[BOLT][NFC] Minor code refactoring (cherry picked from FBD8882632)	2018-07-12 10:13:03 -07:00
Maksim Panchenko	49920a8fad	[BOLT] Add R_X86_64_PC64 relocation support (cherry picked from FBD8980691)	2018-07-24 14:30:16 -07:00
spupyrev	631da736b0	[BOLT] further speeding up cache+ Summary: For large binaries, cache+ algorithm adds a noticeable overhead in comparison with cache. This modification restricts search space of the optimization, which makes cache+ as fast as cache for all tested binaries. There is a tiny (in the order of 0.01%) regression in cache-related metrics, but this is not noticeable in practice. (cherry picked from FBD8369968)	2018-05-17 18:27:13 -07:00
Rafael Auler	ddfcf4f266	[BOLT] Add parser for pre-aggregated perf data Summary: The regular perf2bolt aggregation job is to read perf output directly. However, if the data is coming from a database instead of perf, one could write a query to produce a pre-aggregated file. This function deals with this case. The pre-aggregated file contains aggregated LBR data, but without binary knowledge. BOLT will parse it and, using information from the disassembled binary, augment it with fall-through edge frequency information. After this step is finished, this data can be either written to disk to be consumed by BOLT later, or can be used by BOLT immediately if kept in memory. File format syntax: {B|F|f} [<start_id>:]<start_offset> [<end_id>:]<end_offset> <count> [<mispred_count>] B - indicates an aggregated branch F - an aggregated fall-through (trace) f - an aggregated fall-through with external origin - used to disambiguate between a return hitting a basic block head and a regular internal jump to the block <start_id> - build id of the object containing the start address. We can skip it for the main binary and use "X" for an unknown object. This will save some space and facilitate human parsing. <start_offset> - hex offset from the object base load address (0 for the main executable unless it's PIE) to the start address. <end_id>, <end_offset> - same for the end address. <count> - total aggregated count of the branch or a fall-through. <mispred_count> - the number of times the branch was mispredicted. Omitted for fall-throughs. Example F 41be50 41be50 3 F 41be90 41be90 4 f 41be90 41be90 7 B 4b1942 39b57f0 3 0 B 4b196f 4b19e0 2 0 (cherry picked from FBD8887182)	2018-07-17 18:31:46 -07:00
Laith Saed Sakka	27f3032447	Add initial function injection support Summary: This diff has the API needed to inject functions using bolt. In relocation mode, injected functions are emitted between the cold and the hot functions. In non-reloc mode, injected functions are emitted in a next text section. (cherry picked from FBD8715965)	2018-07-08 12:14:08 -07:00
Maksim Panchenko	6e45f5aeec	[perf2bolt] Enforce file matching in perf2bolt Summary: If the input binary does not have a build-id and the name does not match any file names in perf.data, then reject the binary, and issue an error message suggesting to rename it to one of the listed names from perf.data. (cherry picked from FBD8846181)	2018-07-13 15:26:41 -07:00
Maksim Panchenko	f2f164f474	[perf2bolt] Fix perf build-id matching Summary: Recent compiler tool chains can produce build-ids that are less than 40 characters long. Linux perf, however, always outputs 40 characters, expanding the string with 0's as needed. Fix the matching by only checking the string prefix. (cherry picked from FBD8839452)	2018-07-13 10:49:41 -07:00
Rafael Auler	7aee0adbf9	[BOLT-AArch64] Create cold symbols on demand Summary: Rework the logic we use for managing references to constant islands. Defer the creation of the cold versions to when we split the function and will need them. (cherry picked from FBD8228803)	2018-05-31 10:33:53 -07:00
Maksim Panchenko	44a36937f8	[BOLT] Fix llvm-dwarfdump issues Summary: llvm-dwarfdump is relying on getRelocatedSection() to return section_end() for ELF files of types other than relocatable objects. We've changed the function to return relocatable section for other types of ELF files. As a result, llvm-dwarfdump started re-processing relocations for sections that already had relocations applied, e.g. in executable files, and this resulted in wrong values reported. As a workaround/solution, we make this function return relocated section for executable (and any non-relocatable objects) files only if the section is allocatable. (cherry picked from FBD8760175)	2018-07-06 21:30:23 -07:00
Maksim Panchenko	66e0313d15	[perf2bolt] Accept `-` as a valid misprediction symbol Summary: As reported in GH-28 `perf` can produce `-` symbol for misprediction bit if the bit is not supported by the kernel/HW. In this case we can ignore the bit. (cherry picked from FBD8786827)	2018-07-10 10:25:55 -07:00
Rafael Auler	12380b8b06	Fix assembly after adding entry points Summary: When a given function B, located after function A, references one of A's basic blocks, it registers a new global symbol at the reference address and updates A's Labels vector via BinaryFunction::addEntryPoint(). However, we don't update A's branch targets at this point. So we end up with an inconsistent CFG, where the basic block names are global symbols, but the internal branch operands are still referencing the old local name of the corresponding blocks that got promoted to an entry point. This patch fixes this by detecting this situation in addEntryPoint and iterating over all instructions, looking for references to the old symbol and replacing them to use the new global symbol (since this is now an entry point). Fixes facebookincubator/BOLT#26 (cherry picked from FBD8728407)	2018-07-03 11:57:46 -07:00
Rafael Auler	544d1577c1	Avoid removing BBs referenced by JTs Summary: While removing unreachable blocks, we may decide to remove a block that is listed as a target in a jump table entry. If we do that, this label will be then undefined and LLVM assembler will crash. Mitigate this for now by not removing such blocks, as we don't support removing unnecessary jump tables yet. Fixes facebookincubator/BOLT#20 (cherry picked from FBD8730269)	2018-07-03 17:02:33 -07:00
Laith Saed Sakka	b6c4d8e924	-- Adding Veneer elimination pass and Veneer count to dyno stats. Summary: Create a pass that performs veneer elimination. (cherry picked from FBD8359299)	2018-06-07 11:10:37 -07:00
Maksim Panchenko	207ac19c63	Revert "[LongJumpPass] X86 enablement. First attempt." This reverts commit 010b0f7603fc9fa209c6dc95ce4b9c08e7b70d75. (cherry picked from FBD28111178)	2018-07-06 14:54:53 -07:00
Puyan Lotfi	64c429da89	[LongJumpPass] X86 enablement. First attempt. (cherry picked from FBD8753328)	2018-07-06 12:31:36 -07:00
Maksim Panchenko	b447979b8c	[BOLT] Fix diagnostics printing in data aggregator Summary: Print correct part of the string while reporting an error. (cherry picked from FBD8745329)	2018-07-05 20:47:38 -07:00
Maksim Panchenko	d7b2474f83	[DebugInfo] Change default value of FDEPointerEncoding Summary: If the encoding is not specified in CIE augmentation string, then it should be DW_EH_PE_absptr instead of DW_EH_PE_omit. (cherry picked from FBD8740274)	2018-07-05 14:21:49 -07:00
Maksim Panchenko	365613b404	[BOLT] Fix no-assertions build Summary: In release build without assertions MCInst::dump() is undefined and causes link time failure. Fixes facebookincubator/BOLT#27. (cherry picked from FBD8732905)	2018-07-04 10:33:26 -07:00
Maksim Panchenko	a6a37995d9	[BOLT] Reject processing of PIE binaries Summary: Check the input binary ELF type. Reject any binary not of ET_EXEC type, including position-independent executables (PIEs). Also print the first function containing a PIC jump table. (cherry picked from FBD8707274)	2018-06-29 21:12:55 -07:00
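
The pre-aggregated profile syntax described in commit ddfcf4f266 can be illustrated with a small reader. This is only a sketch in Python, not BOLT's C++ parser: the names `Location`, `AggregatedEntry`, `parse_location` and `parse_preaggregated_line` are hypothetical, and it assumes counts are decimal (the commit message states only that offsets are hex).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    # None means the id was omitted (main binary); "X" marks an unknown object.
    build_id: Optional[str]
    offset: int  # offset from the object's base load address


@dataclass
class AggregatedEntry:
    kind: str  # "B" branch, "F" fall-through, "f" fall-through with external origin
    start: Location
    end: Location
    count: int
    mispreds: Optional[int]  # None when the field is omitted


def parse_location(field: str) -> Location:
    """Parse '[<id>:]<offset>' where <offset> is hexadecimal."""
    build_id, sep, offset = field.rpartition(":")
    return Location(build_id if sep else None, int(offset, 16))


def parse_preaggregated_line(line: str) -> AggregatedEntry:
    """Parse one '{B|F|f} <start> <end> <count> [<mispred_count>]' record."""
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}: {line!r}")
    kind = fields[0]
    if kind not in ("B", "F", "f"):
        raise ValueError(f"unknown record type {kind!r}")
    # Assumption of this sketch: counts are written in decimal.
    mispreds = int(fields[4]) if len(fields) == 5 else None
    return AggregatedEntry(
        kind=kind,
        start=parse_location(fields[1]),
        end=parse_location(fields[2]),
        count=int(fields[3]),
        mispreds=mispreds,
    )
```

Calling `parse_preaggregated_line("B 4b1942 39b57f0 3 0")` on one of the example records from the commit message yields a branch entry with both build ids omitted, that is, both addresses in the main binary.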

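The range-check bug fixed in commit df94786119 (a half-open object [a,b) reporting that it contains the empty range [b,b)) can be shown with a short sketch. The function below is a hypothetical Python illustration, not the actual `containsRange()` from BOLT: it requires the start address to fall inside the object and the end address not to exceed the object's end.

```python
def contains_range(obj_start: int, obj_size: int, start: int, size: int) -> bool:
    """Return True if [start, start + size) lies within [obj_start, obj_start + obj_size).

    The start address itself must be inside the object, so an empty range
    positioned exactly at the object's end address is not contained.
    """
    obj_end = obj_start + obj_size
    return obj_start <= start < obj_end and start + size <= obj_end
```

With this check, `contains_range(0x1000, 0x100, 0x1100, 0)` is false, while an empty range at the object's start, `contains_range(0x1000, 0x100, 0x1000, 0)`, is still accepted.
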
593 Commits