llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexander Shaposhnikov	e3654fc274	[BOLT] Uniquify names of local symbols Summary: 1. Uniquify names of local symbols. 2. Handle aliases. (cherry picked from FBD20270196)	2020-03-04 18:36:44 -08:00
Alexander Shaposhnikov	842a25f785	[BOLT] Mark functions containing data as non-simple Summary: Temporarily mark functions containing data as non-simple. (cherry picked from FBD20213279)	2020-03-02 22:41:12 -08:00
Maksim Panchenko	cb9c991dcb	[BOLT] Remove allow-section-relocations option Summary: The option is not used. Remove all related code. (cherry picked from FBD20237859)	2020-03-03 15:51:24 -08:00
Maksim Panchenko	c7e012e145	[BOLT][NFC] Get rid of BestFit parameter Summary: The parameter is no longer used. (cherry picked from FBD20236516)	2020-03-03 14:28:42 -08:00
Alexander Shaposhnikov	b0cbb60165	[BOLT] Fix begin decrementing Summary: Fix begin decrementing. (cherry picked from FBD20232474)	2020-03-03 13:36:32 -08:00
Maksim Panchenko	d89bb53afa	[BOLT][NFC] Factor out relocation processing (cherry picked from FBD20087297)	2020-02-24 17:10:02 -08:00
Rafael Auler	340da8f294	[BOLT] Fix shrink wrapping to check pops Summary: Shrink wrapping has a mode where it will directly move push pop pairs, instead of replacing them with stores/loads. This is an ambitious mode that is triggered sometimes, but whenever matching with a push, it would operate with the assumption that the restoring instruction was a pop, not a load, otherwise it would assert. Fix this assertion to bail nicely back to non-pushpop mode (use regular store and load instructions). (cherry picked from FBD20085905)	2020-02-18 16:00:40 -08:00
Maksim Panchenko	2df4e7b99e	[BOLT][NFC] Minor refactoring of RewriteInstance (cherry picked from FBD20087424)	2020-02-24 17:12:41 -08:00
Maksim Panchenko	495761dc70	[BOLT][NFC] Remove unused BinarySection member functions (cherry picked from FBD20087243)	2020-02-24 16:56:45 -08:00
Maksim Panchenko	3b45212e84	[BOLT] Delete ExecutableFileMemoryManager::registerNoteSection() Summary: The interface is no longer in use. (cherry picked from FBD20070558)	2020-02-24 09:40:32 -08:00
Alexander Shaposhnikov	01b7c90242	[BOLT] Add missing override Summary: Add missing override in X86MCPlusBuilder.cpp. (cherry picked from FBD20064222)	2020-02-23 22:27:28 -08:00
Maksim Panchenko	be43f89c4f	[BOLT][llvm] Update llvm.patch Summary: (cherry picked from FBD20063562)	2020-02-23 19:51:33 -08:00
Alexander Shaposhnikov	76aa1c26aa	[BOLT] Enable reversing the order of basic blocks Summary: Enable reversing the order of basic blocks. (cherry picked from FBD19943692)	2020-02-17 13:35:09 -08:00
Alexander Shaposhnikov	4ad5048393	[BOLT] Add first bits to build CFG Summary: Add first bits to build CFG. (cherry picked from FBD19943472)	2020-02-17 12:18:42 -08:00
Alexander Shaposhnikov	5b64bf2128	[BOLT] Disassemble functions from a MachO binary Summary: Add first bits to disassemble functions from a MachO binary. (cherry picked from FBD19900493)	2020-02-11 14:30:33 -08:00
Rafael Auler	a9d85413ac	[BOLT] Emit long nops by default Summary: Change our X86 target to use long nops by default. In general, BOLT does not put nops into the instruction stream that is going to be executed, since it doesn't align basic blocks, only functions. Since we rebased BOLT, our relationship with MCAssembler changed because it stopped using multibyte nops and we never needed to revisit that. But it makes a difference if we want to mitigate perf issues with the Intel JCC erratum, since the nops inserted are going to be decoded and executed. To make MCAssembler emit long nops again, we need to explicitly set mattr (Features) of the X86 target. (cherry picked from FBD19987277)	2020-02-19 16:13:58 -08:00
Maksim Panchenko	9711286858	[BOLT] Get rid of BinarySection::IsLocal Summary: The flag is no longer used/needed. (cherry picked from FBD19951571)	2020-02-18 09:20:17 -08:00
Alexander Shaposhnikov	16630f5c58	[BOLT] Factor out NameResolver from RewriteInstance Summary: Factor out the helper class NameResolver from the class RewriteInstance. (cherry picked from FBD19943916)	2020-02-17 14:37:46 -08:00
Alexander Shaposhnikov	754b6569f6	[BOLT] Add missing std::move Summary: Add missing std::move in the method BinaryFunction::addAlternativeName (cherry picked from FBD19944661)	2020-02-17 17:53:12 -08:00
Alexander Shaposhnikov	36cf37c4c1	[BOLT] Add initial bits for parsing MachO files Summary: Start adding initial bits for MachO, this diff contains some small preparations for finding functions inside a MachO binary, this will be done in the next diff. The concept of a section in the MachO world is quite different from ELF, nevertheless, for functions for now it more or less fits into the current picture (in BOLT), but things will diverge more significantly a bit later. (cherry picked from FBD19648161)	2020-01-30 13:10:48 -08:00
Rafael Auler	58a129a602	[BOLT] Move peepholes pass after sctc Summary: There are two peephole subpasses, remove-double-jumps and remove-useless-conditional-branches, that operate by reading branches directly, which makes them tricky to run before fix-branches. In the case of remove-double-jumps, it will even lead to suboptimal code if the patched branch was going to be removed by fix-branches when the target is the fall-through. If the final target is a tail call, it will lead to a broken CFG in the worst case. Fix this by moving these passes after SCTC, which already produces CFGs with conditional tail calls. (cherry picked from FBD18795592)	2019-12-03 12:28:22 -08:00
Rafael Auler	c82e7fd1cc	[BOLT] Decoder cache friendly alignment wrt Intel JCC Erratum Summary: This diff ports reviews.llvm.org/D70157 to our LLVM tree, which makes the integrated assembler able to align X86 control-flow changing instructions in a way to reduce the performance impact of the ucode update on Intel processors that implement the JCC erratum mitigation. See white paper "Mitigations for Jump Conditional Code Erratum" by Intel published November 2019. To port this patch, I changed classifySecondInstInMacroFusion to analyze instruction opcodes directly instead of analyzing the CondCond operand (in more recent versions of LLVM, all conditional branches share the same opcode, but with a different conditional operand). I also pulled to our tree Alignment.h as a dependency, and the macroop analyzing helpers. x86-align-branch-boundary and -x86-align-branch are the two flags that control nop insertion to avoid disabling the decoder cache, following the original patch. In BOLT, I added the flag x86-align-branch-boundary-hot-only to request the alignment to only be applied to hot code, which is turned on by default. The reason is because such alignment is expensive to perform on large modules, but if we limit it to hot code, the relaxation pass runtime becomes tolerable. (cherry picked from FBD19828850)	2020-02-10 18:50:53 -08:00
Alexander Shaposhnikov	d5b8fc8fbe	[BOLT] Make the methods isText/isData more robust Summary: Make the methods isText/isData work for MachO. (cherry picked from FBD19849460)	2020-02-11 17:54:48 -08:00
Alexander Shaposhnikov	c3c4b15a2e	[BOLT] Remove BinaryContext::getFunctionData Summary: In this diff we refactor the code around getting the original binary encoding of function's body. The main changes are: remove BinaryContext::getFunctionData, remove the parameter of the method BinaryFunction::disassemble, introduce BinaryFunction::getData. (cherry picked from FBD19824368)	2020-02-10 15:35:11 -08:00
Maksim Panchenko	41de03b8e9	[BOLT] Fix section names under `-generate-link-sections` Summary: Use proper function while printing modified function name to file. (cherry picked from FBD19791847)	2020-02-07 09:39:38 -08:00
Rafael Auler	0080d74506	[BOLT] Fix issue with strict and builtin_unreachable Summary: In strict mode, a jump table with targets generated by builtin_unreachable (located at the very end of the function) was asserting when being recreated by postProcessIndirectBranches. Fix this. (cherry picked from FBD19614981)	2020-01-28 18:38:10 -08:00
Maksim Panchenko	d57513e4ab	[BOLT] Fix symbol table issue with ICF Summary: Not all symbol table entries were updated after ICF. (cherry picked from FBD19319685)	2020-01-08 13:32:59 -08:00
Maksim Panchenko	ac697b7d3a	[BOLT] Replace list of Names with Symbols for BinaryFunction Summary: BinaryFunction used to have a list of Names associated with its main entry point. However, the function is primarily identified by its corresponding symbol or symbols, and these symbols are available as we are creating them for a corresponding BinaryData object. There's also no reason to emit symbols for alternative function names (aliases), so change the code to only emit needed symbols. When we emit a cold fragment for a function, only emit one cold symbol for the fragment instead of one per every main entry symbol/name. When we match a symbol to an entry point in the function, with this change we can first go through the list of main entry symbols (now that they are available). (cherry picked from FBD19426709)	2020-01-13 11:56:59 -08:00
Alexander Shaposhnikov	7a59783d7a	[BOLT] Move createBinaryContext to BinaryContext Summary: 1. Move createBinaryContext to BinaryContext. 2. Add support for nonlinux triples in createBinaryContext. 3. Remove unnecessary std::move in DWARFRewriter.cpp. (cherry picked from FBD19421314)	2020-01-15 15:23:45 -08:00
Rafael Auler	961d3d02d8	[BOLT] Move postProcessEntryPoints after disassembly Summary: Call postProcessEntryPoints only after all functions have been disassembled and all interprocedural references have been processed, when all possible entry points have been accounted for. This makes our detection of bad entries more robust as it does not depend on the order of the functions any more. (cherry picked from FBD19404767)	2020-01-14 17:12:03 -08:00
Maksim Panchenko	0283271f29	[BOLT] Do not report error on mismatched instruction encoding Summary: When the validation of instruction encoding fails but we are able to continue processing the binary, do not report an error. Report encoding format only under `-v=1`. (cherry picked from FBD19376531)	2020-01-13 11:24:10 -08:00
Maksim Panchenko	45b27d7b44	[BOLT] Get rid of Names in BinaryData Summary: For BinaryData, we used to maintain a vector of StringRef names and also a vector of pointers to MCSymbol's associated with the data. There was an unnecessary duplication of information and an associated overhead of keeping it in sync. Fix it by removing Names and using Symbols wherever Names were used. Also merge two variants of registerNameAtAddress() and remove unreachable/dead code in the process. (cherry picked from FBD19359123)	2020-01-10 16:17:47 -08:00
Maksim Panchenko	088e3c032a	[BOLT] Improve handling of secondary function entry points Summary: "Fix symbol table entries for secondary entries" diff broke the inliner. Fix the breakage and make the discovery of secondary entry points more accurate. Add ability to BinaryContext::getFunctionForSymbol() to return an entry point discriminator and use it instead of calling getEntryForSymbol() and isSecondaryEntry(). This is the preferred way since getFunctionForSymbol() is thread-safe. (cherry picked from FBD19295983)	2020-01-06 14:57:15 -08:00
Alexander Shaposhnikov	8c7f524afb	[BOLT] Fix build of the runtime on OSX Summary: Fix the compilation error on OSX (cherry picked from FBD19269806)	2020-01-02 16:20:13 -08:00
Rafael Auler	de284bc510	[BOLT] Fix symbol table entries for secondary entries Summary: Commit "Support full instrumentation" changed the map SymbolToFunction in BinaryContext to map secondary entries of functions too. This introduced unexpected behavior in our symbol table rewriting logic, which caused it to mistakenly write them with the address of the original function. Fix the behavior of getBinaryFunctionAtAddress to correct this. Also fix other users of SymbolToFunction to ensure they are not accidentally using secondary entries when they shouldn't. (cherry picked from FBD19168319)	2019-12-18 12:14:42 -08:00
Xin-Xin Wang	9aa276d349	[BOLT] Make .debug_loc update deterministic Summary: Change the single DebugLocWriter to one for each compilation unit. Then, each thread can write to its own DebugLocWriter and we can combine the data in a deterministic order once the threads are done. The only catch is that each thread would need the offset of the location lists it adds, so we make a list of pending location list patches and compute the final offsets at the end. (cherry picked from FBD18153069)	2019-10-25 11:47:51 -07:00
Maksim Panchenko	d414acfbb6	[perf2bolt] Better mmap event matching Summary: When the perf tool reports a mapping address of a binary, it is not always the address of the first loadable segment we were checking against. As a result, perf2bolt was not working properly for binaries where the first segment was not executable. The fix is to check if the address reported by mmap event matches any of the loadable segments. Note that the segment alignment has to be applied to get the real loadable address of the segment. Fixes facebookincubator/BOLT#65 (cherry picked from FBD19146419)	2019-12-17 11:17:31 -08:00
Rafael Auler	16a497c627	[BOLT] Support full instrumentation Summary: Add full instrumentation support (branches, direct and indirect calls). Add output statistics to show how many hot bytes were split from cold ones in functions. Add -cold-threshold option to allow splitting warm code (non-zero count). Add option in bolt-diff to report missing functions in profile 2. In instrumentation, fini hooks are fixed to run proper finalization code after program finishes. Hooks for startup are added to set up the runtime structures that need initialization, such as indirect call hash tables. Add support for automatically dumping profile data every N seconds by forking a watcher process during runtime. (cherry picked from FBD17644396)	2019-12-13 17:27:03 -08:00
Rafael Auler	e46d52de5b	[BOLT] Fix non-determinism in ICP with threads Summary: -icp-top-callsites selects candidates for optimization until a threshold is met. Currently, this parameter is set to 99% of calls by default. The order of functions evaluated changes in parallel mode, thus the functions that may be included to satisfy 99% of all calls may change, leading to different optimization decisions when running in parallel versus sequential. Fix this by enabling optimizations for all branches with the same frequency once we reach our budget instead of cutting off immediately after our budget is satisfied. In that way, order of functions has no impact on which functions are optimized. (cherry picked from FBD18902239)	2019-12-13 16:46:00 -08:00
Xin-Xin Wang	bdb60857e8	[BOLT] Make .debug_loc update deterministic Summary: Change the single DebugLocWriter to one for each compilation unit. Then, each thread can write to its own DebugLocWriter and we can combine the data in a deterministic order once the threads are done. The only catch is that each thread would need the offset of the location lists it adds, so we make a list of pending location list patches and compute the final offsets at the end. (cherry picked from FBD18153069)	2019-10-25 11:47:51 -07:00
Maksim Panchenko	e5d1334ad5	[perf2bolt] Ignore mmap events unrelated to execution Summary: Some processes can mmap the main binary for the purpose of introspection. We should ignore such mmap events for fixed-address binaries. For PIC binaries, we record the mapping and do the address filtering later for all sample events. (cherry picked from FBD18844314)	2019-12-05 16:52:15 -08:00
Xin-Xin Wang	6f93d53bf5	[BOLT] Remove test for impossible debug ranges condition Summary: The condition `DebugRangesOffset == -1U` can never happen since DebugRangesOffset has type `uint64_t` and the value always comes from `RangesSectionWriter->addRanges` which gets its value from `DebugRangesSectionWriter.SectionOffset` which has type `uint32_t`. The condition seems to be left over from a time where something was using `-1` as an error value. I'm removing that check so I can use `-1` as a tag to refer to the empty range that will be at the beginning of the ranges section. (cherry picked from FBD18153119)	2019-10-25 15:18:37 -07:00
Xin-Xin Wang	112c4251f5	[BOLT] Separate DebugRangesSectionsWriter into Ranges and ARanges Summary: The `.debug_aranges` section is already deterministic and is logically separate from the `.debug_ranges` section so separate them into separate classes so that it will be easier to make DebugRangesSectionsWriter deterministic (cherry picked from FBD18153057)	2019-10-25 11:24:49 -07:00
Xin-Xin Wang	8e2d3f7c30	[BOLT] Fix invalid abbrev error when reading debug_info section with readelf Summary: This fixes a bug which causes the debug_info and debug_loc sections to be unreadable by readelf/objdump. Basically, we're using 12 bytes of a ULEB128 value to fill in space, but readelf can't read more than 9 bytes of ULEB128. Thus, we replace that value with a string of 'a' instead. (cherry picked from FBD18097728)	2019-10-23 15:19:49 -07:00
Rafael Auler	28f91871b3	[PERF2BOLT/BOLT] Improve support for .so Summary: Avoid asserting on inputs that are shared libraries with R_X86_64_64 static relocs and RELATIVE dynamic relocations matching those. Our relocation checking mechanism would expect the result of the static relocation to be encoded in the binary, but the linker instead puts it as an addend in the RELATIVE dyn reloc. Also fix aggregation for .so if the executable segment is not the first one in the binary. (cherry picked from FBD18651868)	2019-11-14 16:07:11 -08:00
Rafael Auler	4bcc53a408	[BOLT] Fix shrink wrapping empty BB issue Summary: When combining icp=calls and shrink wrapping, the former may generate empty BBs that are going to trigger a bug in the shrink wrapping restore placement strategy. The restore is wrongly pushed to the BB successor instead of being added to the current block. Add a pass to go over the CFG to fix empty blocks by adding a temporary NOP instruction that is going to be deleted later. Empty BBs are not supported by one of the analyses done at this pass. (cherry picked from FBD18717994)	2019-11-26 15:09:40 -08:00
Maksim Panchenko	3cc4fc267b	[BOLT] Proper support for -trap-avx512 option Summary: If -trap-avx512 option is not set, verify that we correctly encode AVX-512 instructions and treat them as ordinary instructions. (cherry picked from FBD18666427)	2019-11-22 14:53:20 -08:00
Maksim Panchenko	7350d40404	[BOLT][NFC] Refactor BinaryFunction::addEntryPoint() Summary: There is no need to support existing functionality of adding entry points after the CFG is built as the function is only called in empty or disassembled state. Previously we used to run disassemble+buildCFG per function, but now these phases are decoupled. Also, remove a couple of redundant checks. (cherry picked from FBD18622822)	2019-11-11 17:02:37 -08:00
Maksim Panchenko	a09659fd54	[BOLT] Refactor markAmbiguousRelocations() Summary: Refactor markAmbiguousRelocations() code and move it to BinaryContext. Also remove a redundant check. (cherry picked from FBD18623815)	2019-11-18 14:08:17 -08:00
Maksim Panchenko	658f270417	[BOLT] Refactor data PC relocations in BinaryContext Summary: We only use locations of PC relocations and ignore the rest of the data. There's no need to store type and value. (cherry picked from FBD18623280)	2019-11-19 18:52:08 -08:00
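
The tie-inclusive cutoff described in commit e46d52de5b can be sketched as follows. This is a minimal illustration in Python, not BOLT's implementation: the function name `select_hot_callsites`, the `percent` parameter, and the dictionary input are hypothetical names chosen for the example. It only shows why including every call site that ties with the last selected one makes the result independent of evaluation order.

```python
def select_hot_callsites(counts, percent=99):
    """Pick call sites covering at least `percent` of all calls.

    `counts` maps a (hypothetical) call-site id to its execution count.
    Once the budget is reached, every remaining call site with the same
    count as the last selected one is also kept, so ties never depend on
    the order in which call sites were visited.
    """
    total = sum(counts.values())
    if total == 0:
        return set()

    selected = set()
    covered = 0
    cutoff = None  # count of the call site that satisfied the budget

    # Visit call sites from hottest to coldest.
    for site, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if cutoff is not None and count < cutoff:
            break  # strictly colder than the cutoff: stop here
        selected.add(site)
        covered += count
        if cutoff is None and covered * 100 >= percent * total:
            cutoff = count
    return selected
```

With counts `{"a": 50, "b": 25, "c": 25, "d": 0}` and `percent=70`, both `b` and `c` are selected regardless of which one is visited first, whereas stopping immediately at the budget would keep only one of them.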

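Commit 8e2d3f7c30 mentions filling space with a 12-byte ULEB128 value that some readers cannot parse. The sketch below shows what a padded ULEB128 encoding looks like and how a reader with a length limit would reject it. The helper names and the `max_bytes` parameter are hypothetical; the 9-byte limit is taken only from the commit message and is modeled here as a parameter, not as a description of readelf's actual code.

```python
def encode_uleb128_padded(value, length):
    """Encode `value` as ULEB128 using exactly `length` bytes.

    Padding is done by keeping the continuation bit (0x80) set on every
    byte except the last, even when the remaining payload bits are zero.
    """
    if value < 0 or length < 1:
        raise ValueError("value must be non-negative and length positive")
    out = bytearray()
    for i in range(length):
        byte = value & 0x7F
        value >>= 7
        if i != length - 1:
            byte |= 0x80  # more bytes follow
        out.append(byte)
    if value != 0:
        raise ValueError("value does not fit in the requested length")
    return bytes(out)


def decode_uleb128(data, max_bytes=None):
    """Decode a ULEB128 value; return (value, bytes_consumed).

    `max_bytes` models a reader that refuses encodings longer than a
    fixed number of bytes (an assumption made for this example).
    """
    result = 0
    for i, byte in enumerate(data):
        if max_bytes is not None and i >= max_bytes:
            raise ValueError("ULEB128 longer than reader limit")
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result, i + 1
    raise ValueError("truncated ULEB128")
```

An unrestricted decoder reads `encode_uleb128_padded(5, 12)` back as `(5, 12)`, while `decode_uleb128(..., max_bytes=9)` raises on the same bytes, which is the kind of mismatch the commit works around by using a different filler.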
832 Commits, All Branches