llvm-project

Commit Graph

Author	SHA1	Message	Date
Amir Ayupov	2da5b12a3d	[BOLT] Hugify: check for THP support via sysfs Summary: Remove dependence on kernel version check, query sysfs directly instead. (cherry picked from FBD28858208)	2021-06-02 19:11:52 -07:00
Maksim Panchenko	7bccf8d25d	[BOLT][NFC] Fix debug info printouts for inlined functions Summary: While printing debug info for instructions, we should use line tables from the corresponding DWARF CU which could be different from the containing function CU in case of inlined instructions. (cherry picked from FBD28908324)	2021-06-04 12:31:31 -07:00
Amir Ayupov	65d227c035	[BOLT][TEST] Fix test case to conform to analyzePICJumpTable pattern matching Summary: Make sure that jump table is properly recognized in `split_func_jump_table_fragment.s`. (cherry picked from FBD28839976)	2021-06-02 10:50:47 -07:00
James Luo	1c06193d0f	[BOLT] Resolve JumpTable namespace issue in pseudo probe decoder migration Summary: This diff fixes the JumpTable namespace conflicts during the migration of pseudo probe decoder. (cherry picked from FBD28859927)	2021-06-02 22:46:57 -07:00
Maksim Panchenko	a26370389a	[BOLT][NFC] Disable ProcessAllSections in RuntimeDyld Summary: FBD55943 changed the way ProcessAllSections works in RuntimeDyld. After the change, all sections, including symbol table, section table, etc. are loaded into memory whenever ProcessAllSections is enabled. In BOLT we rely on RuntimeDyld for processing sections with relocations. These include most allocatable sections and additionally .debug_line. The latter is skipped by RuntimeDyld without ProcessAllSections flag. If we enable ProcessAllSections, we will have to deal with allocating memory for more sections than we need (see above) and later to filter them out. The alternative is to mark all sections that we actually plan to use as "required for execution" (using RuntimeDyld terminology). For .debug_line section on ELF it means adding SHF_ALLOC flag. On MachO, RuntimeDyld currently treats all sections as required. (cherry picked from FBD28729398)	2021-05-26 16:23:34 -07:00
Vladislav Khmelevsky	5a6c379f5b	[PR] Instrumentation: Emit paddings to preserve data alignment Summary: Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei facebookincubator/BOLT#156 (cherry picked from FBD28521843)	2021-05-14 14:09:05 +03:00
Vladislav Khmelevsky	79807d99fe	[PR] Introduce loop inversion pass Summary: This patch introduces LoopInversionPass. Its main purpose is to ensure that the loop layout is optimal depending on the profile information. So if profile information shows that the loop is used, the unconditional jump instruction must be executed only once and vice-versa. Please take a look at the pass header file and test for more details. Also change the link_fdata script a bit so that the FDATA prefix can be changed, as FileCheck allows. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei PR facebookincubator/BOLT#153 (cherry picked from FBD28391811)	2021-05-11 20:59:13 +03:00
Amir Ayupov	12e9fec697	Rebase: [BOLT] DebugFission Support Summary: Implemented support for Debug Fission. For the most part it doesn't impact Monolithic execution path. One area that was changed is the DW_AT_low_pc/DW_AT_high_pc conversion. Before it was to DW_AT_ranges/DW_AT_low_pc, now DW_AT_low_pc is kept in same place. Another more visible impact is in Skeleton CU the DW_AT_low_pc is replaced with DW_AT_ranges_base if it's not originally present and bolt converted ranges conversion inside the dwo units. Output of this are multiple .dwo files with updated debug information. (cherry picked from FBD29569788)	2021-04-01 11:43:00 -07:00
Amir Ayupov	99d7f90635	[BOLT][NFC][TEST] Added llvm-dwarfdump and llvm-mc to BOLT_TEST_DEPS (cherry picked from FBD28427352)	2021-05-13 15:36:43 -07:00
Maksim Panchenko	ba6fdb8113	[BOLT] Preserve original jump table relocations Summary: Remove relocations against internal function labels, e.g. jump table relocations, only when overwriting them. While reading an input file with relocations, we create internal relocations against code references (we skip PIC relocations). Later, when we discover jump tables, we remove corresponding relocations with the assumption that original relocations will either be ignored or replaced by new relocations. However, it is possible to miss some references to the jump table, in which case the original entries will not be ignored. While such situation is abnormal, it is still a better/safer approach to preserve relocations if we are not replacing them with new ones. (cherry picked from FBD28406628)	2021-05-12 23:35:10 -07:00
Maksim Panchenko	81c59d9a54	[BOLT][NFC] Change interface for searching relocations (cherry picked from FBD28406629)	2021-05-12 23:29:04 -07:00
Amir Ayupov	500edf26c9	[BOLT][NFC] Address warning about ProgramPoint implicit copy constructor Summary: Explicit assignment operator can be replaced with an implicit one. Remove it to allow an implicit copy constructor: ``` bolt/src/Passes/DataflowAnalysis.h:74:8: warning: definition of implicit copy constructor for 'ProgramPoint' is deprecated because it has a user-declared copy assignment operator [-Wdeprecated-copy] void operator=(const ProgramPoint &PP) { ^ bolt/src/Passes/DataflowAnalysis.h:62:14: note: in implicit copy constructor for 'llvm::bolt::ProgramPoint' first required here return ProgramPoint(&*Last); ``` (cherry picked from FBD28335138)	2021-05-10 14:16:25 -07:00
Maksim Panchenko	fe37f1870e	[BOLT][NFC] Follow LLVM variable initialization style (cherry picked from FBD28417604)	2021-05-13 10:50:47 -07:00
Vladislav Khmelevsky	b728bfc70a	[PR] Add missing includes Summary: Adds missing headers removed by IWYU. NB: this caused build breakage on ubuntu-latest Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei (cherry picked from FBD28368185)	2021-05-11 15:55:57 +03:00
Vladislav Khmelevsky	de298c08fd	[PR] Fix tests build with -no-pie option Summary: Since gcc/ld could produce and expect PIE files we need to pass -no-pie option to avoid linking errors for tests. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei (cherry picked from FBD28360045)	2021-05-11 03:25:49 +03:00
Alexey Moksyakov	ce84e9607a	[PR] Fix bb reordering optimization Summary: The reorder-blocks optimization pass doesn't take into account that the available offset for legacy Jcc instructions (for example, JRCXZ, which has an 8-bit operand) has to be less than 255 bytes. This is a rare case, so an extra check was added to exclude functions with such unsupported instructions from optimization passes. Alexey Moksyakov, Advanced Software Technology Lab, Huawei (cherry picked from FBD28264117)	2021-04-23 11:34:40 +03:00
Amir Ayupov	9a884543f1	[BOLT][NFC] Avoid unnecessary copies with push_back Summary: Small refactoring inspired by clang-tidy modernize-use-emplace (cherry picked from FBD28307493)	2021-05-07 18:43:25 -07:00
Amir Ayupov	94653797f3	Rebase: [BOLT][NFC] Avoid binutils in tests Summary: Replace binutils tools with llvm tools (cherry picked from FBD29575630)	2021-05-04 16:45:28 -07:00
Amir Ayupov	eb99a6665c	Rebase: [BOLT][NFC] Remove unneeded includes with include-what-you-use Summary: Ran iwyu multiple times, manually picked header remove lines. Reached fixed point wrt removal: iwyu doesn't automatically remove any more headers or forward declarations. (cherry picked from FBD29569221)	2021-04-30 13:54:02 -07:00
Maksim Panchenko	5239182075	[perf2bolt] Further relax segment matching Summary: Previously, we used p_align value of the code segment to predict the mapping of the segment at runtime. However, at times the reported value is not aligned and at other times the actual aligned value will be different because of the different page size used. All we know is that the page size used at runtime should not exceed p_align value. Adjust our segment address matching accordingly. (cherry picked from FBD28133066)	2021-04-30 15:02:29 -07:00
Maksim Panchenko	bd86c06c1b	[BOLT][NFC] Remove CFIReaderWriter::fdes() (cherry picked from FBD27918126)	2021-04-21 12:33:08 -07:00
Maksim Panchenko	f8fa3e97d5	[BOLT] Remove -dump-eh-frame option Summary: The option duplicates functionality of "llvm-dwarfdump -eh-frame". (cherry picked from FBD27917505)	2021-04-21 12:13:22 -07:00
Maksim Panchenko	3355936e14	[BOLT][NFC] Remove RewriteInstance::EHFrame (cherry picked from FBD27915725)	2021-04-21 11:24:15 -07:00
Amir Ayupov	f84f451a54	[BOLT][NFC] Use const reference for MCInstrDesc Summary: Addressing comments from the review for "Expand auto types". Use const reference in MCPlusBuilder for MCInstrDesc where the copy is not necessary. (cherry picked from FBD27844344)	2021-04-17 21:48:46 -07:00
Amir Ayupov	c7306cc219	Rebase: [BOLT][NFC] Expand auto types Summary: Expanded auto types across BOLT semi-automatically with the aid of clangd LSP (cherry picked from FBD33289309)	2021-04-08 00:19:26 -07:00
Rafael Auler	dc2673a039	[BOLT] Fix value invalidation bug in runtimelib Summary: We can't use a fragment of the old LibPath as an input to create a new one. (cherry picked from FBD27642728)	2021-04-07 21:40:23 -07:00
Rafael Auler	35732d954b	[BOLT] Remove cantFail in getAddressRanges calls Summary: We may have a CU with empty ranges, so accept errors coming from DWARFDie::getAddressRanges(). This happens when using tools that selectively strip debuginfo from the binary. (cherry picked from FBD27602731)	2021-04-06 12:57:09 -07:00
Amir Ayupov	f1bfb18ceb	[BOLT] Refactor SectionPatchers map to a Patcher in BinarySection Summary: Refactor SectionPatches to avoid the use of extra map and a cast from StringRef to std::string. cherry-picked from FBD26756560 (cherry picked from FBD27490641)	2021-03-18 13:06:18 -07:00
Amir Ayupov	081e39aa15	Rebase: [cherry-pick] [BOLT] Add option to skip writing an output file Summary: The user may wish to run BOLT for printing statistics only (i.e. to check that the profile is valid). Add an option to run BOLT without writing any output file, similar to a dry run. This option is triggered by supplying -o with "/dev/null". (cherry picked from FBD29568632)	2021-03-29 16:04:57 -07:00
Maksim Panchenko	e7169be93f	[BOLT] Do not assert on jump table heuristic failure Summary: During the initial indirect jump analysis, we used to assert that the discovered jump table type matched the pattern of the corresponding instruction sequence. E.g., for PIC jump table memory we expected the PIC jump table instruction sequence. The assertions were too conservative, as in the case of a mismatch we can mark the indirect jump as having an unknown control flow. That should be sufficient to either skip the function processing or rely on relocation information for possible recovery of the control flow. (cherry picked from FBD27255816)	2021-03-23 13:41:41 -07:00
Rafael Auler	b3c34d568a	[BOLT] Fix instrumentation bug in duplicated JTs Summary: Fix a bug with instrumentation when trying to instrument functions that share a jump table with multiple indirect jumps. Usually, each indirect jump that uses a JT will have its own copy of it. When this does not happen, we need to duplicate the jump table safely, so we can split the edges correctly (each copy of the jump table may have different split edges). For this to happen, we need to correctly match the sequence of instructions that perform the indirect jump to identify the base address of the jump table and patch it to point to the new cloned JT. A case was reported to us in which the compiler generated suboptimal code for an indirect jump, which our matcher failed to identify. Fixes facebookincubator/BOLT#126 (cherry picked from FBD27065579)	2021-03-15 16:34:25 -07:00
Maksim Panchenko	b11c826889	[BOLT] Fix false references to zero-sized objects Summary: Whenever BOLT encounters a data reference in code, it tries to convert it into <Object+Offset> form. The primary reason behind this approach is to support read-only data-reordering optimization. However, with the current level of the linker and compiler support we don't have enough information to always correctly restore the original <Object+Offset>. E.g. with zero-sized symbols we have to speculate that the actual size of the underlying object extends to the next symbol. Most of the time, there will be an object pointed by a zero-sized symbol and even if we are guessing incorrectly, there will be no harm in creating references of such form. The problem happens when there's no object corresponding to the original symbol and the next object is an (unmarked) jump table: A: # <- zero-sized object .LJUMP_TABLE: .long <entry1> .long <entry2> .... .LB: .long 21 .LC: .long 42 The jump table will be moved and all references past it (up to the next named object) will be incorrectly updated. We should not speculate about the size of A in a case like that and treat all discovered data objects (and thus references) independently. (cherry picked from FBD27005660)	2021-03-15 12:06:56 -07:00
Vladislav Khmelevsky	76d346ca14	[BOLT][PR] Instrumentation: Introduce -no-counters-clear and -wait-forks options Summary: This PR introduces 2 new instrumentation options: 1. instrumentation-no-counters-clear: Discussed at https://github.com/facebookincubator/BOLT/issues/121 2. instrumentation-wait-forks: Since the instrumentation counters are mapped as MAP_SHARED, it would be nice to be able to wait until all forks of the parent process have died, by tracking the process group. The last patch is just an emitBinary code refactor. Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/125 GitHub Author: Vladislav Khmelevskyi <Vladislav.Khmelevskyi@huawei.com> (cherry picked from FBD26919011)	2021-03-09 16:18:11 -08:00
Maksim Panchenko	225a8d7f2c	[BOLT] Ignore TBSS section at layout time Summary: TBSS section is a "virtual" section that does not take memory or file space. Ignore it completely while adjusting section sizes. (cherry picked from FBD26824484)	2021-03-04 16:31:12 -08:00
Vladislav Khmelevsky	ec9751eef5	[BOLT][PR] readDynamicRelocations: Skip NONE relocations Summary: NONE relocations should not be processed during dynamic relocations read process Vladislav Khmelevsky, Advanced Software Technology Lab, Huawei Pull Request resolved: https://github.com/facebookincubator/BOLT/pull/118 GitHub Author: Vladislav Khmelevsky <Vladislav.Khmelevskyi@huawei.com> (cherry picked from FBD26489881)	2021-02-17 15:36:58 -08:00
Alexander Yermolovich	06959eedcf	Fix up test for Update DW_AT_stmt_list for .debug_types Summary: As titled. (cherry picked from FBD28112186)	2021-03-17 17:08:26 -07:00
Rafael Auler	da752c9c5c	Fix license for a few remaining files Summary: As titled. (cherry picked from FBD28112137)	2021-03-17 15:04:19 -07:00
Alexander Yermolovich	0ec91a25df	Update DW_AT_stmt_list for .debug_types Summary: There is no real link between CU and TU, so we rely on the fact that the addresses are the same and update all of them. (cherry picked from FBD28112114)	2021-02-17 15:30:10 -08:00
Rafael Auler	16521f1f79	[BOLT] Update license headers Summary: Update license and fix headers for some files. (cherry picked from FBD28112041)	2021-03-15 18:04:18 -07:00
Amir Ayupov	1c5d3a056c	Rebase: Merge BOLT codebase in monorepo Summary: This commit is the first step in rebasing all of BOLT history in the LLVM monorepo. It also solves trivial build issues by updating BOLT codebase to use current LLVM. There is still work left in rebasing some BOLT features and in making sure everything is working as intended. History has been rewritten to put BOLT in the /bolt folder, as opposed to /tools/llvm-bolt. (cherry picked from FBD33289252)	2020-12-01 16:29:39 -08:00
Alexander Shaposhnikov	0a8aaf56bb	[BOLT] Add support for reading profile on Mach-O Summary: Add support for reading profile on Mach-O. (cherry picked from FBD25777049)	2021-01-29 16:37:07 -08:00
Alexander Shaposhnikov	a0dd5b05dc	[BOLT] Add support for dumping profile on MacOS Summary: Add support for dumping profile on MacOS. (cherry picked from FBD25751363)	2021-01-28 12:44:14 -08:00
Alexander Shaposhnikov	3b876cc3e7	[BOLT] Add support for dumping counters on MacOS Summary: Add support for dumping counters on MacOS (cherry picked from FBD25750516)	2021-01-28 12:32:03 -08:00
Alexander Shaposhnikov	6a84124e1d	[BOLT] Add support for __literal16 section on MachO Summary: 1. Add support for __literal16 section in the instrumentation runtime library for MacOS. 2. Fix emitting __counters section. (cherry picked from FBD25746342)	2021-01-28 12:04:46 -08:00
Sergey Pupyrev	fea6b4e469	an updated version of ExtTSP Summary: a few minor updates in block reordering: - some refactoring to improve readability; - optimized chain splitting strategy to improve quality of layout and performance of the algorithm. (cherry picked from FBD25126220)	2021-01-27 18:29:16 -08:00
Alexander Shaposhnikov	d6e60c5bec	[BOLT] Enable intToStr for MacOS Summary: Enable intToStr et al. in the runtime library for MacOS. (cherry picked from FBD25745358)	2021-01-20 16:40:17 -08:00
Alexander Shaposhnikov	faaefff618	[BOLT] Fix operator new signature Summary: Use size_t for the first parameter of operator new. https://en.cppreference.com/w/cpp/memory/new/operator_new (cherry picked from FBD25750921)	2021-01-20 12:56:41 -08:00
Amir Ayupov	a86cd533b3	[BOLT] Fix missing newlines in debug prints (cherry picked from FBD25966797)	2021-01-19 18:43:16 -08:00
Rafael Auler	0de92b8346	[PERF2BOLT] Relax segment matching requirements Summary: When looking at perf.data's available binaries and their respective mmap'ed segments, match them with the input binary by looking at both aligned and non-aligned addresses. If we suppose the alignment is the mmap'ed page size, we may miss some cases and perf2bolt will refuse to proceed because it failed to match the input binary with a process recorded in perf.data. (cherry picked from FBD25732673)	2021-01-11 06:24:46 -08:00
Rafael Auler	e3898d5969	[BOLT] Add threshold options for lite mode Summary: Add options for trading processing speed for binary performance. -lite-threshold-pct=<uint> Threshold (in percent) for selecting functions to process in lite mode. Higher threshold means fewer functions to process. E.g threshold of 90 means only top 10 percent of functions with profile will be processed. -lite-threshold-count=<uint> Similar to '-lite-threshold-pct' but specify threshold using absolute function call count. I.e. limit processing to functions executed at least the specified number of times. -no-scan Do not scan cold functions for external references (may result in slower binary). (cherry picked from FBD24739092)	2020-12-30 12:23:58 -08:00
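
Commit 2da5b12a3d above replaces a kernel version check with a direct sysfs query for transparent huge page (THP) support. The sketch below illustrates the general idea only; it is not the code from that commit. It assumes the standard Linux entry `/sys/kernel/mm/transparent_hugepage/enabled`, whose content lists the available modes with the active one in brackets (for example `always [madvise] never`). The helper names `selectedThpMode`, `isThpUsable`, and `hostSupportsThp` are hypothetical, and the sketch uses the C++ standard library for brevity, which the BOLT runtime library may not be able to do.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: returns the bracketed mode from a line such as
// "always [madvise] never". Returns an empty string if nothing is bracketed.
std::string selectedThpMode(const std::string &Line) {
  const std::size_t Open = Line.find('[');
  if (Open == std::string::npos)
    return "";
  const std::size_t Close = Line.find(']', Open + 1);
  if (Close == std::string::npos)
    return "";
  return Line.substr(Open + 1, Close - Open - 1);
}

// Hypothetical helper: THP is treated as usable when the active mode is
// "always" or "madvise"; "never" and unparsable input count as unsupported.
bool isThpUsable(const std::string &Line) {
  const std::string Mode = selectedThpMode(Line);
  return Mode == "always" || Mode == "madvise";
}

// Hypothetical helper: reads the sysfs entry and applies the check above.
// A missing or unreadable file is treated as "no THP support".
bool hostSupportsThp(
    const char *Path = "/sys/kernel/mm/transparent_hugepage/enabled") {
  std::ifstream File(Path);
  std::string Line;
  if (!File || !std::getline(File, Line))
    return false;
  return isThpUsable(Line);
}
```

Unlike a version comparison, reading the entry reflects how the running kernel is actually configured, so a kernel built without THP or with THP set to `never` is reported as unsupported.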

914 Commits
