llvm-project

Commit Graph

Author	SHA1	Message	Date
wlei	484a569eea	[llvm-profgen] Fix total samples related issues Since total samples and body samples are used to compute the hotness threshold in the compiler, we found that in some services changing the total samples computation causes a noticeable regression. Hence, here we revert the changes and keep all total sample numbers identical to the old tool. Three changes in this diff: 1. Revert the previous diff (https://reviews.llvm.org/D112672: [llvm-profgen] Update total samples by accumulating all its body samples) and put it under a switch. 2. Keep the negative line number. Although the compiler doesn't consume the count, it will be used to compute the hot threshold. 3. Change to accumulate total samples per byte instead of per instruction. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D115013	2021-12-08 12:33:41 -08:00
wlei	27cb3707db	[llvm-profgen] Trim cold function profiles for non-CS AutoFDO This change allows trimming a function profile if it's considered cold for baseline AutoFDO. We reuse the cold threshold from `ProfileSummaryBuilder::getColdCountThreshold(..)`, which can be set by percent (`--profile-summary-cutoff-cold`) or by value (`--profile-summary-cold-count`). Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113785	2021-12-08 12:20:50 -08:00
wlei	f15a854567	[llvm-profgen] Truncate the context with zero probe ID Due to debug info merging, there may be some contexts with a zero probe ID; we should truncate such contexts to avoid misleading the pre-inliner. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D114284	2021-11-30 16:21:25 -08:00
wlei	41a681ce09	[FS-AFDO][llvm-profgen] Generate profile with FS-AFDO discriminator In order to support generating profiles with the FS discriminator, three kinds of changes are made in llvm-profgen: 1) Disassemble the .rodata section to check whether the FS discriminator var ('"__llvm_fs_discriminator__"') exists and set the corresponding flag in the binary. 2) Change the discriminator decoding in `getBaseDiscriminator` and `getDuplicationFactor`. 3) Set `FunctionSamples::ProfileIsFS` to true to enable FS functionality in ProfileData. Reviewed By: xur, hoy, wenlei Differential Revision: https://reviews.llvm.org/D113296	2021-11-30 15:57:59 -08:00
Hongtao Yu	bf317f6698	[CSSPGO] Sorting nodes in a cycle of profiled call graph. For nodes that are in a cycle of a profiled call graph, the current order the underlying scc_iter computes purely depends on how those nodes are reached from outside the SCC and inside the SCC, based on the Tarjan algorithm. This does not honor profile edge hotness, and thus does not guarantee that hot callsites are inlined prior to cold callsites. To mitigate that, I'm adding an extra sorter on top of scc_iter to sort SCC functions in the order of callsite hotness, instead of changing the internals of scc_iter. Sorting on callsite hotness can be optimally based on detecting cycles on a directed call graph, i.e., removing the coldest edge until a cycle is broken. However, detecting cycles isn't cheap. I'm using an MST-based approach which is faster and appears to deliver some performance wins. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D114204	2021-11-30 09:01:08 -08:00
wlei	c2e08aba1a	[llvm-profgen] Compute and show profile density AutoFDO performance is sensitive to profile density, i.e., the amount of samples in the profile relative to the program size, because profiles with insufficient samples could be inaccurate due to statistical noise and thus hurt AutoFDO performance. A previous investigation showed that AutoFDO performed better on MySQL with increased amount of samples. Therefore, we implement a profile-density computation feature to give hints about profile density to users and the compiler. We define the density of a profile Prof as follows: - For each function A in the profile, density(A) = total_samples(A) / sizeof(A). - density(Prof) = min(density(A)) for all functions A that are warm (defined below). A function is considered warm if its total-samples is within top N percent of the profile. For implementation, we reuse the `ProfileSummaryBuilder::getHotCountThreshold(..)` as threshold which can be set by percent(`--profile-summary-cutoff-hot`) or by value(`--profile-summary-hot-count`). We also introduce `--hot-function-density-threshold` to set hot function density threshold and will give suggestion if profile density is below it which implies we should increase samples. This also applies for CS profile with all profiles merged into base. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113781	2021-11-29 23:54:31 -08:00
Wenlei He	f7976edc1e	[llvm-profgen] Add switch to allow use of first loadable segment for calculating offset Adding `-use-loadable-segment-as-base` to allow use of first loadable segment for calculating offset. By default first executable segment is used for calculating offset. The switch helps compatibility with unsymbolized profile generated from older tools. Differential Revision: https://reviews.llvm.org/D113727	2021-11-15 19:00:27 -08:00
wlei	aab1810006	[llvm-profgen] Fix bug of setting function entry Previously we set the `isFuncEntry` flag to true when the funcName from DWARF is equal to the name in the symbol table, and we use this flag to ignore reporting a callsite sample that's from an intra-func branch. However, in HHVM, it appears that the symbol table name is inconsistent with the DWARF info func name, likely due to `OptimizeGlobalAliases`. This change is a workaround on the llvm-profgen side to mark the only range as the function entry and add warnings for the remaining inconsistencies. This also fixes a missing `getCanonicalFnName` for the symbol name, which caused the mismatch as well. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113492	2021-11-12 12:18:43 -08:00
wlei	5bf191a381	[llvm-profgen] Fix index out of bounds error while using ip.advance Previously we assumed there were some non-executing sections at the bottom of the text section so that we wouldn't hit the array's bound. But on a BOLTed binary, it turned out that the .bolt section is at the bottom of the text section and can be profiled, which crashes llvm-profgen. This change fixes it. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113238	2021-11-05 18:38:40 -07:00
wlei	dc9f037955	[llvm-profgen] Refactor the code of getHashCode Refactor to generate hash code lazily. Tested on clang self build, no observable generating time regression. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D113059	2021-11-02 19:56:20 -07:00
wlei	138202a8c3	[llvm-profgen] Warn on invalid range and show warning summary Two things in this diff: 1) Warn on invalid ranges; there are currently three types of checks, see the detailed messages in the code. 2) In some situations, llvm-profgen gives lots of warnings on the truncated stacks, which is noisy. This change provides a switch `--show-detailed-warning` to skip the warnings. Alternatively, we use a summary for those warnings and show the percentage of cases with those issues. Example of warning summary. ``` warning: 0.05%(1120/2428958) cases with issue: Profile context truncated due to missing probe for call instruction. warning: 0.00%(2/178637) cases with issue: Range does not belong to any functions, likely from external function. ``` Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D111902	2021-11-02 19:55:55 -07:00
wlei	3f3103c6a9	[llvm-profgen] Fill zero count for all function ranges Allow filling zero count for all the function ranges even if no samples hit that function. Add a switch for this. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D112858	2021-11-01 09:57:05 -07:00
wlei	f5537643b8	[llvm-profgen] Update total samples by accumulating all its body samples As in a probe-based profile, the total samples should be the sum of all body samples. This patch fixes it with a post-processing update for the line-number based profile. Tested on our internal services; results showed no performance change. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D112672	2021-10-29 10:36:57 -07:00
Kazu Hirata	3b285ff517	[llvm-profgen] Fix a set-but-unused warning This patch fixes: llvm/tools/llvm-profgen/ProfiledBinary.cpp:357:12: error: variable 'EndOffset' set but not used [-Werror,-Wunused-but-set-variable] The last use of the variable was removed on Oct 26 in commit `40ca411251`.	2021-10-29 10:19:44 -07:00
wlei	2f8196db92	[llvm-profgen] Fix bug of populating profile symbol list Previous implementation of populating profile symbol list is wrong, it only included the profiled symbols. Actually it should use all symbols, here this switches to use the symbols from debug info. Also turned the flag off by default. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D111824	2021-10-29 09:59:12 -07:00
wlei	40ca411251	[llvm-profgen] Switch to DWARF-based symbol and ranges We hit a bug where some callsite names in the profile were not real functions. It turned out that there are some non-function symbols in the ELF text section, e.g. globally accessible branch labels; recall also that one function can be split into multiple ranges. We shouldn't count samples for addresses that are not the entry of a real function. So this change fixes the issue by switching to the names and ranges from DWARF-based debug info, whose ranges ensure a real function start. For split functions, we assume that the real entry function's DWARF name always matches the symbol table name. The switch is also consistent with the body samples' symbols, which come from DWARF. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D112282	2021-10-29 09:59:12 -07:00
Hongtao Yu	259e4c5658	[CSSPGO] Trim cold base profiles for the CS preinliner. Adding support to the CS preinliner to trim cold base profiles. This makes trimming consistent with the inline decision made by the preinliner. Also disable the existing profile merger when preinliner is on unless explicitly specified. Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D112489	2021-10-27 22:50:27 -07:00
wlei	a5f411b7f8	[llvm-profgen] Allow unsymbolized profile as perf input This change allows the unsymbolized profile as input. The unsymbolized profile is created by `llvm-profgen` with `--skip-symbolization`; it is produced after sample aggregation but before symbolization, so it has a much smaller file size. It can be used for sample merging and trimming, and is also useful for debugging or adding test cases. A switch `--unsymbolized-profile=file-patch` is added for this. Format of unsymbolized profile: ``` [context stack1] # If it's a CS profile number of entries in RangeCounter from_1-to_1:count_1 from_2-to_2:count_2 ...... from_n-to_n:count_n number of entries in BranchCounter src_1->dst_1:count_1 src_2->dst_2:count_2 ...... src_n->dst_n:count_n [context stack2] ...... ``` Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D111750	2021-10-25 23:58:08 -07:00
Kazu Hirata	4e3eebc6bd	[tools, utils] Use StringRef::contains (NFC)	2021-10-22 17:22:13 -07:00
Wenlei He	e8c245dcd3	[llvm-profgen] Skip duplication factor outside of body sample computation We incorrectly used the duplication factor for total samples even though we already accumulate samples instead of taking MAX. This causes the profile to have bloated total samples for functions with loops unrolled or vectorized. The change fixes the issue for total samples, head samples and call target samples. Differential Revision: https://reviews.llvm.org/D112042	2021-10-19 23:10:45 -07:00
Wenlei He	a316343e19	[llvm-profgen] Allow generating AutoFDO profile from CSSPGO binary Add `-use-dwarf-correlation` switch to allow llvm-profgen to generate AutoFDO profile for binaries built with CSSPGO (pseudo-probe). Differential Revision: https://reviews.llvm.org/D111776	2021-10-14 09:11:56 -07:00
wlei	30ca33eab0	[llvm-profgen] Ignore the whole trace with the leading external branch The first LBR entry can be an external branch, we should ignore the whole trace. ``` 7f7448e889e4 0x7f7448e889e4/0x7f7448e88826/P/-/-/1 0x7f7448e8899f/0x7f7448e889d8/P/-/-/4 ... ``` Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D111749	2021-10-13 16:52:29 -07:00
wlei	ab5d65e685	[llvm-profgen] Ignore stack samples before aggregation With `ignore-stack-samples`, we can ignore the call stack before sample aggregation, which could reduce some redundant computations. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D111577	2021-10-13 16:52:29 -07:00
Wenlei He	da4e5fc861	[llvm-profgen] Deduplicate PID when processing perf input When parsing mmap events to retrieve PIDs, deduplicate them before passing the PID list to perf script. Perf script errors out when there are duplicated PIDs in the input; however, raw perf data may contain duplicated PIDs for a large binary where more than one mmap is needed to load the executable segment. Differential Revision: https://reviews.llvm.org/D111384	2021-10-10 13:30:17 -07:00
Reid Kleckner	89b57061f7	Move TargetRegistry.(h\|cpp) from Support to MC This moves the registry higher in the LLVM library dependency stack. Every client of the target registry needs to link against MC anyway to actually use the target, so we might as well move this out of Support. This allows us to ensure that Support doesn't have includes from MC/*. Differential Revision: https://reviews.llvm.org/D111454	2021-10-08 14:51:48 -07:00
wlei	b1a45c62f0	[llvm-profgen] Ignore branch count against outline function Some transformations, like hot-cold split or coro split, can outline part of a function's ranges. Since the sample loader runs at an early stage of the backend and no split has happened at that time, the compiler can't recognize those functions, so in llvm-profgen we should attribute the samples to the original function. This is already done for the body range samples since we use the symbols from DWARF, which is created before the split. But for branch samples, the call from the master function to its outlined function is actually not a call to the original function, so we shouldn't add head/callsite samples for it. So instead of the DWARF symbols, we use the symbols from the symbol table and ignore those functions with special suffixes (like `.cold`, `.resume`) when accumulating the callsite/head samples. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D110864	2021-10-07 14:03:34 -07:00
wlei	16516f8925	[llvm-profgen] Support symbol list for accurate profile Differential Revision: https://reviews.llvm.org/D110859	2021-10-06 11:41:39 -07:00
wlei	31a5cb3292	[llvm-profgen] Filter out invalid debug line Differential Revision: https://reviews.llvm.org/D110081	2021-10-04 19:09:06 -07:00
wlei	46cf7d75d9	[llvm-profgen] Add duplication factor for line-number based profile This change adds duplication factor multiplier while accumulating body samples for line-number based profile. The body sample count will be `duplication-factor * count`. Base discriminator and duplication factor is decoded from the raw discriminator, this requires some refactor works. Differential Revision: https://reviews.llvm.org/D109934	2021-10-04 19:08:55 -07:00
wlei	fb29d812e4	[CSSPGO] Rename the field of SampleContextFrame Differential Revision: https://reviews.llvm.org/D110980	2021-10-04 19:06:59 -07:00
Wenlei He	47d66355ef	[llvm-profgen] Fix alignment in preferred based calculation We used the segment alignment in elf header to assume the loader alignment. However this is incorrect because loader alignment is always the same as page size. If segment needs to be aligned at load time, linker will set aligned address as virtual address in elf header. Differential Revision: https://reviews.llvm.org/D110795	2021-09-29 23:01:10 -07:00
Wenlei He	1f0bc617bd	[llvm-profgen] Allow perf data as input This change enables llvm-profgen to take raw perf data as an alternative input format. Sometimes we need to retrieve events for processes with a matching binary. Using perf data as input allows us to retrieve process IDs from mmap events for the matching binary, then filter by process ID during perf script generation. Differential Revision: https://reviews.llvm.org/D110793	2021-09-29 22:57:35 -07:00
Wenlei He	941191aae4	[llvm-profgen] Refactor and better diagnostics This change contains diagnostics improvements, refactoring and preparation for consuming perf data directly. Diagnostics: - We now have more detailed diagnostics when no mmap is found. - We also print a warning for abnormal transitions to external code. Refactoring: - Simplify input perf trace processing to only allow a single input file. This is because 1) using multiple input perf traces (perf script) is error prone because we may miss key mmap events. 2) the functionality is not really being used anyway. - Make more functions private for Readers, move non-trivial definitions out of the header. Clean up some inconsistencies. - Prepare for consuming perf data as input directly. Differential Revision: https://reviews.llvm.org/D110729	2021-09-29 22:55:50 -07:00
wlei	a03cf331e1	[llvm-profgen] Strip context to support non-CS profile generation for hybrid sample Differential Revision: https://reviews.llvm.org/D109769	2021-09-28 12:20:23 -07:00
wlei	ce40843a3f	[llvm-profgen][CSSPGO] On-demand function size computation for preinliner Similar to https://reviews.llvm.org/D110465, we can compute function size on demand for the functions that are hit by samples. Here we leverage the raw range samples' addresses to compute the set of sample-hit functions. Then `BinarySizeContextTracker` just works on those function ranges for the size. Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D110466	2021-09-28 09:09:38 -07:00
wlei	091c16f76b	[llvm-profgen] On-demand symbolization Previously we did symbolization for all the functions, but we actually only need the symbols that are hit by the samples. This can significantly reduce the run time for large binaries. Optimization for the preinliner will come along with the next patch. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D110465	2021-09-28 09:09:25 -07:00
wlei	1422fa5fab	[llvm-profgen] Unify output format of different unsymbolized profiles Differential Revision: https://reviews.llvm.org/D110080	2021-09-24 14:18:00 -07:00
wlei	28277e9b48	[AutoFDO][llvm-profgen] Report zero count for unexecuted part of function code In order to be consistent with the compiler, which interprets a zero count as unexecuted (cold), this change reports a zero-value count for the unexecuted part of function code. For the implementation, it leverages the range counter and initializes all the executed function ranges with the zero value. After all ranges are merged and converted into disjoint ranges, the remaining zero counts will indicate the unexecuted (cold) part of the function. This change also extends the current `findDisjointRanges` method, which can now support adding a zero-value range. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D109713	2021-09-24 14:15:05 -07:00
wlei	d5f2013004	[AutoFDO][llvm-profgen] Profile generation for LBR(non-CS) sample This patch introduces non-CS AutoFDO profile generation into LLVM. The profile is meant to be consumed by the compiler using `-fprofile-sample-use=[profile]`. After range and branch counters are extracted from the LBR sample, we go through each address for symbolization, create FunctionSamples and populate its sub-fields like TotalSamples, BodySamples and HeadSamples. For inlined code, since we need to map back to the original code, we always add body samples to the leaf frame's function sample. Reviewed By: wenlei, hoy Differential Revision: https://reviews.llvm.org/D109551	2021-09-24 13:55:34 -07:00
wlei	a7cdcf25c1	[llvm-profgen] Ignore invalid perf line in LBR record Similar to https://reviews.llvm.org/D109637, there can be a whole invalid line of message in the perfscript. ``` warning: Invalid address in LBR record at line 14118674: Processed 14138923 events and lost 1 chunks! warning: Invalid address in LBR record at line 14118676: Check IO/CPU overload! ``` This only happened for the LBR-only perfscript; the hybrid perfscript has a check for " 0x" to make sure it's an LBR perf line. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D110424	2021-09-24 13:44:57 -07:00
wlei	1ed69bb86e	[llvm-profgen] Fix a dangling vector reference in CS line number based generator It seems we missed one spot to persist `SampleContextFrameVector` into the global table (CSProfileGenerator::populateFunctionBoundarySamples:340), which causes a crash. This change fixes it in a centralized way, i.e. where we generate the `FunctionSamples`. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D110275	2021-09-22 18:33:28 -07:00
wlei	686cc00067	[llvm-profgen] Fix an out-of-range error during unwinding It happened that the LBR entry target can be the first address of text section which causes an out-of-range crash. So here add a boundary check. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D110271	2021-09-22 18:33:27 -07:00
wlei	c2be2d3284	[llvm-profgen] Fix a bug of assertion The assertion should work on the entire context. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D110268	2021-09-22 18:33:27 -07:00
Wenlei He	81c249784f	[llvm-profgen] Use hot threshold for context merging and trimming Without preinliner, we need to tune down the cold count cutoff to merge/trim more context to limit profile size for large components. However it doesn't make sense for cold threshold to be higher than hot threshold, so we now change to use hot threshold as merging/trimming cut off instead. Differential Revision: https://reviews.llvm.org/D110212	2021-09-22 15:01:51 -07:00
Hongtao Yu	734f4d832c	[llvm-profgen] An option to dump disasm of specified symbols For a large app, dumping the disasm of the whole program can be slow and result in giant output. Adding a switch to dump specific symbols only. Reviewed By: wlei Differential Revision: https://reviews.llvm.org/D110079	2021-09-22 10:32:59 -07:00
Wenlei He	446e21623c	[llvm-profgen] Use context-sensitive byte size cost for preinliner decisions by default Turn on `use-context-cost-for-preinliner` to use context-sensitive byte size cost for preinliner decisions by default. This is a more accurate proxy of inline cost than profile size. We tested on our large workload and it delivers a measurable CPU improvement. Differential Revision: https://reviews.llvm.org/D109893	2021-09-16 10:36:12 -07:00
Hongtao Yu	0057c7185d	[CSSPGO][llvm-profgen] Truncate stack samples with invalid return address. Invalid frame addresses exist in call stack samples due to bad unwinding. This could happen to frame-pointer-based unwinding and the callee functions that do not have the frame pointer chain set up. It isn't common when the program is built with the frame pointer omission disabled, but can still happen with third-party static libs built with frame pointer omitted. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D109638	2021-09-14 21:56:22 -07:00
Hongtao Yu	8cbbd7e0b2	[llvm-profgen] Ignore broken LBR samples Perf script can sometimes give disordered LBR samples like below. ``` b022500 32de0044 3386e1d1 7f118e05720c 7f118df2d81f 0x2a0b9622/0x2a0b9610/P/-/-/1 0x2a0b79ff/0x2a0b9618/P/-/-/2 0x2a0b7a4a/0x2a0b79e8/P/-/-/1 0x2a0b7a33/0x2a0b7a46/P/-/-/1 0x2a0b7a42/0x2a0b7a23/P/-/-/1 0x2a0b7a21/0x2a0b7a37/P/-/-/2 0x2a0b79e6/0x2a0b7a07/P/-/-/1 0x2a0b79d4/0x2a0b79dc/P/-/-/2 0x2a0b7a03/0x2a0b79aa/P/-/-/1 0x2a0b79a8/0x2a0b7a00/P/-/-/234 0x2a0b9613/0x2a0b7930/P/-/-/1 0x2a0b9622/0x2a0b9610/P/-/-/1 0x2a0b79ff/0x2a0b9618/P/-/-/2 0x2a0b7a4a/0x2aWarning: Processed 10263226 events and lost 1 chunks! ``` Note the last LBR record, `0x2a0b7a4a/0x2aWarning:`. Currently llvm-profgen does not detect that, and as a result an uninitialized branch target value will be used. The uninitialized value can cause creepy instruction ranges to be created, which in turn will result in a completely wrong profile. An example is like ``` .... @ _ZN5folly13loadUnalignedIsEET_PKv]:18446744073709551615:18446744073709551615 1: 18446744073709551615 !CFGChecksum: 4294967295 !Attributes: 0 ``` Reviewed By: wenlei, wlei Differential Revision: https://reviews.llvm.org/D109637	2021-09-14 12:11:17 -07:00
Wenlei He	a5d3cac033	[llvm-profgen] Turn off cold context trimming by default We merge cold context by default to save profile size. However trimming cold context after merging doesn't save size much, so default to off to reflect how it's commonly used. Differential Revision: https://reviews.llvm.org/D109166	2021-09-02 12:29:06 -07:00
Wenlei He	6eca242e09	[llvm-profgen] Deduplicate and improve warning for truncated context This change improves the warning for truncated context by: 1) deduplicating the warnings, as one call without a probe can appear in many different contexts, leading to duplicated warnings; 2) rephrasing the message to make it easier to understand. The term "untracked frame" can be confusing. Differential Revision: https://reviews.llvm.org/D109115	2021-09-02 09:15:38 -07:00
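
The profile density definition in commit c2e08aba1a above (density(A) = total_samples(A) / sizeof(A), and density(Prof) = min over warm functions) can be illustrated with a minimal sketch. This is not the llvm-profgen implementation: the `FunctionProfile` type, the `profile_density` function, and the plain integer `hot_count_threshold` parameter are hypothetical stand-ins for the real data structures and for the value the commit says is obtained from `ProfileSummaryBuilder::getHotCountThreshold(..)`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FunctionProfile:
    """Hypothetical per-function record; not an LLVM type."""
    name: str
    total_samples: int
    size_bytes: int


def profile_density(functions: Iterable[FunctionProfile],
                    hot_count_threshold: int) -> float:
    """Return min(total_samples / size) over warm functions.

    Assumption: a function counts as warm when its total samples reach
    hot_count_threshold. Returns 0.0 when no function is warm.
    """
    densities = [
        f.total_samples / f.size_bytes
        for f in functions
        if f.size_bytes > 0 and f.total_samples >= hot_count_threshold
    ]
    return min(densities) if densities else 0.0


example = [
    FunctionProfile("a", total_samples=1000, size_bytes=100),
    FunctionProfile("b", total_samples=300, size_bytes=150),
    FunctionProfile("c", total_samples=5, size_bytes=50),  # cold, ignored
]
density = profile_density(example, hot_count_threshold=100)
```

Taking the minimum rather than the mean means a single sparsely sampled warm function is enough to pull the density down, which matches the commit's stated goal of hinting when more samples are needed.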


131 Commits