Commit Graph

158 Commits

Author SHA1 Message Date
weihe 540489de68 [llvm-profdata] Implement llvm-profdata overlap for sample profiles
Implemented the `llvm-profdata overlap` feature for sample profiles. It reports weighted //similarity// and unweighted //overlap// metrics at program and function level for two input profiles. Similarity metrics are symmetric with regards to the order of two input profiles. By default, the tool only reports program-level summary. Users can look into function-level details via additional options `--function`, `--similarity-cutoff`, and `--value-cutoff`.

The similarity metrics are designed as follows:
* Program-level summary
    * Whole program profile similarity is an aggregate over function-level similarity `FS`: `PS = sum(FS(A) * avg_weight(A))` for all function `A`.
    * Whole program sample overlap: `PSO = common_samples / total_samples`.
    * Function overlap: `FO = #common_function / #total_function`.
    * Hot-function overlap: `HFO = #common_hot_function / #total_hot_function`.
    * Hot-block overlap: `HBO = #common_hot_block / #total_hot_block`.
* Function-level details
    * Function-level similarity is an aggregate over line/block-level similarities `BS` of all sample lines/blocks in the function, weighted by the closeness of the function's weights in two profiles: `FS = sum(BS(i)) * (1 - weight_distance(A))`.
    * Function-level sample overlap: `FSO = common_samples / total_samples` for samples in the function.

Reviewed By: wenlei, hoyFB, wmi

Differential Revision: https://reviews.llvm.org/D83852
2020-08-08 17:49:48 -07:00
Wei Mi a23f62343c Supplement instr profile with sample profile.
PGO profile is usually more precise than sample profile. However, PGO profile
needs to be collected from loadtest and loadtest may not be representative
enough to the production workload. Sample profile collected from production
can be used as a supplement -- for functions cold in loadtest but warm/hot
in production, we can scale up the related function in PGO profile if the
function is warm or hot in sample profile.

The implementation contains changes in compiler side and llvm-profdata side.
Given an instr profile and a sample profile, for a function cold in PGO
profile but warm/hot in sample profile, llvm-profdata will either mark
all the counters in the profile to be -1 or scale up the max count in the
function to be above hot threshold, depending on the zero counter ratio in
the profile. The assumption is if there are too many counters being zero
in the function profile, the profile is more likely to cause harm than good,
then llvm-profdata will mark all the counters to be -1 indicating the
function is hot but the profile is unaccountable. In compiler side, if a
function profile with all -1 counters is seen, the function entry count will
be set to be above hot threshold but its internal profile will be dropped.

In the long run, it may be useful to let compiler support using PGO profile
and sample profile at the same time, but that requires more careful design
and more substantial changes to make two profiles work seamlessly. The patch
here serves as a simple intermediate solution.

Differential Revision: https://reviews.llvm.org/D81981
2020-07-27 20:17:40 -07:00
Rong Xu 50da55a585 [PGO] Supporting code for always instrumenting entry block
This patch includes the supporting code that enables always
instrumenting the function entry block by default.

This patch will NOT the default behavior.

It adds a variant bit in the profile version, adds new directives in
text profile format, and changes llvm-profdata tool accordingly.

This patch is a split of D83024 (https://reviews.llvm.org/D83024)
Many test changes from D83024 are also included.

Differential Revision: https://reviews.llvm.org/D84261
2020-07-22 15:01:53 -07:00
Wei Mi 78fe6a3ee2 [NFC] Extract the code to write instr profile into function writeInstrProfile
So that the function writeInstrProfile can be used in other places.

Differential Revision: https://reviews.llvm.org/D83521
2020-07-09 16:30:28 -07:00
Fangrui Song 546be08837 [llvm-profdata] --hot-func-list: fix some style issues in D81800
Reviewed By: wenlei, hoyFB

Differential Revision: https://reviews.llvm.org/D82500
2020-06-24 15:17:03 -07:00
weihe 53cf53023c Add --hot-func-list to llvm-profdata show for sample profiles
Summary:
Add the --hot-func-list feature to llvm-profdata show for sample profiles. This feature prints a list of hot functions whose max sample count are above the 99% threshold, with their numbers of total samples, total samples percentage, max samples, entry samples, and their function names.

Test Plan:

Reviewers: wenlei, hoyFB

Reviewed By: wenlei, hoyFB

Subscribers: hoyFB, wenlei, weihe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82355
2020-06-24 12:49:46 -07:00
Bruno Ricci 5342dd6bf4
Revert "Add --hot-func-list to llvm-profdata show for sample profiles"
This reverts commit 7348b951fe.
It is causing Asan failures.
2020-06-21 14:33:08 +01:00
weihe 7348b951fe Add --hot-func-list to llvm-profdata show for sample profiles
Summary: Add the --hot-func-list feature to llvm-profdata show for sample profiles. This feature prints a list of hot functions whose max sample count are above the 99% threshold, with their numbers of total samples, total samples percentage, max samples, entry samples, and their function names.

Reviewers: wmi, hoyFB, wenlei

Reviewed By: wmi

Subscribers: hoyFB, wenlei, llvm-commits, weihe

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81800
2020-06-20 10:13:36 -07:00
Wei Mi 56926ae0fa [SampleFDO] Rename llvm-profdata flag -partial-profile to -gen-partial-profile.
The internal flag -partial-profile in llvm conflicts with the flag with
the same name in llvm-profdata. The conflict happens in builds with
LLVM_LINK_LLVM_DYLIB enabled. In this case the tools are linked with libLLVM
and we end up with two definitions for the same cl::opt.

The patch renames llvm-profdata flag -partial-profile to -gen-partial-profile.
2020-05-12 15:06:03 -07:00
Wenlei He 17fc651860 [llvm-profdata] Support -detailed-summary for Sample Profile
Summary: Add -detailed-summary support for sample profile dump to match that of instrumentation profile.

Reviewers: wmi, davidxl, hoyFB

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79291
2020-05-05 18:28:22 -07:00
Wei Mi b49eac71ad Recommit [SampleFDO] Add flag for partial profile.
Fix the error of show-prof-info.test on some platforms without zlib.

The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.

Differential Revision: https://reviews.llvm.org/D77426
2020-04-07 14:28:25 -07:00
Wei Mi c5da949ae8 Revert "[SampleFDO] Add flag for partial profile." show-prof-info.test breaks on some platforms.
This reverts commit e3ba652a14.
2020-04-07 12:54:51 -07:00
Wei Mi e3ba652a14 [SampleFDO] Add flag for partial profile.
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.

Differential Revision: https://reviews.llvm.org/D77426
2020-04-07 12:17:56 -07:00
Wei Mi ebad678857 [SampleFDO] Port MD5 name table support to extbinary format.
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.

Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support.

Differential Revision: https://reviews.llvm.org/D76255
2020-03-30 22:07:08 -07:00
Alexandre Ganea 8404aeb56a [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
2020-02-14 10:24:22 -05:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Yi Kong 01bfb366ac [llvm-profdata] Fix hint message since argument format has changed
"-sample" option is now changed to "--sample".
2020-01-20 20:57:03 +08:00
Bjorn Pettersson 1f43ea41c3 Prune Pass.h include from DataLayout.h. NFCI
Summary:
Reduce include dependencies by no longer including Pass.h from
DataLayout.h. That include seemed irrelevant to DataLayout, as
well as being irrelevant to several users of DataLayout.

Reviewers: rnk

Reviewed By: rnk

Subscribers: mehdi_amini, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69261

llvm-svn: 375436
2019-10-21 17:51:54 +00:00
Wei Mi b523790ae1 [SampleFDO] Add compression support for any section in ExtBinary profile format
Previously ExtBinary profile format only supports compression using zlib for
profile symbol list. In this patch, we extend the compression support to any
section. User can select some or all of the sections to compress. In an
experiment, for a 45M profile in ExtBinary format, compressing name table
reduced its size to 24M, and compressing all the sections reduced its size
to 11M.

Differential Revision: https://reviews.llvm.org/D68253

llvm-svn: 373914
2019-10-07 16:12:37 +00:00
Rong Xu e0fa2689de [PGO] Fix typos from r359612. NFC.
llvm-svn: 373369
2019-10-01 18:06:50 +00:00
Wei Mi eee532cd5f Recommit [SampleFDO] Expose an interface to return the size of a section
or the size of the profile for profile in ExtBinary format.

Fix a test failure on Mac.

[SampleFDO] Expose an interface to return the size of a section or the
size of the profile for profile in ExtBinary format.

Sometimes we want to limit the size of the profile by stripping some functions
with low sample count or by stripping some function names with small text size
from profile symbol list. That requires the profile reader to have the
interfaces returning the size of a section or the size of total profile. The
patch add those interfaces.

At the same time, add some dump facility to show the size of each section.

Differential revision: https://reviews.llvm.org/D67726

llvm-svn: 372478
2019-09-21 17:23:55 +00:00
Amara Emerson 3bb56fa478 Revert "[SampleFDO] Expose an interface to return the size of a section or the size"
This reverts commit f118852046.

Broke the macOS build/greendragon bots.

llvm-svn: 372464
2019-09-21 09:11:51 +00:00
Wei Mi f118852046 [SampleFDO] Expose an interface to return the size of a section or the size
of the profile for profile in ExtBinary format.

Sometimes we want to limit the size of the profile by stripping some functions
with low sample count or by stripping some function names with small text size
from profile symbol list. That requires the profile reader to have the
interfaces returning the size of a section or the size of total profile. The
patch add those interfaces.

At the same time, add some dump facility to show the size of each section.

llvm-svn: 372439
2019-09-20 23:24:50 +00:00
Vedant Kumar 0fcfe89717 [llvm-profdata] Add mode to recover from profile read failures
Add a mode in which profile read errors are not immediately treated as
fatal. In this mode, merging makes forward progress and reports failure
only if no inputs can be read.

Differential Revision: https://reviews.llvm.org/D66985

llvm-svn: 370827
2019-09-03 22:23:16 +00:00
Wei Mi 198009ae8d Fix some errors introduced by rL370563 which were not exposed on my local machine.
1. zlib::compress accept &size_t but the param is an uint64_t.
2. Some systems don't have zlib installed. Don't use compression by default.

llvm-svn: 370564
2019-08-31 03:17:49 +00:00
Wei Mi 798e59b81f [SampleFDO] Add profile symbol list section to discriminate function being
cold versus function being newly added.

This is the second half of https://reviews.llvm.org/D66374.

Profile symbol list is the collection of function symbols showing up in
the binary which generates the current profile. It is used to discriminate
function being cold versus function being newly added. Profile symbol list
is only added for profile with ExtBinary format.

During profile use compilation, when profile-sample-accurate is enabled,
a function without profile will be regarded as cold only when it is
contained in that list.

Differential Revision: https://reviews.llvm.org/D66766

llvm-svn: 370563
2019-08-31 02:27:26 +00:00
Wei Mi be9073249e [SampleFDO] Add ExtBinary format to support extension of binary profile.
This is a patch split from https://reviews.llvm.org/D66374. It tries to add
a new format of profile called ExtBinary. The format adds a section header
table to the profile and organize the profile in sections, so the future
extension like adding a new section or extending an existing section will be
easier while keeping backward compatiblity feasible.

Differential Revision: https://reviews.llvm.org/D66513

llvm-svn: 369798
2019-08-23 19:05:30 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Fangrui Song d9b948b6eb Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC
F_{None,Text,Append} are kept for compatibility since r334221.

llvm-svn: 367800
2019-08-05 05:43:48 +00:00
Rong Xu f0d3dcec97 llvm-profdata] Handle the cases of overlapping input file and output file
Currently llvm-profdata does not expect the same file name for the input profile
and the output profile.
>llvm-profdata merge A.profraw B.profraw -o B.profraw
The above command runs successfully but the resulted B.profraw is not correct.
This patch fixes the issue by moving the initialization of writer after loading
the profile.

For the show command, the following will report a confusing error of
"Empty raw profile file":
>llvm-profdata show B.profraw -o B.profraw
It's harder to fix as we need to output something before loading the input profile.
I don't think that a fix for this is worth the effort. I just make the error explicit for
the show command.

Differential Revision: https://reviews.llvm.org/D64360

llvm-svn: 365386
2019-07-08 21:03:12 +00:00
Rong Xu 998b97f6f1 [llvm-profdata] Add overlap command to compute similarity b/w two profile files
Add overlap functionality to llvm-profdata tool to compute the similarity
between two profile files.

Differential Revision: https://reviews.llvm.org/D60977

llvm-svn: 359612
2019-04-30 21:19:12 +00:00
Fangrui Song 8b0a15b0ef [llvm-profdata] Deleted unused Cutoffs added by D16005
llvm-svn: 356248
2019-03-15 10:43:51 +00:00
Rong Xu a6ff69f6dd [PGO] Context sensitive PGO (part 2)
Part 2 of CSPGO changes (mostly related to ProfileSummary).
Note that I use a default parameter in setProfileSummary() and getSummary().
This is to break the dependency in clang. I will make the parameter explicit
after changing clang in a separated patch.

Differential Revision: https://reviews.llvm.org/D54175

llvm-svn: 355131
2019-02-28 19:55:07 +00:00
Fangrui Song ef5987592e Fix some include order and file headers issues. NFC
llvm-svn: 354550
2019-02-21 07:42:31 +00:00
Petar Jovanovic 40a7f63c37 [PGO] Fix the type of the formated variable
Change the format type of Value to PRIu64 since it is a uint64_t.
The problem was detected on mips boards building 32-bit binaries,
where it was printing junk values and causing test failure.

Patch by Milos Stojanovic.

Differential Revision: https://reviews.llvm.org/D57583

llvm-svn: 353194
2019-02-05 18:09:28 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Rong Xu 52aa224aff [llvm-profdata] add value-cutoff functionality in show command
This patch improves llvm-profdata show command:
(1) add -value-cutoff=<N> option: Show only those functions whose max count
    values are greater or equal to N.
(2) add -list-below-cutoff option: Only output names of functions whose max
    count value are below the cutoff.
(3) formats value-profile counts and prints out percentage.

Differential Revision: https://reviews.llvm.org/D56342

llvm-svn: 350673
2019-01-08 22:41:48 +00:00
Rong Xu 7162e16e6b [PGO] Revert r350579 to fix commit message.
Will re-commit it using the correct commit message.

llvm-svn: 350670
2019-01-08 22:37:12 +00:00
Rong Xu 6f366c3a04 [PGO] Use SourceFileName rather module name in PGOFuncName
In LTO or Thin-lto mode (though linker plugin), the module
names are of temp file names which are different for
different compilations. Using SourceFileName avoids the issue.
This should not change any functionality for current PGO as
all the current callers of getPGOFuncName() is before LTO.

llvm-svn: 350579
2019-01-07 23:25:56 +00:00
Richard Smith 3164fcfd27 Add flag to llvm-profdata to allow symbols in profile data to be remapped, and
add a tool to generate symbol remapping files.

Summary:
The new tool llvm-cxxmap builds a symbol mapping table from a file containing
a description of partial equivalences to apply to mangled names and files
containing old and new symbol tables.

Reviewers: davidxl

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D51470

llvm-svn: 342168
2018-09-13 20:22:02 +00:00
Richard Smith c6ba9ca169 Make llvm-profdata show -text work as advertised in the documentation.
Per LLVM's CommandGuide, llvm-profdata show -text is supposed to produce
textual output that can be passed as input to further llvm-profdata
invocations. This previously didn't work for two reasons:

1) -text was not sufficient to enable the machine-readable text format output;
instead, -text was effectively ignored if -counts was not also specified. (With
this patch, -counts is instead ignored if -text is specified, because the
machine-readable text format always includes counts.)

2) When the input data was an IR-level profile, the :ir marker was missing from
the output, resulting in a text format output that would not be usable as
profiling data due to function hash mismatches.

Differential Revision: https://reviews.llvm.org/D51188

llvm-svn: 340592
2018-08-24 01:34:45 +00:00
Wei Mi d9be2c7e64 [NFC] Change sample profile format enum name SPF_Raw_Binary to SPF_Binary.
Some out-of-tree targets depend on the enum name SPF_Binary. Keep the name
can avoid unnecessary churn to those targets.

llvm-svn: 334476
2018-06-12 05:53:49 +00:00
Wei Mi 432db3b43b Fix a typo in rL334447.
llvm-svn: 334475
2018-06-12 04:43:09 +00:00
Wei Mi a0c0857e7a [SampleFDO] Add a new compact binary format for sample profile.
Name table occupies a big chunk of size in current binary format sample profile.
In order to reduce its size, the patch changes the sample writer/reader to
save/restore MD5Hash of names in the name table. Sample annotation phase will
also use MD5Hash of name to query samples accordingly.

Experiment shows compact binary format can reduce the size of sample profile by
2/3 compared with binary format generally.

Differential Revision: https://reviews.llvm.org/D47955

llvm-svn: 334447
2018-06-11 22:40:43 +00:00
Jonas Devlieghere e46b7565bb [llvm-profdata] Use WithColor for printing errors
Use convenience helpers in WithColor to print errors and warnings.

Differential revision: https://reviews.llvm.org/D45658

llvm-svn: 330262
2018-04-18 14:42:33 +00:00
Rui Ueyama 197194b6c9 Define InitLLVM to do common initialization all at once.
We have a few functions that virtually all command wants to run on
process startup/shutdown. This patch adds InitLLVM class to do that
all at once, so that we don't need to copy-n-paste boilerplate code
to each llvm command's main() function.

Differential Revision: https://reviews.llvm.org/D45602

llvm-svn: 330046
2018-04-13 18:26:06 +00:00
Vedant Kumar 188efda585 [llvm-profdata] Don't treat non-fatal merge errors as fatal
This fixes an issue seen on the coverage bot:
http://lab.llvm.org:8080/green/view/Experimental/job/clang-stage2-coverage-R/1930

Profile merging shouldn't fail if a single counter mismatch is detected.

llvm-svn: 318555
2017-11-17 21:18:32 +00:00
Vedant Kumar faaa42ad0a [llvm-profdata] Fix a dangling reference to an error string
llvm-svn: 318502
2017-11-17 02:58:23 +00:00
Adam Nemet 1142b2d7b7 [llvm-profdata] Report if profile data file is IR- or FE-level
Differential Revision: https://reviews.llvm.org/D39997

llvm-svn: 318159
2017-11-14 16:59:18 +00:00
Rafael Espindola 8c0ff9508d Bring r314809 back.
But now include a check for CPU_COUNT so we still build on 10 year old
versions of glibc.

Original message:

Use sched_getaffinity instead of std:🧵:hardware_concurrency.

The issue with std:🧵:hardware_concurrency is that it forwards
to libc and some implementations (like glibc) don't take thread
affinity into consideration.

With this change a llvm program that can execute in only 2 cores will
use 2 threads, even if the machine has 32 cores.

This makes benchmarking a lot easier, but should also help if someone
doesn't want to use all cores for compilation for example.

llvm-svn: 314931
2017-10-04 20:27:01 +00:00