llvm-project

Commit Graph

Author	SHA1	Message	Date
Wenlei He	577e58bcc7	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose. This is a resubmit of https://reviews.llvm.org/D83743	2020-08-15 20:17:21 -07:00
Wei Mi	836991d367	Fix a crash when the sample profile uses md5 and -sample-profile-merge-inlinee is enabled. When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be created during profile merge without GUIDToFuncNameMap being initialized. That will occasionally cause compiler crash. The patch fixes it. Differential Revision: https://reviews.llvm.org/D84994	2020-07-30 21:21:06 -07:00
Wenlei He	d41d952be9	Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks" This reverts commit `2d6ecfa168`.	2020-07-19 08:49:04 -07:00
Wenlei He	2d6ecfa168	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks Summary: This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose. Subscribers: mgorny, aprantl, hiraditya, llvm-commits Tags: #llvm Resubmit for https://reviews.llvm.org/D84086	2020-07-19 08:21:05 -07:00
Eric Christopher	ae08dbc673	Temporarily Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks" as it is failing the inline-replay.ll test as well as sanitizers/Werror from returning a stack local variable. This reverts commit `029946b112`.	2020-07-17 14:58:01 -07:00
Wenlei He	029946b112	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks Summary: This change added a new inline advisor that takes optimization remarks for previous inlining as input, and provide the decision as advice so current inlining can replay inline decision of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites. The change can be useful for Inliner tuning. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inliner advisor with SampleProfileLoader's inline decision for replay. The new inline advisor can also be used by regular CGSCC inliner later if needed. Reviewers: davidxl, mtrofin, wmi, hoy Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83743	2020-07-17 13:30:47 -07:00
Wenlei He	7c8a6936bf	[Remarks] Add callsite locations to inline remarks Summary: Add call site location info into inline remarks so we can differentiate inline sites. This can be useful for inliner tuning. We can also reconstruct full hierarchical inline tree from parsing such remarks. The messege of inline remark is also tweaked so we can differentiate SampleProfileLoader inline from CGSCC inline. Reviewers: wmi, davidxl, hoy Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D82213	2020-06-20 23:32:10 -07:00
Wei Mi	7a6c89427c	[SampleFDO] Add use-sample-profile function attribute. When sampleFDO is enabled, people may expect they can use -fno-profile-sample-use to opt-out using sample profile for a certain file. That could be either for debugging purpose or for performance tuning purpose. However, when thinlto is enabled, if a function in file A compiled with -fno-profile-sample-use is imported to another file B compiled with -fprofile-sample-use, the inlined copy of the function in file B may still get its profile annotated. The inconsistency may even introduce profile unused warning because if the target is not compiled with explicit debug information flag, the function in file A won't have its debug information enabled (debug information will be enabled implicitly only when -fprofile-sample-use is used). After it is imported into file B which is compiled with -fprofile-sample-use, profile annotation for the outline copy of the function will fail because the function has no debug information, and that will trigger profile unused warning. We add a new attribute use-sample-profile to control whether a function will use its sample profile no matter for its outline or inline copies. That will make the behavior of -fno-profile-sample-use consistent. Differential Revision: https://reviews.llvm.org/D79959	2020-06-02 17:23:17 -07:00
Hongtao Yu	47c1f2741f	Properly add out-of-module functions to the import list This patch addresses two issues related to adding inline functions to the import list while recursively going through the profiling data. 1. For callsite samples, only add an inlined function to the import list if it's from outside of the module (i.e. only has a declaration inside the module). 2. For body samples, add each target function to the import list if it's from outside of the module (i.e. only has a declaration inside the module). Previously we were using getSubProgram() to check whether it has dbg info, which is inaccurate. This fix properly add imports and could improve the quality of the pass. Added a few changes to the test to catch these cases. Differential Revision: https://reviews.llvm.org/D79379	2020-05-11 10:00:14 -07:00
Wei Mi	ebad678857	[SampleFDO] Port MD5 name table support to extbinary format. Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it. Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support. Differential Revision: https://reviews.llvm.org/D76255	2020-03-30 22:07:08 -07:00
Wei Mi	154cd6de51	[SampleFDO] Fix invalid branch profile generated by indirect call promotion. Suppose an inline instance has hot total sample count but 0 entry count, and it is an indirect call target. If the indirect call has no other call target and inline instance associated with it and it is promoted, currently the conditional branch generated by indirect call promotion will have invalid branch profile which is !{!"branch_weights", i32 0, i32 0} -- because the entry count of the promoted target is 0 and the total entry count of all targets is also 0. This caused a SEGV in Control Height Reduction and may cause problem in other passes. Function entry count of an inline instance is computed by a heuristic -- using either the sample of the starting line or starting inner inline instance. The patch changes the heuristic a little bit so that when total sample count is larger than 0, the computed entry count will be at least 1. Then the new branch profile will be !{!"branch_weights", i32 1, i32 0}. Differential Revision: https://reviews.llvm.org/D72790	2020-01-15 18:36:06 -08:00
Wenlei He	7b61ae68ec	[AutoFDO] Inline replay for cold/small callees from sample profile loader Summary: Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by CSGCC inliner later. The oscillation between sample profile loader's inlining and regular CGSSC inlining cause unnecessary loss of context-sensitive profile. It doesn't have much impact for inline decision itself, but it negatively affects post-inline profile quality as CGSCC inliner have to scale counts which is not as accurate as the original context sensitive profile, and bad post-inline profile can misguide code layout. This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70750	2019-12-06 11:44:45 -08:00
Wenlei He	532196d811	[AutoFDO] Top-down Inlining for specialization with context-sensitive profile Summary: AutoFDO's sample profile loader processes function in arbitrary source code order, so if I change the order of two functions in source code, the inline decision can change. This also prevented the use of context-sensitive profile to do specialization while inlining. This commit enforces SCC top-down order for sample profile loader. With this change, we can now do specialization, as illustrated by the added test case: Say if we have A->B->C and D->B->C call path, we want to inline C into B when root inliner is B, but not when root inliner is A or D, this is not possible without enforcing top-down order. E.g. Once C is inlined into B, A and D can only choose to inline (B->C) as a whole or nothing, but what we want is only inline B into A and D, not its recursive callee C. If we process functions in top-down order, this is no longer a problem, which is what this commit is doing. This change is guarded with a new switch "-sample-profile-top-down-load" for tuning, and it depends on D70653. Eventually, top-down can be the default order for sample profile loader. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits, tejohnson Tags: #llvm Differential Revision: https://reviews.llvm.org/D70655	2019-12-05 16:07:01 -08:00
Wenlei He	e503fd85d3	[AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function Summary: When sample profile loader decides not to inline a previously inlined call-site, we adjust the profile of outlined function simply by scaling up its profile counts by call-site count. This means the context-sensitive profile of that inlined instance will be thrown away. This commit try to keep context-sensitive profile for such cases: - Instead of scaling outlined function's profile, we now properly merge the FunctionSamples of inlined instance into outlined function, including all recursively inlined profile. - Instead of adjusting the profile for negative inline decision at the end of the sample profile loader pass, we do the profile merge right after processing each function. This change paired with top-down ordering of annotation/inline-replay (a separate diff) will make sure we recursively merge profile back before the profile is used for annotation and inline replay. A new switch -sample-profile-merge-inlinee is added to enable the new profile merge for tuning. It should be the default behavior eventually. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70653	2019-12-05 15:57:55 -08:00
Wenlei He	ba1dfae054	Keep import function list for inlinee profile update Summary: When adjusting function entry counts after inlining, Funciton::setEntryCount is called without providing an import function list. The side effect of that is the previously set import function list will be dropped. The import function list is used by ThinLTO to help import hot cross module callee for LTO inlining, so dropping that during ThinLTO pre-link may adversely affect LTO inlining. The fix is to keep the list while updating entry counts for inlining. Reviewers: wmi, davidxl, tejohnson Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69736	2019-11-06 18:36:00 -08:00
Wei Mi	09dcfe6805	[SampleFDO] Add indexing for function profiles so they can be loaded on demand in ExtBinary format Currently for Text, Binary and ExtBinary format profiles, when we compile a module with samplefdo, even if there is no function showing up in the profile, we have to load all the function profiles from the profile input. That is a waste of compile time. CompactBinary format profile has already had the support of loading function profiles on demand. In this patch, we add the support to load profile on demand for ExtBinary format. It will work no matter the sections in ExtBinary format profile are compressed or not. Experiment shows it reduces the time to compile a server benchmark by 30%. When profile remapping and loading function profiles on demand are both used, extra work needs to be done so that the loading on demand process will take the name remapping into consideration. It will be addressed in a follow-up patch. Differential Revision: https://reviews.llvm.org/D68601 llvm-svn: 374233	2019-10-09 21:36:03 +00:00
Wei Mi	5f7e822dc7	[SampleFDO] Minimize performance impact when profile-sample-accurate is enabled. We can save memory and reduce binary size significantly by enabling ProfileSampleAccurate. However when ProfileSampleAccurate is true, function without sample will be regarded as cold and this could potentially cause performance regression. To minimize the potential negative performance impact, we want to be a little conservative here saying if a function shows up in the profile, no matter as outline instance, inline instance or call targets, treat the function as not being cold. This will handle the cases such as most callsites of a function are inlined in sampled binary (thus outline copy don't get any sample) but not inlined in current build (because of source code drift, imprecise debug information, or the callsites are all cold individually but not cold accumulatively...), so that the outline function showing up as cold in sampled binary will actually not be cold after current build. After the change, such function will be treated as not cold even profile-sample-accurate is enabled. At the same time we lower the hot criteria of callsiteIsHot check when profile-sample-accurate is enabled. callsiteIsHot is used to determined whether a callsite is hot and qualified for early inlining. When profile-sample-accurate is enabled, functions without profile will be regarded as cold and much less inlining will happen in CGSCC inlining pass, so we can worry less about size increase and be aggressive to allow more early inlining to happen for warm callsites and it is helpful for performance overall. Differential Revision: https://reviews.llvm.org/D67561 llvm-svn: 372232	2019-09-18 16:06:28 +00:00
Wei Mi	798e59b81f	[SampleFDO] Add profile symbol list section to discriminate function being cold versus function being newly added. This is the second half of https://reviews.llvm.org/D66374. Profile symbol list is the collection of function symbols showing up in the binary which generates the current profile. It is used to discriminate function being cold versus function being newly added. Profile symbol list is only added for profile with ExtBinary format. During profile use compilation, when profile-sample-accurate is enabled, a function without profile will be regarded as cold only when it is contained in that list. Differential Revision: https://reviews.llvm.org/D66766 llvm-svn: 370563	2019-08-31 02:27:26 +00:00
Wei Mi	be9073249e	[SampleFDO] Add ExtBinary format to support extension of binary profile. This is a patch split from https://reviews.llvm.org/D66374. It tries to add a new format of profile called ExtBinary. The format adds a section header table to the profile and organize the profile in sections, so the future extension like adding a new section or extending an existing section will be easier while keeping backward compatiblity feasible. Differential Revision: https://reviews.llvm.org/D66513 llvm-svn: 369798	2019-08-23 19:05:30 +00:00
Eric Christopher	cee313d288	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552	2019-04-17 04:52:47 +00:00
Eric Christopher	a863435128	Temporarily Revert "Add basic loop fusion pass." As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546	2019-04-17 02:12:23 +00:00
Taewook Oh	a960f89962	[ProfileSummary] Count callsite samples when computing total samples. Summary: Currently ProfileSummaryBuilder doesn't count into callsite samples when computing total samples. Considering that ProfileSummaryInfo is used to checked the hotness of not only body samples but also callsite samples (from SampleProfileLoader), I think the callsite sample counts should be considered when computing total samples. Reviewers: eraman, danielcdh, wmi Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59835 llvm-svn: 357627	2019-04-03 19:54:43 +00:00
Taewook Oh	6a27c48be2	[SampleProfile] Repeat indirect call promotion only when the target is actually hot. Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets. Reviewers: danielcdh, wmi Subscribers: hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59940 llvm-svn: 357484	2019-04-02 15:48:21 +00:00
David Callahan	d2eeb2516d	Update entry count for cold calls Summary: Profile sample files include the number of times each entry or inlined call site is sampled. This is translated into the entry count metadta on functions. When sample data is being read, if a call site that was inlined in the sample program is considered cold and not inlined, then the entry count of the out-of-line functions does not reflect the current compilation. In this patch, we note call sites where the function was not inlined and as a last action of the sample profile loading, we update the called function's entry count to reflect the calls from these call sites which are not included in the profile file. Reviewers: danielcdh, wmi, Kader, modocache Reviewed By: wmi Subscribers: davidxl, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D52845 llvm-svn: 352001	2019-01-24 00:55:23 +00:00
Wei Mi	3bcccdfe38	[SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink If the sample profile has no inlining hierachy information included, we call the sample profile is flattened. For flattened profile, in ThinLTO postlink phase, SampleProfileLoader's hot function inlining and profile annotation will do nothing, so it is better to save the effort to read in the profile and run the sample profile loader pass. It is helpful for reducing compile time when the flattened profile is huge. Differential Revision: https://reviews.llvm.org/D54819 llvm-svn: 351476	2019-01-17 20:48:34 +00:00
Richard Smith	6c67662816	Add a flag to remap manglings when reading profile data information. This can be used to preserve profiling information across codebase changes that have widespread impact on mangled names, but across which most profiling data should still be usable. For example, when switching from libstdc++ to libc++, or from the old libstdc++ ABI to the new ABI, or even from a 32-bit to a 64-bit build. The user can provide a remapping file specifying parts of mangled names that should be treated as equivalent (eg, std::__1 should be treated as equivalent to std::__cxx11), and profile data will be treated as applying to a particular function if its name is equivalent to the name of a function in the profile data under the provided equivalences. See the documentation change for a description of how this is configured. Remapping is supported for both sample-based profiling and instruction profiling. We do not support remapping indirect branch target information, but all other profile data should be remapped appropriately. Support is only added for the new pass manager. If someone wants to also add support for this for the old pass manager, doing so should be straightforward. This is the LLVM side of Clang r344199. Reviewers: davidxl, tejohnson, dlj, erik.pilkington Subscribers: mehdi_amini, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51249 llvm-svn: 344200	2018-10-10 23:13:47 +00:00
Wei Mi	6a14325dff	[SampleFDO] Add FunctionOffsetTable in compact binary format profile. The patch saves a function offset table which maps function name index to the offset of its function profile to the start of the binary profile. By using the function offset table, for those function profiles which will not be used when compiling a module, the profile reader does't have to read them. For profile size around 10~20M, it saves ~10% compile time. Differential Revision: https://reviews.llvm.org/D51863 llvm-svn: 342283	2018-09-14 20:52:59 +00:00
Wei Mi	94d44c97bc	[SampleFDO] Make sample profile loader unaware of compact format change. The patch tries to make sample profile loader independent of profile format change. It moves compact format related code into FunctionSamples and SampleProfileReader classes, and sample profile loader only has to interact with those two classes and will be unaware of profile format changes. The cleanup also contain some fixes to further remove the difference between compactbinary format and binary format. After the cleanup using different formats originated from the same profile will generate the same binaries, which we verified by compiling two large server benchmarks w/wo thinlto. Differential Revision: https://reviews.llvm.org/D51643 llvm-svn: 341591	2018-09-06 22:03:37 +00:00
Wei Mi	b1ef2cc53d	[SampleFDO] Fix a bug in getOrCompHotCountThreshold/getOrCompColdCountThreshold getOrCompHotCountThreshold/getOrCompColdCountThreshold introduced in https://reviews.llvm.org/D45377 contain a bad mistake and will only return 1 or 0 instead of the true hot/cold cutoff value. The patch fixes the mistake. But the mistake seems not causing big performance difference according to internal server benchmarks testing. Differential Revision: https://reviews.llvm.org/D50370 llvm-svn: 339162	2018-08-07 18:13:10 +00:00
Wei Mi	a0c0857e7a	[SampleFDO] Add a new compact binary format for sample profile. Name table occupies a big chunk of size in current binary format sample profile. In order to reduce its size, the patch changes the sample writer/reader to save/restore MD5Hash of names in the name table. Sample annotation phase will also use MD5Hash of name to query samples accordingly. Experiment shows compact binary format can reduce the size of sample profile by 2/3 compared with binary format generally. Differential Revision: https://reviews.llvm.org/D47955 llvm-svn: 334447	2018-06-11 22:40:43 +00:00
Wei Mi	0c2f6be662	[SampleFDO] Don't treat warm callsite with inline instance in the profile as cold We found current sampleFDO had a performance issue when triaging a regression. For a callsite with inline instance in the profile, even if hot callsite inliner cannot inline it, it may still execute enough times and should not be treated as cold in regular inliner later. However, currently if such callsite is not inlined by hot callsite inliner, and the BB where the callsite locates doesn't get samples from other instructions inside of it, the callsite will have no profile metadata annotated. In regular inliner cost analysis, if the callsite has no profile annotated and its caller has profile information, it will be treated as cold. The fix changes the isCallsiteHot check and chooses to compare CallsiteTotalSamples with hot cutoff value computed by ProfileSummaryInfo. Differential Revision: https://reviews.llvm.org/D45377 llvm-svn: 332058	2018-05-10 23:02:27 +00:00
Dehao Chen	c6c051f2ea	Include GUIDs from the same module when computing GUIDs that needs to be imported. Summary: In the compile phase of SamplePGO+ThinLTO, ICP is not invoked. Instead, indirect call targets will be included as function metadata for ThinIndex to buidl the call graph. This should not only include functions defined in other modules, but also functions defined in the same module, otherwise ThinIndex may find the callee dead and eliminate it, while ICP in backend will revive the symbol, which leads to undefined symbol. Reviewers: tejohnson Reviewed By: tejohnson Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D39480 llvm-svn: 317118	2017-11-01 20:26:47 +00:00
Dehao Chen	9bd60429e2	Directly return promoted direct call instead of rely on stripPointerCast. Summary: stripPointerCast is not reliably returning the value that's being type-casted. Instead it may look further at function attributes to further propagate the value. Instead of relying on stripPOintercast, the more reliable solution is to directly use the pointer to the promoted direct call. Reviewers: tejohnson, davidxl Reviewed By: tejohnson Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D38603 llvm-svn: 315077	2017-10-06 17:04:55 +00:00
Dehao Chen	16f01fb1db	Annotate VP prof on indirect call if it is ICPed in the profiled binary. Summary: In SamplePGO, when an indirect call is promoted in the profiled binary, before profile annotation, it will be promoted and inlined. For the original indirect call, the current implementation will not mark VP profile on it. This is an issue when profile becomes stale. This patch annotates VP prof on indirect calls during annotation. Reviewers: tejohnson Reviewed By: tejohnson Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D38477 llvm-svn: 315016	2017-10-05 20:15:29 +00:00
Dehao Chen	d26dae0d34	Separate the logic when handling indirect calls in SamplePGO ThinLTO compile phase and other phases. Summary: In SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracted that logic and makes it clearer before profile annotation. In the mean time, we need to make function importing process both inlined callsites as well as not promoted indirect callsites. Reviewers: tejohnson Reviewed By: tejohnson Subscribers: sanjoy, mehdi_amini, llvm-commits, inglorion Differential Revision: https://reviews.llvm.org/D38094 llvm-svn: 314619	2017-10-01 05:24:51 +00:00
Dehao Chen	62b9c33e1e	Import all inlined indirect call targets for SamplePGO. Summary: In the ThinLTO compilation, if a function is inlined in the profiling binary, we need to inline it before annotation. If the callee is not available in the primary module, a first step is needed to import that callee function. For the current implementation, if the call is an indirect call, which has been promoted to >1 targets and inlined, SamplePGO will only import one target with the largest sample count. This patch fixed the bug to import all targets instead. Reviewers: tejohnson, davidxl Reviewed By: tejohnson Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36637 llvm-svn: 313678	2017-09-19 21:18:14 +00:00
Dehao Chen	b6e60c8b80	Handle profile mismatch correctly for SamplePGO. Summary: Fix the bug when promoted call return type mismatches with the promoted function, we should not try to inline it. Otherwise it may lead to compiler crash. Reviewers: davidxl, tejohnson, eraman Reviewed By: tejohnson Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D38018 llvm-svn: 313658	2017-09-19 18:26:54 +00:00
Dehao Chen	3a81f84d9a	Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader. Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues. Reviewers: eraman, davidxl Reviewed By: eraman Subscribers: vitalybuka, sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D37779 llvm-svn: 313277	2017-09-14 17:29:56 +00:00
Vitaly Buka	48624d327a	Revert "Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader." Patch introduced uninitialized value. This reverts commit r313195. llvm-svn: 313230	2017-09-14 05:40:33 +00:00
Dehao Chen	15c86ef970	Invoke GetInlineCost for legality check before inline functions in SampleProfileLoader. Summary: SampleProfileLoader inlines hot functions if it is inlined in the profiled binary. However, the inline needs to be guarded by legality check, otherwise it could lead to correctness issues. Reviewers: eraman, davidxl Reviewed By: eraman Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D37779 llvm-svn: 313195	2017-09-13 21:22:55 +00:00
Dehao Chen	50f2aa19e8	Do not inline recursive direct calls in sample loader pass. Summary: r305009 disables recursive inlining for indirect calls in sample loader pass. The same logic applies to direct recursive calls. Reviewers: iteratee, davidxl Reviewed By: iteratee Subscribers: sanjoy, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D34456 llvm-svn: 305934	2017-06-21 17:57:43 +00:00
Dehao Chen	e2a428bad7	Do not early-inline recursive calls in sample profile loader. Summary: Early-inlining of recursive call makes the code size bloat exponentially. We should not disable it. Reviewers: davidxl, dnovillo, iteratee Reviewed By: iteratee Subscribers: iteratee, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D34017 llvm-svn: 305009	2017-06-08 20:11:57 +00:00
Adrian Prantl	defc99a94e	Cleanup tests to not share a DISubprogram between multiple Functions. rdar://problem/31926379 llvm-svn: 302166	2017-05-04 16:24:31 +00:00
Dehao Chen	1ea8bd8109	Build SymbolMap in SampleProfileLoader to help matchin function names with suffix. Summary: If there is suffix added in the function name (e.g. module hash added by thinLTO), we will not be able to find a match in profile as the suffix does not exist in profile. This patch build a map from function name to Function *. The map includes the entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline. Reviewers: davidxl, dnovillo, tejohnson Reviewed By: davidxl Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D31952 llvm-svn: 300507	2017-04-17 22:23:05 +00:00
Dehao Chen	2c7ca9b5df	SamplePGO: convert callsite samples map key from callsite_location to callsite_location+callee_name Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly. Reviewers: davidxl, dnovillo Reviewed By: davidxl Subscribers: andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D31950 llvm-svn: 300240	2017-04-13 19:52:10 +00:00
Dehao Chen	a60cdd3881	Add function importing info from samplepgo profile to the module summary. Summary: For SamplePGO, the profile may contain cross-module inline stacks. As we need to make sure the profile annotation happens when all the hot inline stacks are expanded, we need to pass this info to the module importer so that it can import proper functions if necessary. This patch implemented this feature by emitting cross-module targets as part of function entry metadata. In the module-summary phase, the metadata is used to build call edges that points to functions need to be imported. Reviewers: mehdi_amini, tejohnson Reviewed By: tejohnson Subscribers: davidxl, llvm-commits Differential Revision: https://reviews.llvm.org/D30053 llvm-svn: 296498	2017-02-28 18:09:44 +00:00
Dehao Chen	920677a997	Fix an obvious bug in SampleProfileReaderGCC. Summary: The CallTargetProfile should be added to FProfile to be consistent with other profile readers. Reviewers: dnovillo, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30233 llvm-svn: 295852	2017-02-22 17:27:21 +00:00
Dehao Chen	4a9dd70213	Fix the samplepgo indirect call promotion bug: we should not promote a direct call. Summary: Checking CS.getCalledFunction() == nullptr does not necessary indicate indirect call. We also need to check if CS.getCalledValue() is not a constant. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29570 llvm-svn: 294260	2017-02-06 23:33:15 +00:00
Dehao Chen	274df5ea41	Explicitly promote indirect calls before sample profile annotation. Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation. Reviewers: xur, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29040 llvm-svn: 293657	2017-01-31 17:49:37 +00:00
Dehao Chen	6217fa44b8	Revert r292979 which causes compile time failure. llvm-svn: 293557	2017-01-30 22:26:05 +00:00

1 2

86 Commits