llvm-project

Commit Graph

Author	SHA1	Message	Date
Dehao Chen	a8bae82373	Refine instruction weight annotation algorithm for sample profiler. Summary: This patch refined the instruction weight annotation algorithm: 1. Do not use dbg_value intrinsics for annotation. 2. Annotate cold calls if the call is inlined in profile, but not inlined before preparation. This indicates that the annotation preparation step found no sample for the inlined callsite, thus the call should be very cold. Reviewers: dnovillo, davidxl Subscribers: mgrang, llvm-commits Differential Revision: http://reviews.llvm.org/D19286 llvm-svn: 266936	2016-04-20 23:36:23 +00:00
Pete Cooper	adebb9379a	Remove llvm::getDISubprogram in favor of Function::getSubprogram llvm::getDISubprogram walks the instructions in a function, looking for one in the scope of the current function, so that it can find the !dbg entry for the subprogram itself. Now that !dbg is attached to functions, this should not be necessary. This patch changes all uses to just query the subprogram directly on the function. Ideally this should be NFC, but in reality it's possible that a function: has no !dbg (in which case there's likely a bug somewhere in an opt pass), or that none of the instructions had a scope referencing the function, so we used to not find the !dbg on the function but now we will. Reviewed by Duncan Exon Smith. Differential Revision: http://reviews.llvm.org/D18074 llvm-svn: 263184	2016-03-11 02:14:16 +00:00
Dehao Chen	57d1dda558	Use LineLocation instead of CallsiteLocation to index callsite profile. Summary: With discriminator, LineLocation can uniquely identify a callsite without the need to specify the callee name. Remove Callee function name from the key, and put it in the value (FunctionSamples). Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17827 llvm-svn: 262634	2016-03-03 18:09:32 +00:00
Dehao Chen	1012be120a	Perform InstructionCombiningPass before SampleProfile pass. Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls. Reviewers: davidxl, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17742 llvm-svn: 262419	2016-03-01 22:53:02 +00:00
Dehao Chen	6c73b49911	Set function entry count as 0 if sample profile is not found for the function. Summary: This change makes the sample profile's behavior consistent with instr profile. Reviewers: davidxl, eraman, dnovillo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17522 llvm-svn: 261587	2016-02-22 22:46:21 +00:00
Benjamin Kramer	8a752e316d	Use ArrayRef to hide SmallVector details, kill a useless vector copy along the way. llvm-svn: 260824	2016-02-13 16:01:12 +00:00
Matthias Braun	b30f2f5141	Avoid overly large SmallPtrSet/SmallSet These sets perform linear searching in small mode so it is never a good idea to use SmallSize/N bigger than 32. llvm-svn: 259283	2016-01-30 01:24:31 +00:00
Diego Novillo	10cf124bb9	SamplePGO - Reduce memory utilization by 10x. DenseMap is the wrong data structure to use for sample records and call sites. The keys are too large, causing massive core memory growth when reading profiles. Before this patch, a 21Mb input profile was causing the compiler to grow to 3Gb in memory. By switching to std::map, the compiler now grows to 300Mb in memory. There still are some opportunities for memory footprint reduction. I'll be looking at those next. llvm-svn: 255389	2015-12-11 23:21:38 +00:00
Diego Novillo	7ff0a174d1	SamplePGO - Do not use std::to_string in diagnostics. This fixes buildbots on systems where std::to_string is not present. It also tidies the output of the diagnostic to render doubles a bit better (thanks Ben Kramer for help with string streams and format). llvm-svn: 254261	2015-11-29 18:23:26 +00:00
Diego Novillo	84f06cc835	SamplePGO - Add initial support for inliner annotations. This adds two thresholds to the sample profiler to affect inlining decisions: the concept of global hotness and coldness. Functions that have accumulated more than a certain fraction of samples at runtime, are annotated with the InlineHint attribute. Conversely, functions that accumulate less than a certain fraction of samples, are annotated with the Cold attribute. This is very similar to the hints emitted by Clang when using instrumentation profiles. Notice that this is a very blunt instrument. A function may have globally collected a significant fraction of samples, but that does not necessarily mean that every callsite for that function is hot. Ideally, we would annotate each callsite with the samples collected at that callsite. This way, the inliner can incorporate all these weights into its cost model. Once the inliner offers this functionality, we can change the hints emitted here to a more precise per-callsite annotation. For now, this is providing some measure of speedups with our internal benchmarks. I've observed speedups of up to 23% (though the geo mean is about 3%). I expect these numbers to improve as the inliner gets better annotations. llvm-svn: 254212	2015-11-27 23:14:51 +00:00
Diego Novillo	b579240875	SamplePGO - Fix default threshold for hot callsites. Based on testing of internal benchmarks, I'm lowering this threshold to a value of 0.1%. This means that SamplePGO will respect 99.9% of the original inline decisions when following a profile. The performance difference is noticeable in some tests. With the previous threshold, the speedup over baseline -O2 was about 0.63%. With the new default, the speedups are around 3% on average. The point of this threshold is not to do more aggressive inlining. When an inlined callsite crosses this threshold, SamplePGO will redo the inline decision so that it can better apply the input profile. By respecting most original inline decisions, we can apply more of the input profile because the shape of the code follows the profile more closely. In the next series, I'll be looking at adding some inline hints for the cold callsites and for toplevel functions that are hot/cold as well. llvm-svn: 254211	2015-11-27 23:14:49 +00:00
Diego Novillo	0b6985a3c6	SamplePGO - Add test for hot/cold inlined functions. When the original binary is executed and sampled, the resulting profile contains information on the original inline stack. We currently follow the original inline plan if we notice that the inlined callsite has more than 0 samples to it. A better way is to determine whether the callsite is actually worth inlining. If the callsite accumulates a small fraction of the samples spent in the parent function, then we don't want to bother inlining it (as it means that the callsite is actually cold). This patch introduces a threshold expressed in percentage of samples in relation to the parent function. If the callsite uses less than N% of the total samples used by its parent, the original inline decision is not re-applied. I've set the threshold to the very arbitrary value of 5%. I'm yet to do any actual experiments to see what's a good value. I wanted to separate the basic mechanism from the tuning. llvm-svn: 254034	2015-11-24 22:38:37 +00:00
Diego Novillo	243ea6a7d6	SamplePGO - Add coverage tracking for samples. The existing coverage tracker counts the number of records that were used from the input profile. An alternative view of coverage is to check how many available samples were applied. This way, if the profile contains several records with few samples, it doesn't really matter much that they were not applied. The more interesting records to apply are the ones that contribute many samples. llvm-svn: 253912	2015-11-23 20:12:21 +00:00
Diego Novillo	1ca881c4bb	SamplePGO - Clear coverage tracking when clearing per-function data. llvm-svn: 253877	2015-11-23 16:30:17 +00:00
Diego Novillo	39ab68f39b	SamplePGO - Use newly introduced local variable. NFC. llvm-svn: 253868	2015-11-23 15:24:13 +00:00
Diego Novillo	5fb49e5c5f	SamplePGO - Do not count never-executed inlined functions when computing coverage. If a function was originally inlined but not actually hot at runtime, its samples will not be counted inside the parent function. This throws off the coverage calculation because it expects to find more used records than it should. Fixed by ignoring functions that will not be inlined into the parent. Currently, this is inlined functions with 0 samples. In subsequent patches, I'll change this to mean "cold" functions. llvm-svn: 253716	2015-11-20 21:46:38 +00:00
Diego Novillo	df544a098a	SamplePGO - Add line offset and discriminator information to sample reports. While debugging some sampling coverage problems, I found this useful: When applying samples from a profile, it helps to also know what line offset and discriminator the sample belongs to. This makes it easy to correlate against the input profile. llvm-svn: 253670	2015-11-20 15:39:42 +00:00
David Blaikie	2297a9142e	StringRef-ify DiagnosticInfoSampleProfile::Filename llvm-svn: 251823	2015-11-02 20:01:13 +00:00
Diego Novillo	f9ed08e16e	SamplePGO - Count sample records in embedded profiles when computing coverage. The initial coverage checking code for sample records failed to count records inside inlined profiles. This change fixes the oversight. llvm-svn: 251752	2015-10-31 21:53:58 +00:00
Daniel Jasper	1de905a667	Fix use-after-free. Thanks ASAN for giving me a detailed report :-). llvm-svn: 251623	2015-10-29 12:49:37 +00:00
Diego Novillo	748b3ffe3b	SamplePGO - Add flag to check sampling coverage. This adds the flag -mllvm -sample-profile-check-coverage=N to the SampleProfile pass. N is the percent of input sample records that the user expects to apply. If the pass does not use N% (or more) of the sample records in the input, it emits a warning. This is useful to detect some forms of stale profiles. If the code has drifted enough from the original profile, there will be records that do not match the IR anymore. This will not detect cases where a sample profile record for line L is referring to some other instructions that also used to be at line L. llvm-svn: 251568	2015-10-28 22:30:25 +00:00
Diego Novillo	a8a3bd2100	SamplePGO - Clear per-function data after applying a profile. The pass was keeping around a lot of per-function data (visited blocks, edges, dominance, etc) that is just taking up memory for no reason. In fact, from function to function it could potentially confuse the propagator since some maps are indexed by line offsets which can be common between functions. llvm-svn: 251531	2015-10-28 17:40:22 +00:00
Diego Novillo	aa55507ff9	Tidy a comment. NFC. llvm-svn: 251434	2015-10-27 18:41:46 +00:00
Diego Novillo	c04270d2e4	Fix SamplePGO segfault when debug info is missing. When emitting a remark for a conditional branch annotation, the remark uses the line location information of the conditional branch in the message. In some cases, that information is unavailable and the optimization would segfault. I'm still not sure whether this is a bug or WAI, but the optimizer should not die because of this. llvm-svn: 251420	2015-10-27 17:37:00 +00:00
Diego Novillo	e822b63681	Remove unused local variable. NFC. llvm-svn: 251344	2015-10-26 20:50:26 +00:00
Diego Novillo	7963ea1996	SamplePGO - Add optimization reports. This adds a couple of optimization remarks to the SamplePGO transformation. When it decides to inline a hot function (to mimic the inline stack and repeat useful inline decisions in the original build). It will also report branch destinations. For instance, given the code fragment: 6 if (i < 1000) 7 sum -= i; 8 else 9 sum += -i * rand(); If the 'else' branch is taken most of the time, building this code with -Rpass=sample-profile will produce: a.cc:9:14: remark: most popular destination for conditional branches at small.cc:6:9 [-Rpass=sample-profile] sum += -i * rand(); ^ llvm-svn: 251330	2015-10-26 18:52:53 +00:00
Dehao Chen	100424124b	Tolerate negative offset when matching sample profile. In some cases (as illustrated in the unittest), lineno can be less than the header_lineno because the function body is included from some other file. In this case, offset will be negative. This patch makes clang still able to match the profile to IR in this situation. http://reviews.llvm.org/D13914 llvm-svn: 250873	2015-10-21 01:22:27 +00:00
Diego Novillo	38be33302c	Sample Profiles - Adjust integer types. Mostly NFC. This adjusts all integers in the reader/writer to reflect the types stored on profile files. They should all be unsigned 32-bit or 64-bit values. Changed all associated internal types to be uint32_t or uint64_t. The only place that needed some adjustments is in the sample profile transformation. Although the weights read from the profile are 64-bit values, the internal API for branch weights only accepts 32-bit values. The pass now saturates weights that overflow uint32_t. llvm-svn: 250427	2015-10-15 16:36:21 +00:00
Dehao Chen	41dc5a6e86	Make HeaderLineno a local variable. http://reviews.llvm.org/D13576 As we are using hierarchical profile, there is no need to keep HeaderLineno a member variable. This is because each level of the inline stack will have its own header lineno. One should use the head lineno of its own inline stack level instead of the actual symbol. llvm-svn: 249848	2015-10-09 16:50:16 +00:00
Dehao Chen	7c41dd6498	Update sample profile propagation algorithm. http://reviews.llvm.org/D13218 llvm-svn: 248968	2015-10-01 00:26:56 +00:00
Dehao Chen	6722688eaa	http://reviews.llvm.org/D13145 Support hierarchical sample profile format. llvm-svn: 248865	2015-09-30 00:42:46 +00:00
Dehao Chen	8e7df83e6a	http://reviews.llvm.org/D13231 Change lookup functions to const functions. llvm-svn: 248818	2015-09-29 18:28:15 +00:00
Dehao Chen	028e122ca9	Revert r248810 which breaks tests. llvm-svn: 248814	2015-09-29 18:18:49 +00:00
Dehao Chen	410a25aa7a	http://reviews.llvm.org/D13231 Change lookup functions to const functions. llvm-svn: 248810	2015-09-29 17:59:58 +00:00
Diego Novillo	7732ae4a4f	Fix memory leak in sample profile pass. The problem here were the function analyses invoked by the function pass manager from the new IPO pass. I looked at other IPO passes needing dominance information and the only one that requires it (partial inliner) does not use the standard dependency mechanism. This patch mimics what the partial inliner does to compute dominance, post-dominance and loop info. One thing I like about this approach is that I can delay the computation of all this until I actually need it. This should bring the ASAN buildbot back to green. If there's a better way to fix this, I'll do it in a follow-up patch. llvm-svn: 246066	2015-08-26 20:00:27 +00:00
Diego Novillo	4d71113cdb	Convert SampleProfile pass into a Module pass. Eventually, we will need sample profiles to be incorporated into the inliner's cost models. To do this, we need the sample profile pass to be a module pass. This patch makes no functional changes beyond the mechanical adjustments needed to run SampleProfile as a module pass. llvm-svn: 245940	2015-08-25 15:25:11 +00:00

36 Commits
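
Commit 38be33302c (llvm-svn: 250427) notes that profile weights are 64-bit values while the branch-weight API accepts only 32-bit values, so the pass saturates weights that overflow uint32_t. The snippet below is a minimal sketch of that kind of saturation, not the code from the commit; the helper name saturateWeight is hypothetical and chosen only for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (an assumption for illustration, not LLVM's function):
// clamp a 64-bit sample count to the largest value that fits in a 32-bit
// branch weight, instead of letting the conversion wrap around.
constexpr std::uint32_t saturateWeight(std::uint64_t Samples) {
  return Samples > std::numeric_limits<std::uint32_t>::max()
             ? std::numeric_limits<std::uint32_t>::max()
             : static_cast<std::uint32_t>(Samples);
}
```

Under this sketch, a count that fits in 32 bits passes through unchanged, and anything larger is pinned at the uint32_t maximum. A plain narrowing cast would instead wrap, so a very hot branch could end up with a small weight.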