Following up D81682 and D83903, remove the code for the old value profiling
buckets, which have been replaced with the new, extended buckets and disabled by
default.
Also syncing InstrProfData.inc between compiler-rt and llvm.
Differential Revision: https://reviews.llvm.org/D88838
D68041 placed `__profc_`, `__profd_` and (if exists) `__profvp_` in different comdat groups.
There are some issues:
* Cost: one or two additional section headers (`.group` section(s)): 64 or 128 bytes on ELF64.
* `__profc_`, `__profd_` and (if exists) `__profvp_` should be retained or
discarded. Placing them into separate comdat groups is conceptually inferior.
* If the prevailing group does not include `__profvp_` (value profiling not
used) but a non-prevailing group from another translation unit has `__profvp_`
(the function is inlined into another and triggers value profiling), there
will be a stray `__profvp_` if --gc-sections is not enabled.
This has been fixed by 3d6f53018f.
Actually, we can reuse an existing symbol (we choose `__profd_`) as the group
signature to avoid a string in the string table (the sole reason that D68041
could improve code size is that `__profv_` was an otherwise unused symbol which
wasted string table space). This saves one or two section headers.
For a -DCMAKE_BUILD_TYPE=Release -DLLVM_BUILD_INSTRUMENTED=IR build, `ninja
clang lld`, the patch has saved 10.5MiB (2.2%) for the total .o size.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D84723
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Differential Revision: https://reviews.llvm.org/D81682
Skip profile count promotion if any of the ExitBlocks contains a ret
instruction. This is to prevent dumping of incomplete profile -- if the
the loop is a long running loop and dump is called in the middle
of the loop, the result profile is incomplete.
ExitBlocks containing a ret instruction is an indication of a long running
loop -- early exit to error handling code.
Differential Revision: https://reviews.llvm.org/D84379
This reverts commit 4a539faf74.
There is a __llvm_profile_instrument_range related crash in PGO-instrumented clang:
```
(gdb) bt
llvm::ConstantRange const&, llvm::APInt const&, unsigned int, bool) ()
llvm::ScalarEvolution::getRangeForAffineAR(llvm::SCEV const*, llvm::SCEV
const*, llvm::SCEV const*, unsigned int) ()
```
(The body of __llvm_profile_instrument_range is inlined, so we can only find__llvm_profile_instrument_target in the trace)
```
23│ 0x000055555dba0961 <+65>: nopw %cs:0x0(%rax,%rax,1)
24│ 0x000055555dba096b <+75>: nopl 0x0(%rax,%rax,1)
25│ 0x000055555dba0970 <+80>: mov %rsi,%rbx
26│ 0x000055555dba0973 <+83>: mov 0x8(%rsi),%rsi # %rsi=-1 -> SIGSEGV
27│ 0x000055555dba0977 <+87>: cmp %r15,(%rbx)
28│ 0x000055555dba097a <+90>: je 0x55555dba0a76 <__llvm_profile_instrument_target+342>
```
`__llvm_profile_instrument_memop` transitively calls calloc, thus calloc
should not be instrumented.
I saw a
`calloc -> __llvm_profile_instrument_memop -> calloc -> __llvm_profile_instrument_memop -> ...`
infinite loop leading to stack overflow
when the malloc implementation (e.g. tcmalloc) is built and instrumented along with the application.
We should figure out the library calls which may be instrumented and disable
their instrumentation before rolling out this change.
Reviewed By: yamauchi
Differential Revision: https://reviews.llvm.org/D84358
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Add an option to always instrument function entry BB (default off)
Add an option to do atomically updates on the first counter in each
instrumented function.
Differential Revision: https://reviews.llvm.org/D82123
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Differential Revision: https://reviews.llvm.org/D81682
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object. This
metadata prevents discarding of the global object in linker GC unless
the referenced object is also discarded.
Furthermore, when a function symbol is discarded by the linker, setting
up !associated metadata allows linker to discard counters, data and
values associated with that function symbol. This is not possible today
because there's metadata to guide the linker. This approach is also used
by other instrumentations like sanitizers.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object. This
metadata prevents discarding of the global object in linker GC unless
the referenced object is also discarded.
Furthermore, when a function symbol is discarded by the linker, setting
up !associated metadata allows linker to discard counters, data and
values associated with that function symbol. This is not possible today
because there's metadata to guide the linker. This approach is also used
by other instrumentations like sanitizers.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
Follow the model used on Linux, where the clang driver passes the
linker a -u switch to force the profile runtime to be linked in,
rather than having every TU emit a dead function with a reference.
Differential Revision: https://reviews.llvm.org/D79835
Follow the model used on Linux, where the clang driver passes the
linker a -u switch to force the profile runtime to be linked in,
rather than having every TU emit a dead function with a reference.
Patch By: mcgrathr
Differential Revision: https://reviews.llvm.org/D79835
Summary: This was preventing MemorySanitizerLegacyPass from appearing in --print-after-all.
Reviewers: vitalybuka
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79661
Try again with an up-to-date version of D69471 (99317124 was a stale
revision).
---
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
We have to avoid using a GOT relocation to access the bias variable,
setting the hidden visibility achieves that.
Differential Revision: https://reviews.llvm.org/D73529
Patch by Chris Chrulski
When generating value profiling instrumentation, ensure the call gets the
correct funclet token, otherwise WinEHPrepare will turn the call (and all
subsequent instructions) into unreachable.
Differential Revision: https://reviews.llvm.org/D73221
Patch by Chris Chrulski
This fixes a problem with the current behavior when assertions are enabled.
A loop that exits to a catchswitch instruction is skipped for the counter
promotion, however this check was being done after the PGOCounterPromoter
tried to collect an insertion point for the exit block. A call to
getFirstInsertionPt() on a block that begins with a catchswitch instruction
triggers an assertion. This change performs a check whether the counter
promotion is possible prior to collecting the ExitBlocks and InsertPts.
Differential Revision: https://reviews.llvm.org/D73222
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
This is an alternative to the continous mode that was implemented in
D68351. This mode relies on padding and the ability to mmap a file over
the existing mapping which is generally only available on POSIX systems
and isn't suitable for other platforms.
This change instead introduces the ability to relocate counters at
runtime using a level of indirection. On every counter access, we add a
bias to the counter address. This bias is stored in a symbol that's
provided by the profile runtime and is initially set to zero, meaning no
relocation. The runtime can mmap the profile into memory at abitrary
location, and set bias to the offset between the original and the new
counter location, at which point every subsequent counter access will be
to the new location, which allows updating profile directly akin to the
continous mode.
The advantage of this implementation is that doesn't require any special
OS support. The disadvantage is the extra overhead due to additional
instructions required for each counter access (overhead both in terms of
binary size and performance) plus duplication of counters (i.e. one copy
in the binary itself and another copy that's mmapped).
Differential Revision: https://reviews.llvm.org/D69740
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
With this patch, compiler generated profile variables will have its own COMDAT
name for ELF format, which syncs the behavior with COFF. Tested with clang
PGO bootstrap. This shows a modest reduction in object sizes in ELF format.
Differential Revision: https://reviews.llvm.org/D68041
llvm-svn: 373241
For COFF, a comdat group is really a symbol marked
IMAGE_COMDAT_SELECT_ANY and zero or more other symbols marked
IMAGE_COMDAT_SELECT_ASSOCIATIVE. Typically the associative symbols in
the group are not external and are not referenced by other TUs, they are
things like debug info, C++ dynamic initializers, or other section
registration schemes. The Visual C++ linker reports a duplicate symbol
error for symbols marked IMAGE_COMDAT_SELECT_ASSOCIATIVE even if they
would be discarded after handling the leader symbol.
Fixes coverage-inline.cpp in check-profile after r372020.
llvm-svn: 372182
This fixes relocations against __profd_ symbols in discarded sections,
which is PR41380.
In general, instrumentation happens very early, and optimization and
inlining happens afterwards. The counters for a function are calculated
early, and after inlining, counters for an inlined function may be
widely referenced by other functions.
For C++ inline functions of all kinds (linkonce_odr &
available_externally mainly), instr profiling wants to deduplicate these
__profc_ and __profd_ globals. Otherwise the binary would be quite
large.
I made __profd_ and __profc_ comdat in r355044, but I chose to make
__profd_ internal. At the time, I was only dealing with coverage, and in
that case, none of the instrumentation needs to reference __profd_.
However, if you use PGO, then instrumentation passes add calls to
__llvm_profile_instrument_range which reference __profd_ globals. The
solution is to make these globals externally visible by using
linkonce_odr linkage for data as was done for counters.
This is safe because PGO adds a CFG hash to the names of the data and
counter globals, so if different TUs have different globals, they will
get different data and counter arrays.
Reviewers: xur, hans
Differential Revision: https://reviews.llvm.org/D67579
llvm-svn: 372020
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
Currently, many profiling tests on Solaris FAIL like
Command Output (stderr):
--
Undefined first referenced
symbol in file
__llvm_profile_register_names_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o
__llvm_profile_register_function /tmp/lit_tmp_Nqu4eh/infinite_loop-9dc638.o
Solaris 11.4 ld supports the non-standard GNU ld extension of adding
__start_SECNAME and __stop_SECNAME labels to sections whose names are valid
as C identifiers. Given that we already use Solaris 11.4-only features
like ld -z gnu-version-script-compat and fully working .preinit_array
support in compiler-rt, we don't need to worry about older versions of
Solaris ld.
The patch documents that support (although the comment in
lib/Transforms/Instrumentation/InstrProfiling.cpp
(needsRuntimeRegistrationOfSectionRange) is quite cryptic what it's
actually about), and adapts the affected testcase not to expect the
alternativeq __llvm_profile_register_functions and __llvm_profile_init.
It fixes all affected tests.
Tested on amd64-pc-solaris2.11.
Differential Revision: https://reviews.llvm.org/D41111
llvm-svn: 363984
Summary:
I hadn't realized that instrumentation runs before inlining, so we can't
use the function as the comdat group. Doing so can create relocations
against discarded sections when references to discarded __profc_
variables are inlined into functions outside the function's comdat
group.
In the future, perhaps we should consider standardizing the comdat group
names that ELF and COFF use. It will save object file size, since
__profv_$sym won't appear in the symbol table again.
Reviewers: xur, vsk
Subscribers: eraman, hiraditya, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D58737
llvm-svn: 355044
Current PGO profile counts are not context sensitive. The branch probabilities
for the inlined functions are kept the same for all call-sites, and they might
be very different from the actual branch probabilities. These suboptimal
profiles can greatly affect some downstream optimizations, in particular for
the machine basic block placement optimization.
In this patch, we propose to have a post-inline PGO instrumentation/use pass,
which we called Context Sensitive PGO (CSPGO). For the users who want the best
possible performance, they can perform a second round of PGO instrument/use on
the top of the regular PGO. They will have two sets of profile counts. The
first pass profile will be manly for inline, indirect-call promotion, and
CGSCC simplification pass optimizations. The second pass profile is for
post-inline optimizations and code-gen optimizations.
A typical usage:
// Regular PGO instrumentation and generate pass1 profile.
> clang -O2 -fprofile-generate source.c -o gen
> ./gen
> llvm-profdata merge default.*profraw -o pass1.profdata
// CSPGO instrumentation.
> clang -O2 -fprofile-use=pass1.profdata -fcs-profile-generate -o gen2
> ./gen2
// Merge two sets of profiles
> llvm-profdata merge default.*profraw pass1.profdata -o profile.profdata
// Use the combined profile. Pass manager will invoke two PGO use passes.
> clang -O2 -fprofile-use=profile.profdata -o use
This change touches many components in the compiler. The reviewed patch
(D54175) will committed in phrases.
Differential Revision: https://reviews.llvm.org/D54175
llvm-svn: 354930
Summary:
The motivating use case is eliminating duplicate profile data registered
for the same inline function in two object files. Before this change,
users would observe multiple symbol definition errors with VC link, but
links with LLD would succeed.
Users (Mozilla) have reported that PGO works well with clang-cl and LLD,
but when using LLD without this static registration, we would get into a
"relocation against a discarded section" situation. I'm not sure what
happens in that situation, but I suspect that duplicate, unused profile
information was retained. If so, this change will reduce the size of
such binaries with LLD.
Now, Windows uses static registration and is in line with all the other
platforms.
Reviewers: davidxl, wmi, inglorion, void, calixte
Subscribers: mgorny, krytarowski, eraman, fedor.sergeev, hiraditya, #sanitizers, dmajor, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D57929
llvm-svn: 353547
Summary:
Experimentally we found that promotion to scalars carries less benefits
than sinking and hoisting in LICM. When using MemorySSA, we build an
AliasSetTracker on demand in order to reuse the current infrastructure.
We only build it if less than AccessCapForMSSAPromotion exist in the
loop, a cap that is by default set to 250. This value ensures there are
no runtime regressions, and there are small compile time gains for
pathological cases. A much lower value (20) was found to yield a single
regression in the llvm-test-suite and much higher benefits for compile
times. Conservatively we set the current cap to a high value, but we will
explore lowering it when MemorySSA is enabled by default.
Reviewers: sanjoy, chandlerc
Subscribers: nemanjai, jlebar, Prazek, george.burgess.iv, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D56625
llvm-svn: 353339
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57173
llvm-svn: 352913
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57172
llvm-svn: 352911
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57170
llvm-svn: 352909
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.
Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352827