Commit Graph

439 Commits

Author SHA1 Message Date
Simon Pilgrim 61cdaf66fe [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Jinsong Ji 4a89ed373c [AIX] Add traceback ssp canary bit support
We will need to set the ssp canary bit in traceback table to communicate
with unwinder about the canary.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D103202
2021-06-10 02:40:02 +00:00
Tomas Matheson e79e8041c5 [MC][NFCI] Factor out ELF section unique ID calculation
Precursor to D100944. The logic for determining the unique ID had become
quite difficult to reason about, so I have factored this out into a
separate function.

Differential Revision: https://reviews.llvm.org/D102336
2021-05-26 11:51:29 +01:00
Sam Clegg 3041b16f73 [WebAssembly] Add TLS data segment flag: WASM_SEG_FLAG_TLS
Previously the linker was relying solely on the name of the segment
to imply TLS.

Differential Revision: https://reviews.llvm.org/D102202
2021-05-12 13:31:02 -07:00
Sam Clegg 3b8d2be527 Reland: "[lld][WebAssembly] Initial support merging string data"
This change was originally landed in: 5000a1b4b9
It was reverted in: 061e071d8c

This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.

Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)

Like the ELF linker merging is only performed at `-O1` and above.

This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)

Differential Revision: https://reviews.llvm.org/D97657
2021-05-10 16:03:38 -07:00
Nico Weber 061e071d8c Revert "[lld][WebAssembly] Initial support merging string data"
This reverts commit 5000a1b4b9.
Breaks tests, see https://reviews.llvm.org/D97657#2749151

Easily repros locally with `ninja check-llvm-mc-webassembly`.
2021-05-10 18:28:28 -04:00
Sam Clegg 5000a1b4b9 [lld][WebAssembly] Initial support merging string data
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.

Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)

Like the ELF linker merging is only performed at `-O1` and above.

This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)

Differential Revision: https://reviews.llvm.org/D97657
2021-05-10 13:15:12 -07:00
Harald van Dijk b0ef2070bc
[X86] Fix position-independent TType encoding
The logic for x86_64 position-independent TType encodings was backwards,
using 8 bytes where 4 were wanted and 4 where 8 were wanted. For regular
x86_64, this was mostly harmless, exception tables are allowed to use
8-byte encodings even when it is not needed. For the large code model,
and for X32, however, the generated exception tables were wrong. For the
large code model, we cannot assume that the address will fit in 4 bytes.
For X32, we cannot use 64-bit relocations.

Fixes PR50148.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102132
2021-05-10 17:04:33 +01:00
Philipp Krones 632ebc4ab4 [MC] Untangle MCContext and MCObjectFileInfo
This untangles the MCContext and the MCObjectFileInfo. There is a circular
dependency between MCContext and MCObjectFileInfo. Currently this dependency
also exists during construction: You can't contruct a MOFI without a MCContext
without constructing the MCContext with a dummy version of that MOFI first.
This removes this dependency during construction. In a perfect world,
MCObjectFileInfo wouldn't depend on MCContext at all, but only be stored in the
MCContext, like other MC information. This is future work.

This also shifts/adds more information to the MCContext making it more
available to the different targets. Namely:

- TargetTriple
- ObjectFileType
- SubtargetInfo

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101462
2021-05-05 10:03:02 -07:00
Sidharth Baveja 70c433a184 [XCOFF][AIX] Add Global Variables Directly to TOC for 32 bit AIX
Summary:
This patch implements the backend implementation of adding global variables
directly to the table of contents (TOC), rather than adding the address of the
variable to the TOC.
Currently, this patch will look for the "toc-data" attribute on symbols in the
IR, and then add those symbols to the TOC.
ATM, this is implemented for 32 bit AIX.

Reviewers: sfertile
Differential Revision: https://reviews.llvm.org/D101178
2021-04-30 14:48:02 +00:00
jasonliu 7049fbf960 [XCOFF] Handle the case when personality routine is an alias
Summary:
Personality routine could be an alias to another personality routine.
Fix the situation when we compile the file that contains the personality
routine and the file also have functions that need to refer to the
personality routine.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D101401
2021-04-29 22:03:30 +00:00
Sriraman Tallam a64411916c Basic block sections for functions with implicit-section-name attribute
Functions can have section names set via #pragma or section attributes,
basic block sections should be correctly named for such functions.

With #pragma, the expectation is that all functions in that file are placed
in the same section in the final binary. Basic block sections should be
correctly named with the unique flag set so that the final binary has all the
basic blocks of the function in that named section. This patch fixes the bug
by calling getExplictSectionGlobal when implicit-section-name attribute is set
to make sure the function's basic blocks get the correct section name.

Differential Revision: https://reviews.llvm.org/D101311
2021-04-29 12:29:34 -07:00
Petr Hosek 7a718e1630 [MC] Use COMDAT for LSDA only if IR comdat type is any
This fixed issue introduced in 16af973933
and 796feb6163.

Differential Revision: https://reviews.llvm.org/D100909
2021-04-21 14:41:39 -07:00
Victor Huang 1756b2adc9 [AIX][TLS] Generate TLS variables in assembly files
This patch allows generating TLS variables in assembly files on AIX.
Initialized and external uninitialized variables are generated with the
.csect pseudo-op and local uninitialized variables are generated with
the .comm/.lcomm pseudo-ops. The patch also adds a check to
explicitly say that TLS is not yet supported on AIX.

Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile
Originally patched by: bsaleil
Commandeered by: NeHuang

Differential Revision: https://reviews.llvm.org/D96184
2021-03-02 18:22:48 -06:00
Fangrui Song 47c5576d7d ELF: Create unique SHF_GNU_RETAIN sections for llvm.used global objects
If a global object is listed in `@llvm.used`, place it in a unique section with
the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections`
with LLD>=13 or GNU ld>=2.36.

For front ends which do not expect to see multiple sections of the same name,
consider emitting `@llvm.compiler.used` instead of `@llvm.used`.

SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in
binutils. We don't do the restriction - see the rationale in D95749.

The integrated assembler has supported SHF_GNU_RETAIN since D95730.
GNU as>=2.36 supports section flag 'R'.
We don't need to worry about GNU ld support because older GNU ld just ignores
the unknown SHF_GNU_RETAIN.

With this change, `__attribute__((retain))` functions/variables emitted
by clang will get the SHF_GNU_RETAIN flag.

Differential Revision: https://reviews.llvm.org/D97448
2021-02-26 16:38:44 -08:00
Jon Roelofs 7f6e331645 Support `#pragma clang section` directives on MachO targets
rdar://59560986

Differential Revision: https://reviews.llvm.org/D97233
2021-02-25 09:30:10 -08:00
Petr Hosek 11a53f47fb Revert "[InstrProfiling] Use nobits as __llvm_prf_cnts section type in ELF"
This reverts commit 6b286d93f7 because
in some cases when the optimizer evaluates the global initializer,
__llvm_prf_cnts may not be entirely zero initialized.
2021-02-24 00:41:43 -08:00
Petr Hosek 6b286d93f7 [InstrProfiling] Use nobits as __llvm_prf_cnts section type in ELF
This can reduce the binary size because counters will no longer occupy
space in the binary, instead they will be allocated by dynamic linker.

Differential Revision: https://reviews.llvm.org/D97110
2021-02-20 14:20:33 -08:00
Pan, Tao 12edddafac [CodeGen] Fix two dots between text section name and symbol name
There is a trailing dot in text section name if it has prefix, don't add
repeated dot when connect text section name and symbol name.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D96327
2021-02-20 10:15:48 +08:00
Yang Fan 796feb6163
[MC][ELF] Fix unused variable warning (NFC)
GCC warning:
```
/llvm-project/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp: In member function ‘virtual llvm::MCSection* llvm::TargetLoweringObjectFileELF::getSectionForLSDA(const llvm::Function&, const llvm::MCSymbol&, const llvm::TargetMachine&) const’:
/llvm-project/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp:871:8: warning: variable ‘IsComdat’ set but not used [-Wunused-but-set-variable]
  871 |   bool IsComdat = false;
      |        ^~~~~~~~
```
2021-02-18 14:23:18 +08:00
Chen Zheng 5517923b1c [XCOFF][NFC] make csect properties optional for getXCOFFSection
We are going to support debug sections for XCOFF. So the csect
properties are not necessary. This patch makes these properties
optional.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D95931
2021-02-17 20:51:42 -05:00
Sriraman Tallam d1a838babc Basic block sections should enable function sections implicitly.
Basic block sections enables function sections implicitly, this is not needed
and is inefficient with "=list" option.

We had basic block sections enable function sections implicitly in clang. This
is particularly inefficient with "=list" option as it places functions that do
not have any basic block sections in separate sections. This causes unnecessary
object file overhead for large applications.

This patch disables this implicit behavior. It only creates function sections
for those functions that require basic block sections.

Further, there was an inconistent behavior with llc as llc was not turning on
function sections by default. This patch makes llc and clang consistent and
tests are added to check the new behavior.

This is the first of two patches and this adds functionality in LLVM to
create a new section for the entry block if function sections is not
enabled.

Differential Revision: https://reviews.llvm.org/D93876
2021-02-16 16:27:16 -08:00
Petr Hosek 16af973933 [MC][ELF] Support for zero flag section groups
This change introduces support for zero flag ELF section groups to LLVM.
LLVM already supports COMDAT sections, which in ELF are a special type
of ELF section groups. These are generally useful to enable linker GC
where you want a group of sections to always travel together, that is to
be either retained or discarded as a whole, but without the COMDAT
semantics. Other ELF assemblers already support zero flag ELF section
groups and this change helps us reach feature parity.

Differential Revision: https://reviews.llvm.org/D95851
2021-02-16 14:23:40 -08:00
Fangrui Song e44a100942 .gcc_except_table: Set SHF_LINK_ORDER if binutils>=2.36, and drop unneeded unique ID for -fno-unique-section-names
GNU ld>=2.36 supports mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER sections in an
output section, so we can set SHF_LINK_ORDER if -fbinutils-version=2.36 or above.

If -fno-function-sections or older binutils, drop unique ID for -fno-unique-section-names.
The users can just specify -fbinutils-version=2.36 or above to allow GC with both GNU ld and LLD.
(LLD does not support garbage collection of non-group non-SHF_LINK_ORDER .gcc_except_table sections.)
2021-02-05 21:45:21 -08:00
Fangrui Song 34b60d8a56 Add -fbinutils-version= to gate ELF features on the specified binutils version
There are two use cases.

Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler().  Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).

Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld".  You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727 https://sourceware.org/bugzilla/show_bug.cgi?id=22969)

Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).

This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.

`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.

Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).

Differential Revision: https://reviews.llvm.org/D85474
2021-01-26 12:28:23 -08:00
Amanieu d'Antras 21bfd068b3 [AArch64] Add support for the GNU ILP32 ABI
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.

The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.

There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.

This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D94143
2021-01-20 13:34:47 +00:00
Kazu Hirata 12fc9ca3a4 [llvm] Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2021-01-12 21:43:46 -08:00
Brandon Bergren 8f004471c2 [PowerPC] Add the LLVM triple for powerpcle [1/5]
Add a triple for powerpcle-*-*.

This is a little-endian encoding of the 32-bit PowerPC ABI, useful in certain niche situations:

1) A loader such as the FreeBSD loader which will be loading a little endian kernel. This is required for PowerPC64LE to load properly in pseries VMs.
Such a loader is implemented as a freestanding ELF32 LSB binary.

2) Userspace emulation of a 32-bit LE architecture such as x86 on 64-bit hosts such as PowerPC64LE with tools like box86 requires having a 32-bit LE toolchain and library set, as they operate by translating only the main binary and switching to native code when making library calls.

3) The Void Linux for PowerPC project is experimenting with running an entire powerpcle userland.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D93918
2021-01-02 12:17:22 -06:00
diggerlin a1e1dcabe4 [XCOFF][AIX] Emit EH information in traceback table
SUMMARY:

In order for the runtime on AIX to find the compact unwind section(EHInfo table),
we would need to set the following on the traceback table:

The 6th byte's longtbtable field to true to signal there is an Extended TB Table Flag.
The Extended TB Table Flag to be 0x08 to signal there is an exception handling info presents.
Emit the offset between ehinfo TC entry and TOC base after all other optional portions of traceback table.

The patch is authored by Jason Liu.

Reviewers: David Tenty, Digger Lin
Differential Revision: https://reviews.llvm.org/D92766
2020-12-16 09:34:59 -05:00
Zequan Wu b6b522c4db [NFC] cleanup cg-profile emission on TargetLowerinng
Differential Revision: https://reviews.llvm.org/D93150
2020-12-14 13:07:44 -08:00
Hongtao Yu 705a4c149d [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 17:29:28 -08:00
Mitch Phillips 7ead5f5aa3 Revert "[CSSPGO] Pseudo probe encoding and emission."
This reverts commit b035513c06.

Reason: Broke the ASan buildbots:
  http://lab.llvm.org:8011/#/builders/5/builds/2269
2020-12-10 15:53:39 -08:00
Hongtao Yu b035513c06 [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 09:50:08 -08:00
Pan, Tao 7af802994e [CodeGen] Add text section prefix for COFF object file
Text section prefix is created in CodeGenPrepare, it's file format independent implementation,  text section name is written into object file in TargetLoweringObjectFile, it's file format dependent implementation, port code of adding text section prefix to text section name from ELF to COFF.
Different with ELF that use '.' as concatenation character, COFF use '$' as concatenation character. That is, concatenation character is variable, so split concatenation character from text section prefix.
Text section prefix is existing feature of ELF, it can help to reduce icache and itlb misses, it's also make possible aggregate other compilers e.g. v8 created same prefix sections. Furthermore, the recent feature Machine Function Splitter (basic block level text prefix section) is based on text section prefix.

Reviewed By: pengfei, rnk

Differential Revision: https://reviews.llvm.org/D92073
2020-12-08 18:56:21 +08:00
jasonliu a65d8c5d72 [XCOFF][AIX] Generate LSDA data and compact unwind section on AIX
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
   It doesn't use the GCC personality rountine, because the
   interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D91455
2020-12-02 18:42:44 +00:00
Leonard Chan a97f62837f [llvm][IR] Add dso_local_equivalent Constant
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.

When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.

In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
  advantage of `dso_local_equivalent` for relative references

This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.

See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.

Differential Revision: https://reviews.llvm.org/D77248
2020-11-19 10:26:17 -08:00
Yuanfang Chen a223354161 [CGProfile] allows bitcast in metadata node storing function pointers
For example,  during RAUW in IRMover, the `Function` ValueAsMetadata in "CG Profile" could become bitcast.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D88433
2020-11-13 09:28:21 -08:00
jasonliu 42d2109380 [XCOFF] Enable explicit sections on AIX
Implement mechanism to allow explicit sections to be generated on AIX.

Reviewed By: DiggerLin

Differential Revision: https://reviews.llvm.org/D88615
2020-11-09 16:27:38 +00:00
Jessica Clarke f77c8a48ae [CodeGen] Fix regression from D83655
Arm EHABI has a null LSDASection as it does its own thing, so we should
continue to return null in that case rather than try and cast it.
2020-11-03 03:57:46 +00:00
Fangrui Song ee5d1a0449 [AsmPrinter] Split up .gcc_except_table
MC currently produces monolithic .gcc_except_table section. GCC can split up .gcc_except_table:

* if comdat: `.section .gcc_except_table._Z6comdatv,"aG",@progbits,_Z6comdatv,comdat`
* otherwise, if -ffunction-sections: `.section .gcc_except_table._Z3fooi,"a",@progbits`

This ensures that (a) non-prevailing copies are discarded and (b)
.gcc_except_table associated to discarded text sections can be discarded by a
.gcc_except_table-aware linker (GNU ld, but not gold or LLD)

This patches matches the GCC behavior. If -fno-unique-section-names is
specified, we don't append the suffix. If -ffunction-sections is additionally specified,
use `.section ...,unique`.

Note, if clang driver communicates that the linker is LLD and we know it
is new (11.0.0 or later) we can use SHF_LINK_ORDER to avoid string table
costs, at least in the -fno-unique-section-names case. We cannot use it on GNU
ld because as of binutils 2.35 it does not support mixed SHF_LINK_ORDER &
non-SHF_LINK_ORDER components in an output section
https://sourceware.org/bugzilla/show_bug.cgi?id=26256

For RISC-V -mrelax, this patch additionally fixes an assembler-linker
interaction problem: because a section is shrinkable, the length of a call-site
code range is not a constant. Relocations referencing the associated text
section (STT_SECTION) are needed. However, a STB_LOCAL relocation referencing a
discarded section group member from outside the group is disallowed by the ELF
specification (PR46675):

```
// a.cc
inline int comdat() { try { throw 1; } catch (int) { return 1; } return 0; }
int main() { return comdat(); }

// b.cc
inline int comdat() { try { throw 1; } catch (int) { return 1; } return 0; }
int foo() { return comdat(); }

clang++ -target riscv64-linux -c a.cc b.cc -fPIC -mno-relax
ld.lld -shared a.o b.o => ld.lld: error: relocation refers to a symbol in a discarded section:
```

-fbasic-block-sections= is similar to RISC-V -mrelax: there are outstanding relocations.

Reviewed By: jrtc27, rahmanl

Differential Revision: https://reviews.llvm.org/D83655
2020-11-02 14:36:25 -08:00
Snehasish Kumar 77638a5343 [llvm] Set the default for -bbsections-cold-text-prefix to .text.split.
After using this for a while, we find that it is generally useful to
have it set to .text.split. by default, removing the need for an
additional -mllvm option.

Differential Revision: https://reviews.llvm.org/D88997
2020-10-14 12:16:36 -07:00
jasonliu 78a9e62aa6 [XCOFF] Enable -fdata-sections on AIX
Summary:
Some design decision worth noting about:

I've noticed a recent mailing discussing about why string literal is
not affected by -fdata-sections for ELF target:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145121.html

But on AIX, our linker could not split the mergeable string like other target.
So I think it would make more sense for us to emit separate csect for
every mergeable string in -fdata-sections mode,
as there might not be other ways for linker to do garbage collection
on unused mergeable string.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D88339
2020-10-02 00:16:24 +00:00
Zequan Wu 6c91e623e5 [CodeGen] emit CG profile for COFF object file
Differential Revision: https://reviews.llvm.org/D87811
2020-09-29 12:03:30 -07:00
Arthur Eubanks a2578e92e2 Revert "Reland [CodeGen] emit CG profile for COFF object file"
This reverts commit 506b6170cb.

This still causes link errors, see https://crbug.com/1130780.
2020-09-27 22:43:14 -07:00
Snehasish Kumar d2696dec45 [llvm] Add -bbsections-cold-text-prefix to emit cold clusters to a different section.
This change adds an option to basic block sections to allow cold
clusters to be assigned a custom text prefix. With a custom prefix such
as ".text.split." (D87840), lld can place them in a separate output section.
The benefits are -

* Empirically shown to improve icache and itlb metrics by 3-5%
(absolute) compared to placing split parts in .text.unlikely.
* Mitigates against poor profiles, eg samplePGO profiles used with the
machine function splitter. Optimizations such as hugepage remapping can
make different decisions at the section granularity.
* Enables section granularity hotness monitoring (checking on the
decisions made during compilation vs sample data from production).

Differential Revision: https://reviews.llvm.org/D87813
2020-09-24 15:26:15 -07:00
Zequan Wu 506b6170cb Reland [CodeGen] emit CG profile for COFF object file
This reverts commit 90242caca2.

Error fixed at f5435399e8

Differential Revision: https://reviews.llvm.org/D87811
2020-09-24 14:38:53 -07:00
Reid Kleckner 90242caca2 Revert "[CodeGen] emit CG profile for COFF object file"
This reverts commit 91aed9bf97, it is
causing link errors.
2020-09-22 13:47:39 -07:00
Reid Kleckner 9932561b48 [COFF] Move per-global .drective emission from AsmPrinter to TLOFCOFF
This changes the order of output sections and the output assembly, but
is otherwise NFC.

It simplifies the TLOF interface by removing two COFF-only methods.
2020-09-18 14:31:01 -07:00
Zequan Wu 91aed9bf97 [CodeGen] emit CG profile for COFF object file
I forgot to add emission of CG profile for COFF object file, when adding the support (https://reviews.llvm.org/D81775)

Differential Revision: https://reviews.llvm.org/D87811
2020-09-18 10:57:54 -07:00
Fangrui Song 82d0749749 [TargetLoweringObjectFileImpl] Make .llvmbc and .llvmcmd non-SHF_ALLOC
There are two ways .llvmbc can be produced:

* clang -c -fembed-bitcode=all (which also produces .llvmcmd)
* LTO backend: ld.lld -mllvm -lto-embed-bitcode or -plugin-opt=-lto-embed-bitcode

.llvmbc and .llvmcmd have the SHF_ALLOC flag, so they can be dropped by
--gc-sections.

This patch sets SectionKind::Metadata to drop the SHF_ALLOC flag. This
is conceptually correct: the two sections are not part of the process
image, so SHF_ALLOC is not appropriate.

`test/LTO/X86/embed-bitcode.ll`: changed `llvm-objcopy -O binary --only-section` to
`llvm-objcopy --dump-section`. `-O binary` does not dump non-SHF_ALLOC sections.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D86374
2020-08-25 13:37:29 -07:00