Commit Graph

445 Commits

Author SHA1 Message Date
spupyrev eecd41aa09 Revert "Rebase: [Facebook] [MC] Introduce NeverAlign fragment type"
This reverts commit 6d0528636a.
2022-07-11 09:50:47 -07:00
Rafael Auler 6d0528636a Rebase: [Facebook] [MC] Introduce NeverAlign fragment type
Summary:
Introduce NeverAlign fragment type.

The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. NeverAlign fragment ensures that
the next fragment (first instruction in the pair) does not end at a
given alignment boundary by emitting a minimal size nop if necessary.

In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).

This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.

The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead due to reliance on relaxation as
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of NeverAlign fragment is to prevent
the first instruction in a pair ending at a given alignment boundary, by
inserting at most one minimum size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.

LLVM: https://reviews.llvm.org/D97982

Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613

Test Plan: sandcastle

Reviewers: #llvm-bolt

Subscribers: phabricatorlinter

Differential Revision: https://phabricator.intern.facebook.com/D31361547
2022-07-11 09:31:52 -07:00
Guillaume Chatelet 412c788ab0 [NFC][Alignment] Use Align in MCAlignFragment 2022-06-15 12:31:00 +00:00
Fangrui Song 689c3a2552 [MC] Fix letter case of some MCSection member functions 2022-03-11 20:07:00 -08:00
serge-sans-paille ef736a1c39 Cleanup LLVMMC headers
There's a few relevant forward declarations in there that may require downstream
adding explicit includes:

llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h

Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after:  1049293745

Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
2022-02-09 11:09:17 +01:00
Alex Lorenz 0756aa3978 [macho] add support for emitting macho files with two build version load commands
This patch extends LLVM IR to add metadata that can be used to emit macho files with two build version load commands.
It utilizes "darwin.target_variant.triple" and "darwin.target_variant.SDK Version" metadata names for that,
which will be set by a future patch in clang.

MachO uses two build version load commands to represent an object file / binary that is targeting both the macOS target,
and the Mac Catalyst target. At runtime, a dynamic library that supports both targets can be loaded from either a native
macOS or a Mac Catalyst app on a macOS system. We want to add support to this to upstream to LLVM to be able to build
compiler-rt for both targets, to finish the complete support for the Mac Catalyst platform, which is right now targetable
by upstream clang, but the compiler-rt bits aren't supported because of the lack of this multiple build version support.

Differential Revision: https://reviews.llvm.org/D112189
2021-12-07 18:17:47 -08:00
Peter Smith e63455d5e0 [MC] Use local MCSubtargetInfo in writeNops
On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per module MCSubtargetInfo retained
by the targets AsmBackend in favour of passing through the local
MCSubtargetInfo in operation at the time.

On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.

For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.

This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.

I've attempted to take into account the in tree experimental backends.

Differential Revision: https://reviews.llvm.org/D45962
2021-09-07 15:46:19 +01:00
Saleem Abdulrasool bbea64250f RISCV: adjust handling of relocation emission for RISCV
This re-architects the RISCV relocation handling to bring the
implementation closer in line with the implementation in binutils.  We
would previously aggressively resolve the relocation.  With this
restructuring, we always will emit a paired relocation for any symbolic
difference of the type of S±T[±C] where S and T are labels and C is a
constant.

GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE`
which indicates that a fixup may be expanded into multiple relocations.
This is used by the RISCV backend to always emit a paired relocation -
either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] +
SUB[WIDTH] for a debug info relocation.  Irrespective of whether linker
relaxation support is enabled, symbolic difference is always emitted as
a paired relocation.

This change also sinks the target specific behaviour down into the
target specific area rather than exposing it to the shared relocation
handling.  In the process, we also sink the "special" handling for debug
information down into the RISCV target.  Although this improves the path
for the other targets, this is not necessarily entirely ideal either.
The changes in the debug info emission could be done through another
type of hook as this functionality would be required by any other target
which wishes to do linker relaxation.  However, as there are no other
targets in LLVM which currently do this, this is a reasonable thing to
do until such time as the code needs to be shared.

Improve the handling of the relocation (and add a reduced test case from
the Linux kernel) to ensure that we handle complex expressions for
symbolic difference.  This ensures that we correct relocate symbols with
the adddends normalized and associated with the addition portion of the
paired relocation.

This change also addresses some review comments from Alex Bradbury about
the relocations meant for use in the DWARF CFA being named incorrectly
(using ADD6 instead of SET6) in the original change which introduced the
relocation type.

This resolves the issues with the symbolic difference emission
sufficiently to enable building the Linux kernel with clang+IAS+lld
(without linker relaxation).

Resolves PR50153, PR50156!
Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143

Reviewed By: nickdesaulniers, maskray

Differential Revision: https://reviews.llvm.org/D103539
2021-06-17 08:20:02 -07:00
Fangrui Song 71635ea5ff MCDwarf: Delete uneeded parameter
And change signature
2021-01-21 00:55:07 -08:00
Hongtao Yu 705a4c149d [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 17:29:28 -08:00
Mitch Phillips 7ead5f5aa3 Revert "[CSSPGO] Pseudo probe encoding and emission."
This reverts commit b035513c06.

Reason: Broke the ASan buildbots:
  http://lab.llvm.org:8011/#/builders/5/builds/2269
2020-12-10 15:53:39 -08:00
Hongtao Yu b035513c06 [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 09:50:08 -08:00
Fangrui Song 35be65cb1c [MC] Fix an assert in MCAssembler::writeSectionData to be aware of errors
If MCContext has an error, MCAssembler::layout may stop early
and some MCFragment's may not finalize.

In the Linux kernel, arch/x86/lib/memcpy_64.S could trigger the assert before
"x86_64: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S"
2020-10-29 23:11:18 -07:00
Fangrui Song 45d0343900 [MC] Allow .org directives in SHT_NOBITS sections
This is used by kvm-unit-tests and can be trivially supported.
2020-09-11 15:12:42 -07:00
Jian Cai c6334db577 [X86] support .nops directive
Add support of .nops on X86. This addresses llvm.org/PR45788.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D82826
2020-08-03 11:50:56 -07:00
Shengchen Kan e59e39b7c4 [MC] Simplify the logic of applying fixup for fragments, NFCI
Replace mutiple `if else`  clauses with a `switch` clause and remove redundant checks. Before this patch, we need to add a statement like `if(!isa<MCxxxFragment>(Frag)) `  here each time we add a new kind of `MCEncodedFragment` even if it has no fixups. After this patch, we don't need to do that.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D83366
2020-07-09 16:39:13 +08:00
Thomas Preud'homme 6c67ee0f58 [MC] Fix PR45805: infinite recursion in assembler
Give up folding an expression if the fragment of one of the operands
would require laying out a fragment already being laid out. This
prevents hitting an infinite recursion when a fill size expression
refers to a later fragment since computing the offset of that fragment
would require laying out the fill fragment and thus computing its size
expression.

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D79570
2020-06-25 15:42:36 +01:00
Shengchen Kan 8bb059ab63 [MC][Bugfix] Remove redundant parameter for relaxInstruction
Summary:
Before this patch, `relaxInstruction` takes three arguments, the first
argument refers to the instruction before relaxation and the third
argument is the output instruction after relaxation. There are two quite
strange things:
  1) The first argument's type is `const MCInst &`, the third
  argument's type is `MCInst &`, but they may be aliased to the same
  variable
  2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third
  argument is a fresh uninitialized `MCInst` even if `relaxInstruction`
  may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a
  loop.

In this patch, we drop the thrid argument, and let `relaxInstruction`
directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and
breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon.

Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain

Reviewed By: Razer6, MaskRay, bcain

Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78364
2020-04-21 11:06:55 +08:00
Simon Pilgrim bcd7f77713 MCObjectWriter.h - remove Endian.h/EndianStream.h/raw_ostream.h includes. NFC
Push these includes down to the the writers that actually need them, a number of which were implicitly relying on the MCObjectWriter.h.
2020-04-17 10:44:08 +01:00
Fangrui Song e13a8a1fc5 [MC][COFF][ELF] Reject instructions in IMAGE_SCN_CNT_UNINITIALIZED_DATA/SHT_NOBITS sections
For `.bss; nop`, MC inappropriately calls abort() (via report_fatal_error()) with a message
`cannot have fixups in virtual section!`
It is a bug to crash for invalid user input. Fix it by erroring out early in EmitInstToData().

Similarly, emitIntValue() in a virtual section (SHT_NOBITS in ELF) can crash with the mssage
`non-zero initializer found in section '.bss'` (see D4199)
It'd be nice to report the location but so many directives can call emitIntValue()
and it is difficult to track every location.
Note, COFF does not crash because MCAssembler::writeSectionData() is not
called for an IMAGE_SCN_CNT_UNINITIALIZED_DATA section.

Note, GNU as' arm64 backend reports ``Error: attempt to store non-zero value in section `.bss'``
for a non-zero .inst but fails to do so for other instructions.
We simply reject all instructions, even if the encoding is all zeros.

The Mach-O counterpart is D48517 (see `test/MC/MachO/zerofill-text.s`)

Reviewed By: rnk, skan

Differential Revision: https://reviews.llvm.org/D78138
2020-04-15 21:02:47 -07:00
Fangrui Song 90a63f6d2d [MC] Replace MCSection*::getName() with MCSection::getName(). NFC
I plan to use MCSection::getName() in D78138. Having the function in the base class is also convenient for debugging.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D78251
2020-04-15 18:35:27 -07:00
Fangrui Song 7d1ff446b6 [MC] Rename MCSection*::getSectionName() to getName(). NFC
A pending change will merge MCSection*::getName() to MCSection::getName().
2020-04-15 16:48:14 -07:00
Jian Cai 6a38e0e4f5 [MC] Recalculate fragment offsets after relaxation
Summary:
The current relaxation implementation is not correctly adjusting the
size and offsets of fragements in one section based on changes in size
of another if the layout order of the two happened to be such that the
former was visited before the later. Therefore, we need to invalidate
the fragments in all sections after each iteration of relaxation, and
possibly further relax some of them in the next ieration. This fixes
PR#45190.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76114
2020-03-17 14:48:05 -07:00
Shengchen Kan 3a503ce663 [X86] Reduce the number of emitted fragments due to branch align
Summary:
Currently, a BoundaryAlign fragment may be inserted after the branch
that needs to be aligned to truncate the current fragment, this fragment is
unused at most of time. To avoid that, we can insert a new empty Data
fragment instead. Non-relaxable instruction is usually emitted into Data
fragment, so the inserted empty Data fragment will be reused at a high
possibility.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames, LuoYuanke

Subscribers: llvm-commits, dexonsmith, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75438
2020-03-12 15:37:35 +08:00
Shengchen Kan af57b139a0 Temporarily Revert [X86] Not track size of the boudaryalign fragment during the layout
Summary: This reverts commit 2ac19feb15.
This commit causes some test cases to run fail when branch is aligned.
2020-03-03 11:15:56 +08:00
Philip Reames 5565820e6e Use range-for in MCAssembler [NFC] 2020-03-02 14:57:35 -08:00
Shengchen Kan 2ac19feb15 [X86] Not track size of the boudaryalign fragment during the layout
Summary:
Currently the boundaryalign fragment caches its size during the process
of layout and then it is relaxed and update the size in each iteration. This
behaviour is unnecessary and ugly.

Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: MaskRay

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75404
2020-03-02 09:32:30 +08:00
Hans Wennborg 2e24219d3c [MC][ARM] Resolve some pcrel fixups at assembly time (PR44929)
MC currently does not emit these relocation types, and lld does not
handle them. Add FKF_Constant as a work-around of some ARM code after
D72197. Eventually we probably should implement these relocation types.

By Fangrui Song!

Differential revision: https://reviews.llvm.org/D72892
2020-02-27 12:43:29 +01:00
Philip Reames eca4bfea3d [MC] Pull out a relaxFragment helper [NFC]
Having this as it's own function helps to reduce indentation and allows use of return instead of wiring a value over the switch.  A lambda would have also worked, but with slightly deeper nesting.
2020-02-26 13:37:12 -08:00
James Clarke 3f5976c97d [RISCV] Fix evaluating %pcrel_lo against global and weak symbols
Summary:
Previously, we would erroneously turn %pcrel_lo(label), where label has
a %pcrel_hi against a weak symbol, into %pcrel_lo(label + offset), as
evaluatePCRelLo would believe the target independent logic was going to
fold it. Moreover, even if that were fixed, shouldForceRelocation lacks
an MCAsmLayout and thus cannot evaluate the %pcrel_hi fixup to a value
and check the symbol, so we would then erroneously constant-fold the
%pcrel_lo whilst leaving the %pcrel_hi intact. After D72197, this same
sequence also occurs for symbols with global binding, which is triggered
in real-world code.

Instead, as discussed in D71978, we introduce a new FKF_IsTarget flag to
avoid these kinds of issues. All the resolution logic happens in one
place, with no coordination required between RISCAsmBackend and
RISCVMCExpr to ensure they implement the same logic twice. Although the
implementation of %pcrel_hi can be left as target independent, we make
it target dependent to ensure that they are handled identically to
%pcrel_lo, otherwise we risk one of them being constant folded but the
other being preserved. This also allows us to properly support fixup
pairs where the instructions are in different fragments.

Reviewers: asb, lenary, efriedma

Reviewed By: efriedma

Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73211
2020-01-23 02:05:48 +00:00
Simon Pilgrim 46e2f89364 [MC] writeFragment - assert MCFragment::FT_Fill length is legal.
Silence (clang/MSVC) static analyzer warnings that the fragment data may either write out of bounds of the local array or reference uninitialized data.
2020-01-08 17:19:11 +00:00
James Henderson d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
Fangrui Song 586acd8490 [MC] Delete MCSection::{rbegin,rend} 2020-01-05 12:51:15 -08:00
Fangrui Song c764304adc [MC] Drop an unused rule about absolute temporary symbols 2020-01-05 11:39:52 -08:00
Philip Reames 14fc20ca62 Align branches within 32-Byte boundary (NOP padding)
WARNING: If you're looking at this patch because you're looking for a full
performace mitigation of the Intel JCC Erratum, this is not it!

This is a preliminary patch on the patch towards mitigating the performance
regressions caused by Intel's microcode update for Jump Conditional Code
Erratum.  For context, see:
https://www.intel.com/content/www/us/en/support/articles/000055650.html

The patch adds the required assembler infrastructure and command line options
needed to exercise the logic for INTERNAL TESTING.  These are NOT public flags,
and should not be used for anything other than LLVM's own testing/debugging
purposes.  They are likely to change both in spelling and meaning.

WARNING: This patch is knowingly incorrect in some cornercases.  We need, and
do not yet provide, a mechanism to selective enable/disable the padding.
Conversation on this will continue in parellel with work on extending this
infrastructure to support prefix padding.

The goal here is to have the assembler align specific instructions such that
they neither cross or end at a 32 byte boundary.  The impacted instructions are:
a. Conditional jump.
b. Fused conditional jump.
c. Unconditional jump.
d. Indirect jump.
e. Ret.
f. Call.

The new options for llvm-mc are:
    -x86-align-branch-boundary=NUM aligns branches within NUM byte boundary.
    -x86-align-branch=TYPE[+TYPE...] specifies types of branches to align.

A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit
NOP to align the fused/unfused branch.

alignBranchesBegin inserts MCBoundaryAlignFragment before instructions,
alignBranchesEnd marks the end of the branch to be aligned,
relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch.

Nop padding is disabled when the instruction may be rewritten by the linker,
such as TLS Call.

Process Note: I am landing a patch by skan as it has been LGTMed, and
continuing to iterate on the review is simply slowing us down at this point.
We can and will continue to iterate in tree.

Patch By: skan
Differential Revision: https://reviews.llvm.org/D70157
2019-12-20 11:35:50 -08:00
Fangrui Song 9574757dba [MC] Delete MCCodePadder
D34393 added MCCodePadder as an infrastructure for padding code with
NOP instructions. It lacked tests and was not being worked on since
then.

Intel has now worked on an assembler patch to mitigate performance loss
after applying microcode update for the Jump Conditional Code Erratum.

https://www.intel.com/content/www/us/en/support/articles/000055650/processors.html

This new patch shares similarity with MCCodePadder, but has a concrete
use case in mind and is being actively developed. The infrastructure it
introduces can potentially be used for general performance improvement
via alignment. Delete the unused MCCodePadder so that people can develop
the new feature from a clean state.

Reviewed By: jyknight, skan

Differential Revision: https://reviews.llvm.org/D71106
2019-12-09 19:21:31 -08:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Guillaume Chatelet af11cc7eb5 [Alignment] Move OffsetToAlignment to Alignment.h
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, JDevlieghere, alexshap, rupprecht, jhenderson

Subscribers: sdardis, nemanjai, hiraditya, kbarton, jakehehrlich, jrtc27, MaskRay, atanasyan, jsji, seiya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D67499

llvm-svn: 371742
2019-09-12 15:20:36 +00:00
Hsiangkai Wang 18ccfadd46 [DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame.
It is necessary to generate fixups in .debug_frame or .eh_frame as
relaxation is enabled due to the address delta may be changed after
relaxation.

There is an opcode with 6-bits data in debug frame encoding. So, we
also need 6-bits fixup types.

Differential Revision: https://reviews.llvm.org/D58335

llvm-svn: 366524
2019-07-19 02:03:34 +00:00
Hsiangkai Wang 657277e0f1 Revert "[DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame."
This reverts commit 17e3cbf5fe656483d9016d0ba9e1d0cd8629379e.

llvm-svn: 366444
2019-07-18 15:06:50 +00:00
Hsiangkai Wang e43ce1a958 [DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame.
It is necessary to generate fixups in .debug_frame or .eh_frame as
relaxation is enabled due to the address delta may be changed after
relaxation.

There is an opcode with 6-bits data in debug frame encoding. So, we
also need 6-bits fixup types.

Differential Revision: https://reviews.llvm.org/D58335

llvm-svn: 366442
2019-07-18 14:47:34 +00:00
Shiva Chen 5af037f1e9 [RISCV] Insert R_RISCV_ALIGN relocation type and Nops for code alignment when linker relaxation enabled
Linker relaxation may change code size. We need to fix up the alignment
of alignment directive in text section by inserting Nops and R_RISCV_ALIGN
relocation type. So then linker could satisfy the alignment by removing Nops.

To do this:

1. Add shouldInsertExtraNopBytesForCodeAlign target hook to calculate
   the Nops we need to insert.

2. Add shouldInsertFixupForCodeAlign target hook to insert
   R_RISCV_ALIGN fixup type.

Differential Revision: https://reviews.llvm.org/D47755

llvm-svn: 352616
2019-01-30 11:16:59 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Alex Lorenz afa75d7843 [macho] save the SDK version stored in module metadata into the version min and
build version load commands in the object file

This commit introduces a new metadata node called "SDK Version". It will be set
by the frontend to mark the platform SDK (macOS/iOS/etc) version which was used
during that particular compilation.
This node is used when machine code is emitted, by either saving the SDK version
into the appropriate macho load command (version min/build version), or by
emitting the assembly for these load commands with the SDK version specified as
well.
The assembly for both load commands is extended by allowing it to contain the
sdk_version X, Y [, Z] trailing directive to represent the SDK version
respectively.

rdar://45774000

Differential Revision: https://reviews.llvm.org/D55612

llvm-svn: 349119
2018-12-14 01:14:10 +00:00
Hsiangkai Wang 2089e4c8f1 [DebugInfo] Fix build failed in clang-x86_64-linux-selfhost-modules.
Only generate symbol difference expression if needed.

llvm-svn: 338484
2018-08-01 04:17:41 +00:00
Hsiangkai Wang 5c63af0d04 [DebugInfo] Generate fixups as emitting DWARF .debug_line.
It is necessary to generate fixups in .debug_line as relaxation is
enabled due to the address delta may be changed after relaxation.

DWARF will record the mappings of lines and addresses in
.debug_line section. It will encode the information using special
opcodes, standard opcodes and extended opcodes in Line Number
Program. I use DW_LNS_fixed_advance_pc to encode fixed length
address delta and DW_LNE_set_address to encode absolute address
to make it possible to generate fixups in .debug_line section.

Differential Revision: https://reviews.llvm.org/D46850

llvm-svn: 338477
2018-08-01 02:18:06 +00:00
Fangrui Song f78650a8de Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
Peter Smith 1503fc0fd0 [MC] Move bundling and MCSubtargetInfo to MCEncodedFragment [NFC]
Instruction bundling is only supported on descendants of the
MCEncodedFragment type. By moving the bundling functionality and
MCSubtargetInfo to this class it makes it easier to set and extract the
MCSubtargetInfo when it is necessary.

This is a refactoring change that will make it easier to pass the
MCSubtargetInfo through to writeNops when nop padding is required.

Differential Revision: https://reviews.llvm.org/D45959

llvm-svn: 334814
2018-06-15 09:48:18 +00:00
Sam Clegg 8c32e913b5 [MC] Move MCAssembler::dump into the correct cpp file. NFC
Differential Revision: https://reviews.llvm.org/D46556

llvm-svn: 334713
2018-06-14 14:04:23 +00:00
Peter Smith 57f661bd7d [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928

llvm-svn: 334078
2018-06-06 09:40:06 +00:00