This is the first of a series of patches leading up to the implementation
of P0674r1, i.e. array support in allocate_shared. I am splitting this
up into multiple patches because the overall change is very tricky and
I want to isolate potential breakage.
Use RegisterClass::contains instead of going through getMinimalPhysRegClass
and hasSuperClassEq.
Remove the special case for NoRegister. It's identical to the
handling for any other regsiter that isn't VRM2/M4/M8.
The loop-based probing done for stack clash protection altered R1D which
corrupted the backchain value to be stored after the probing was done.
By using R0D instead for the loop exit value, R1D is not modified.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D92803
This fixes a subtle bug where SCCP could incorrectly optimize a private callable while waiting for its arguments to be resolved.
Fixes PR#48457
Differential Revision: https://reviews.llvm.org/D92976
This fixes a crash when no elements are parsed, but the type expects at least one.
Fixes PR#47763
Differential Revision: https://reviews.llvm.org/D92982
This was missed when supported for unsigned/signed integer types was first added, and results in crashes if a user tries to create/print a constant with the incorrect integer type.
Fixes PR#46222
Differential Revision: https://reviews.llvm.org/D92981
The wrapper clears shadow for addr and addrlen when written to.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D93046
Inline asm can contain constructs like .bytes which may have arbitrary size.
In some cases, this causes us to miscalculate the size of blocks and therefore
offsets, causing us to incorrectly compress a JT.
To be safe, just bail out of the whole thing if we find any inline asm.
Fixes PR48255
Differential Revision: https://reviews.llvm.org/D92865
I tried to put it in the same place in the pipeline as the legacy PM.
Fixes PR48399.
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D93002
There is an in-progress proposal for the following pseudo-instructions
in the assembler, to complement the existing `sext.w` rv64i instruction:
- sext.b
- sext.h
- zext.b
- zext.h
- zext.w
The `.b` and `.h` variants are available with rv32i and rv64i, and `zext.w` is
only available with `rv64i`.
These are implemented primarily as pseudo-instructions, as these instructions
expand to multiple real instructions. In the case of `zext.b`, this expands to a
single rv32/64i instruction, so it is implemented with an InstAlias (like
`sext.w` is on rv64i).
The proposal is available here: https://github.com/riscv/riscv-asm-manual/pull/61
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D92793
Right now we have one large AST for all types in LLDB. All ODR violations in
types we reconstruct are resolved by just letting the ASTImporter handle the
conflicts (either by merging types or somehow trying to introduce a duplicated
declaration in the AST). This works ok for the normal types we build from debug
information as most of them are just simple CXXRecordDecls or empty template
declarations.
However, with a loaded `std` C++ module we have alternative versions of pretty
much all declarations in the `std` namespace that are much more fleshed out than
the debug information declarations. They have all the information that is lost
when converting to DWARF, such as default arguments, template default arguments,
the actual uninstantiated template declarations and so on.
When we merge these C++ module types into the big scratch AST (that might
already contain debug information types) we give the ASTImporter the tricky task
of somehow creating a consistent AST out of all these declarations. Usually this
ends in a messy AST that contains a mostly broken mix of both module and debug
info declarations. The ASTImporter in LLDB is also importing types with the
MinimalImport setting, which usually means the only information we have when
merging two types is often just the name of the declaration and the information
that it contains some child declarations. This makes it pretty much impossible
to even implement a better merging logic (as the names of C++ module
declarations and debug info declarations are identical).
This patch works around this whole merging problem by separating C++ module
types from debug information types. This is done by splitting up the single
scratch AST into two: One default AST for debug information and a dedicated AST
for C++ module types.
The C++ module AST is implemented as a 'specialised AST' that lives within the
default ScratchTypeSystemClang. When we select the scratch AST we can explicitly
request that we want such a isolated sub-AST of the scratch AST. I kept the
infrastructure more general as we probably can use the same mechanism for other
features that introduce conflicting types (such as programs that are compiled
with a custom -wchar-size= option).
There are just two places where we explicitly have request the C++ module AST:
When we export persistent declarations (`$mytype`) and when we create our
persistent result variable (`$0`, `$1`, ...). There are a few formatters that
were previously assuming that there is only one scratch AST which I cleaned up
in a preparation revision here (D92757).
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D92759
There are a few things that I wanted to reorganize for a while:
- the loop that incrementally goes through classes on failure looked
horrible in assembly, mostly because of `LIKELY`/`UNLIKELY` within
the loop. So remove those, we are already in an unlikely scenario
- hooks are not used by default on Android/Fuchsia/etc so mark the
tests for the existence of the weak functions as unlikely
- mark of couple of conditions as likely/unlikely
- in `reallocate`, the old size was computed again while we already
have it in a variable. So just use the one we have.
- remove the bitwise AND trick and use a logical AND, that has one
less test by using a purposeful underflow when `Size` is 0 (I
actually looked at the assembly of the previous code to steal that
trick)
- move the read of the options closer to where they are used, mark them
as `const`
Overall this makes things a tiny bit faster, but cleaner.
Differential Revision: https://reviews.llvm.org/D92689
The test is reduced from the example in D82005.
Similar to 94f6d365e, the test here would assert in
the DomTree when we tried to convert a select to a
phi with an unreachable block operand.
We may want to add some kind of guard code in DomTree
itself to avoid this sort of problem.
This is just a small change in the Flang tool within libclangDriver.
Currently it passes `-triple` when calling `flang-new -fc1` for various
driver Jobs. As there is no support for code-generation, `-triple` is
not required and should remain unsupported. It is safe to remove it.
This hasn't been a problem as the affected driver Jobs are not yet
implemented or used. However, we will be adding support for them in the
near future and the fact `-triple` is added will become a problem.
Differential Revision: https://reviews.llvm.org/D93027
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s
Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead.
**ELF object emission**
The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.
Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.
The format of `.pseudo_probe_desc` section looks like:
```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```
For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :
```
FUNCTION BODY (one for each outlined function present in the text section)
GUID (uint64)
GUID of the function
NPROBES (ULEB128)
Number of probes originating from this function.
NUM_INLINED_FUNCTIONS (ULEB128)
Number of callees inlined into this function, aka number of
first-level inlinees
PROBE RECORDS
A list of NPROBES entries. Each entry contains:
INDEX (ULEB128)
TYPE (uint4)
0 - block probe, 1 - indirect call, 2 - direct call
ATTRIBUTE (uint3)
reserved
ADDRESS_TYPE (uint1)
0 - code address, 1 - address delta
CODE_ADDRESS (uint64 or ULEB128)
code address or address delta, depending on ADDRESS_TYPE
INLINED FUNCTION RECORDS
A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
callees. Each record contains:
INLINE SITE
GUID of the inlinee (uint64)
ID of the callsite probe (ULEB128)
FUNCTION BODY
A FUNCTION BODY entry describing the inlined function.
```
To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.
**Assembling**
Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.
A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.
A example assembly looks like:
```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```
With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
```
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 3 0
.pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe 837061429793323041 4 0
popq %rax
retq
```
**Disassembling**
We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.
An example disassembly looks like:
```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D91878
The NPM runs loop passes over loops in forward program order, rather
than the legacy loop PM's reverse program order. This seems to produce
better results as shown here.
I verified that changing the loop order to reverse program order results
in the same IR with the NPM.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D92817
Prevent lldb from crashing when multiple threads are concurrently
accessing the SB API with reproducer capture enabled.
The API instrumentation records both the input arguments and the return
value, but it cannot block for the duration of the API call. Therefore
we introduce a sequence number that allows to to correlate the function
with its result and add locking to ensure those two parts are emitted
atomically.
Using the sequence number, we can detect situations where the return
value does not succeed the function call, in which case we print an
error saying that concurrency is not (currently) supported. In the
future we might attempt to be smarter and read ahead until we've found
the return value matching the current call.
Differential revision: https://reviews.llvm.org/D92820
- use ConvertOpToLLVMPattern to avoid explicit casting and in most cases the
constructor can be reused to save a few lines of code.
Differential Revision: https://reviews.llvm.org/D92989
If SETUNE isn't legal, UO can use the NOT of the SETO expansion.
Removes some complex isel patterns. Most of the test changes are
from using XORI instead of SEQZ.
Differential Revision: https://reviews.llvm.org/D92008
This makes it slightly easier to deal with custom attributes and
CallBase already provides hasFnAttr versions that support both AttrKind
and StringRef arguments in a similar fashion.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D92567
This patch adds vcmla and the rotated variants as defined in
"Arm Neon Intrinsics Reference for ACLE Q3 2020" [1]
The *_lane_* are still missing, but they can be added separately.
This patch only adds the builtin mapping for AArch64.
[1] https://developer.arm.com/documentation/ihi0073/latest
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D92930
Extended subgroups are library style extensions and therefore
they require no changes in the frontend. This commit:
1. Moves extension macro definitions to the internal headers.
2. Removes extension pragmas because they are not needed.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D92231
Several data formatters assume their types are in the Target's scratch AST and
build new types from that scratch AST instance. However, types from different
ASTs shouldn't be mixed, so this (unchecked) assumption may lead to problems if
we ever have more than one scratch AST or someone somehow invokes data
formatters on a type that are not in the scratch AST.
Instead we can use in all the formatters just the TypeSystem of the type we're
formatting. That's much simpler and avoids all the headache of finding the right
TypeSystem that matches the one of the formatted type.
Right now LLDB only has one scratch TypeSystemClang instance and we format only
types that are in the scratch AST, so this doesn't change anything in the
current way LLDB works. The intention here is to allow follow up refactorings
that introduce multiple scratch ASTs with the same Target.
Differential Revision: https://reviews.llvm.org/D92757
The wrapper clears shadow for any bytes written to addr or addrlen.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D92964
The semantic analysis of index-names of FORALL statements looks up symbols with
the same name as the index-name. This is needed to exclude symbols that are
not objects. But if the symbol found is host-, use-, or construct-associated
with another entity, the check fails.
I fixed this by getting the root symbol of the symbol found and doing the check
on the root symbol. This required creating a non-const version of
"GetAssociationRoot()".
Differential Revision: https://reviews.llvm.org/D92970