This function is needed for when it is necessary to split the subvector
operand of an llvm.experimental.vector.insert call. Splitting the
subvector operand means performing two insertions: one inserting the
lower part of the split subvector into the destination vector, and
another for inserting the upper part.
Through experimenting, it seems quite rare to need split the subvector
operand, but this is necessary to avoid assertion errors.
Differential Revision: https://reviews.llvm.org/D92760
Although this was something that I was hoping we would not have to do,
this patch makes t2DoLoopStartTP a terminator in order to keep it at the
end of it's block, so not allowing extra MVE instruction between it and
the end. With t2DoLoopStartTP's also starting tail predication regions,
it also marks them as having side effects. The t2DoLoopStart is still
not a terminator, giving it the extra scheduling freedom that can be
helpful, but now that we have a TP version they can be treated
differently.
Differential Revision: https://reviews.llvm.org/D91887
This is the first in a series of patches that attempts to migrate
existing cost instructions to return a new InstructionCost class
in place of a simple integer. This new class is intended to be
as light-weight and simple as possible, with a full range of
arithmetic and comparison operators that largely mirror the same
sets of operations on basic types, such as integers. The main
advantage to using an InstructionCost is that it can encode a
particular cost state in addition to a value. The initial
implementation only has two states - Normal and Invalid - but these
could be expanded over time if necessary. An invalid state can
be used to represent an unknown cost or an instruction that is
prohibitively expensive.
This patch adds the new class and changes the getInstructionCost
interface to return the new class. Other cost functions, such as
getUserCost, etc., will be migrated in future patches as I believe
this to be less disruptive. One benefit of this new class is that
it provides a way to unify many of the magic costs in the codebase
where the cost is set to a deliberately high number to prevent
optimisations taking place, e.g. vectorization. It also provides
a route to represent the extremely high, and unknown, cost of
scalarization of scalable vectors, which is not currently supported.
Differential Revision: https://reviews.llvm.org/D91174
This was separated in the past because the cl::opt was in the .cpp file
but DevirtSCCRepeatedPass::run() was in the .h file. Now that
DevirtSCCRepeatedPass::run() is in the .cpp file, get rid of the tiny
maxDevirtIterationsReached(), it's bad for readability.
The declaration was introduced on Aug 2, 2016 in commit
c43aa5a5b6 without a corresponding
definition.
Note that we do have a definition for
MmeorySSA::OptimizeUses::optimizeUses but not for
MmeorySSA::optimizeUses.
MemoryAccess::setNewAccessRelation() in assert-builds checks whether the
access relation for a READ has a memory location for every instance of
the domain. Otherwise, we would not have value to load from. That check
already considered that instances outside the Scop's context do not
matter since they are never executed (or would be undefined behavior).
In this patch also take instances of the InvalidContext into account,
as these can also be assumed to never occur. InvalidContext was
introduced to avoid the computational complexity of subtracting
restrictions from the AssumedContext. However, this additional check in
setNewAccessRelation is only done in assert-builds.
The assertion case with an InvalidContext may occur with DeLICM on a
conditionally infinite loops, as it is the case in the following code:
for (int i = 0; i < n; i+=b)
vreg = ...;
*Dest = vreg;
The loop is infinite when b=0, and [b] -> { : b = 0 } is part of the
InvalidContext. When DeLICM tries to map the memory for %vreg to *Dest,
there is no store instance that uses the value of vreg when b = 0, hence
no location to map it to. However, the case is irrelevant since Polly's
runtime condition check ensures that this is never case.
Fixes llvm.org/PR48445
The compiler is making no effort to preserve upper elements. To do so would require another source operand tied with the destination and a different intrinsic interface to give control of this source to the programmer.
This patch changes the tail policy to agnostic so that the CPU doesn't need to make an effort to preserve them.
This is consistent with the RVV intrinsic spec here https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#configuration-setting
Differential Revision: https://reviews.llvm.org/D93080
Generally these calls aren't vulnerable to ADL because they involve only
primitive types. The ones in <list> and <vector> drag in namespace std
but that's OK; the ones in <fstream> and <strstream> are vulnerable
iff `CharT` is an enum type, which seems far-fetched.
But absolutely zero of them *need* ADL to happen; so in my opinion
they should all be consistently qualified, just like calls to any
other (non-user-customizable) functions in namespace std.
Also: Include <cstring> and <cwchar> in <__string>.
We seemed to be getting lucky that <memory> included <iterator>
included <iosfwd> included <wchar.h>. That gave us the
global-namespace `wmemmove`, but not `_VSTD::wmemmove`.
This is now fixed.
I didn't touch these headers:
<ext/__hash> uses strlen, safely
<support/ibm/locale_mgmt_aix.h> uses memcpy, safely
<string.h> uses memchr and strchr, safely
<wchar.h> uses wcschr, safely
<__bsd_locale_fallbacks.h> uses wcsnrtombs, safely
Differential Revision: https://reviews.llvm.org/D93061
This matches how libc++ does it in all other C++ headers
(that is, headers not ending in ".h").
We need to include <cstring> if we want to use `_VSTD::memmove`
instead of unqualified ADL `memmove`. Even though ADL doesn't
physically matter in <charconv>'s specific case, I'm trying
to migrate libc++ to using `_VSTD::memmove` for all cases
(because some of them do matter, and this way it's easier to
grep for outliers).
Differential Revision: https://reviews.llvm.org/D92875
Historically, we have told contributors that GnuWin32 is a pre-requisite
because our tests depend on utilities such as sed, grep, diff, and more.
However, Git on Windows includes versions of these utilities in its
installation. Furthermore, GnuWin32 has not been updated in many years.
For these reasons, it makes sense to have the ability to run llvm tests
in a way that is both:
a) Easier on the user (less stuff to install)
b) More up-to-date (The verions that ship with git are at least as
new, if not newer, than the versions in GnuWin32.
We add support for this here by attempting to detect where Git is
installed using the Windows registry, confirming the existence of
several common Unix tools, and then adding this location to lit's PATH
environment.
Differential Revision: https://reviews.llvm.org/D84380
[libomptarget][nfc] Remove data_sharing type aliasing
Libomptarget previous used __kmpc_data_sharing_slot to access values of type
__kmpc_data_sharing_{worker,master}_slot_static. This aliasing violation was
benign in practice. The master type has since been removed, so a single type
can be used instead.
This is particularly helpful for the transition to an openmp deviceRTL, as the
c++/openmp compiler for amdgcn currently rejects the flexible array member for
being an incomplete type. Serves the same purpose as abandoned D86324.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93075
Migrate to the `FileEntryRef` overload of `SourceManager::createFileID`
(using `FileManager::getOptionalFileRef`) in RefactoringTest.cpp and
RewriterTestContext.h.
No functionality change.
Differential Revision: https://reviews.llvm.org/D92967
lto-object-path.ll, like stabs.s, is disabled on Windows as the path
separators make it difficult to write a test that works across
platforms.
This diff also disables implicit-dylibs.s on Windows as we seem to emit
LC_LOAD_DYLIBs in a different order on that platform. This seems like a
bug in LLD that needs to be addressed (in a future diff).
Allow exclusion/discarding of custom sections with COMDAT groups.
It piggybacks on the existing COMDAT-handling code, but applies to custom sections as well.
Differential Revision: https://reviews.llvm.org/D92950
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s
Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead.
**ELF object emission**
The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.
Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.
The format of `.pseudo_probe_desc` section looks like:
```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```
For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :
```
FUNCTION BODY (one for each outlined function present in the text section)
GUID (uint64)
GUID of the function
NPROBES (ULEB128)
Number of probes originating from this function.
NUM_INLINED_FUNCTIONS (ULEB128)
Number of callees inlined into this function, aka number of
first-level inlinees
PROBE RECORDS
A list of NPROBES entries. Each entry contains:
INDEX (ULEB128)
TYPE (uint4)
0 - block probe, 1 - indirect call, 2 - direct call
ATTRIBUTE (uint3)
reserved
ADDRESS_TYPE (uint1)
0 - code address, 1 - address delta
CODE_ADDRESS (uint64 or ULEB128)
code address or address delta, depending on ADDRESS_TYPE
INLINED FUNCTION RECORDS
A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
callees. Each record contains:
INLINE SITE
GUID of the inlinee (uint64)
ID of the callsite probe (ULEB128)
FUNCTION BODY
A FUNCTION BODY entry describing the inlined function.
```
To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.
**Assembling**
Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.
A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.
A example assembly looks like:
```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```
With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
```
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 3 0
.pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe 837061429793323041 4 0
popq %rax
retq
```
**Disassembling**
We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.
An example disassembly looks like:
```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D91878
We have two types of relocations that we apply on startup:
1. Relocations that apply to wasm globals
2. Relocations that apply to wasm memory
The first set of relocations use only the `__memory_base` import to
update a set of internal globals. Because wasm globals are thread local
these need to run on each thread. Memory relocations, like static
constructors, must only be run once.
To ensure global relocations run on all threads and because the only
depend on the immutable `__memory_base` import we can run them during
the WebAssembly start functions, instead of waiting until the
post-instantiation __wasm_call_ctors.
Differential Revision: https://reviews.llvm.org/D93066
after destroying an InstantiatingTemplate object.
This previously caused us to (silently!) bail out of class template
instantiation, thinking we'd produced an error, in some corner cases.
llvm-cov -path-equivalence=/tmp,... is used by some checked-in coverage mapping
files where the original filename is under /tmp. If the test itself produces the
coverage mapping file, there is no need for /tmp.
For coverage_emptylines.cpp: the source filename is under the build directory.
If the build directory is under /tmp, the path mapping will make
llvm-cov fail to find the file.
This CL changes the asm syntax for section flags, making them more like ELF
(previously "passive" was the only option). Now we also allow "G" to designate
COMDAT group sections. In these sections we set the appropriate comdat flag on
function symbols, and also avoid auto-creating a new section for them.
This also adds asm-based tests for the changes D92691 to go along with
the direct-to-object tests.
Differential Revision: https://reviews.llvm.org/D92952
This is a reland of rG4564553b8d8a with a fix to the lit pipeline in
llvm/test/MC/WebAssembly/comdat.ll
Dylibs that are "public" -- i.e. top-level system libraries -- are considered
implicitly linked when another library re-exports them. That is, we should load
them & bind directly to their symbols instead of via their re-exporting
umbrella library. This diff implements that behavior by default, as well as an
opt-out flag.
In theory, this is just a performance optimization, but in practice it seems
that it's needed for correctness.
Fixes llvm.org/PR48395.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D93000
We need to initialize AsmParsers before any calls to `addFile`, as
bitcode files may require them. Otherwise we trigger `Assertion T &&
T->hasMCAsmParser()' failed`.
Reviewed By: #lld-macho, compnerd
Differential Revision: https://reviews.llvm.org/D92913
`-mcpu` and `-code-model` tests were copied from similar ones in
LLD-ELF.
There doesn't seem to be an equivalent test for `-mattr` in LLD-ELF, so
I've verified our behavior by cribbing a test from
CodeGen/X86/recip-fastmath.ll.
Reviewed By: #lld-macho, compnerd, MaskRay
Differential Revision: https://reviews.llvm.org/D92912
This was causing a crash as we were attempting to look up the
nonexistent parent OutputSection of the debug sections. We didn't detect
it earlier because there was no test for PIEs with debug info (PIEs
require us to emit rebases for X86_64_RELOC_UNSIGNED).
This diff filters out the debug sections while loading the ObjFiles. In
addition to fixing the above problem, it also lets us avoid doing
redundant work -- we no longer parse / apply relocations / attempt to
emit dyld opcodes for these sections that we don't emit.
Fixes llvm.org/PR48392.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92904
This makes it possible for STABS entries to reference the debug info
contained in the LTO-compiled output.
I'm not sure how to test the file mtime within llvm-lit -- GNU and BSD
`stat` take different command-line arguments. I've omitted the check for
now.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D92537