For instance, some recent clang emits this code on x86_64:
```
    0x100002b99 <+57>: callq  0x100002b40    ; step_out_of_here at main.cpp:11
->  0x100002b9e <+62>: xorl   %eax, %eax
    0x100002ba0 <+64>: popq   %rbp
    0x100002ba1 <+65>: retq
```
and the "xorl %eax, %eax" is attributed to the same line as the callq. Since
step out is supposed to stop just on returning from the function, you can't guarantee
it will end up on the next line. I changed the test to check that we were either
on the call line or on the next line, since either would be right depending on the
debug information.
These experimental builtin functions and the feature macro they were gated
behind have been removed.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D98907
For Zvlsseg, we create several tuple register classes. When spilling or
reloading one of these tuple register classes, we need to iterate NF times to
store/load the registers that make up the tuple.
Differential Revision: https://reviews.llvm.org/D98629
On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`,
`__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or
`comdat noduplicates`).
With `--gc-sections`, LLD since D96753 and GNU ld with `-z start-stop-gc` may
garbage collect such sections. If every section of one kind is discarded (say,
all `__sancov_guards` input sections), LLD reports
`error: undefined hidden symbol: __start___sancov_guards` (the other sections
behave similarly).
```
% cat a.c
void discarded() {}
% clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections
...
ld.lld: error: undefined hidden symbol: __start___sancov_guards
>>> referenced by a.c
>>> /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard)
```
Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the
undefined error.
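A minimal sketch of the idea (illustrative only, not the exact SanitizerCoverage code): declare the `__start_`/`__stop_` section-bound symbols with `extern_weak` linkage so that a fully discarded section yields a null weak reference instead of a hard link error.
```cpp
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Declare a section bound symbol such as "__start___sancov_guards" as an
// extern_weak, hidden global. If the linker discards every section in the
// group, the reference resolves to null instead of producing
// "undefined hidden symbol".
static GlobalVariable *declareSectionBound(Module &M, Type *Ty,
                                           StringRef Name) {
  auto *GV = new GlobalVariable(M, Ty, /*isConstant=*/false,
                                GlobalValue::ExternalWeakLinkage,
                                /*Initializer=*/nullptr, Name);
  GV->setVisibility(GlobalValue::HiddenVisibility);
  return GV;
}
```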
Differential Revision: https://reviews.llvm.org/D98903
We returned the input chain instead of the output chain from the new load,
which bypasses the load in the chain. I haven't found a good way to test this
yet; the IR ordering has so far defeated my attempts at provoking a reordering.
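For readers less familiar with SelectionDAG chains, here is a hypothetical sketch of the pattern involved (the helper name and context are invented, not taken from this patch): the chain to propagate is the new load's own chain result, not the chain that was fed into it.
```cpp
#include <utility>

#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Hypothetical helper: create a replacement load and return both the loaded
// value and the chain that callers should continue from. Returning InChain
// here instead of NewLoad.getValue(1) would leave the new load dangling off
// the chain, allowing it to be reordered past later chained operations.
static std::pair<SDValue, SDValue>
createReplacementLoad(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                      SDValue InChain, SDValue Ptr) {
  SDValue NewLoad = DAG.getLoad(VT, DL, InChain, Ptr, MachinePointerInfo());
  return {NewLoad, NewLoad.getValue(1)}; // value, output chain
}
```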
This only adds support to the DFSan instrumentation pass, not to the runtime.
More RUN lines were added for testing: for each instrumentation test that had a
`-dfsan-fast-16-labels` invocation, a new invocation using fast8 labels was
added.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D98734
This adds a tosa.apply_scale operation that handles the scaling operation
common to quantized operations. This scalar operation is lowered
in TosaToStandard.
We use a separate ApplyScale factorization as this is a replicable pattern
within TOSA. ApplyScale can be reused within pool/convolution/mul/matmul
for their quantized variants.
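For reference, the scaling that apply_scale performs is roughly the following fixed-point multiply-and-round, paraphrased from the TOSA specification's apply_scale_32 pseudocode (consult the spec for the authoritative definition, including its range requirements):
```cpp
#include <cstdint>

// Rough sketch of tosa.apply_scale for 32-bit values: multiply in 64 bits,
// add a rounding term, then arithmetic-shift right by `shift`.
int32_t apply_scale_32(int32_t value, int32_t multiplier, int32_t shift,
                       bool double_round = false) {
  int64_t round = INT64_C(1) << (shift - 1);
  if (double_round && shift > 31)
    round += (value >= 0) ? (INT64_C(1) << 30) : -(INT64_C(1) << 30);
  int64_t result = static_cast<int64_t>(value) * multiplier + round;
  return static_cast<int32_t>(result >> shift);
}
```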
Tests are added to both tosa-to-standard and tosa-to-linalg-on-tensors
that verify each pass is correct.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D98753
This reverts commit 962b73dd0f.
This commit was reverted because of some internal SPEC test failures.
It turns out that this wasn't actually relevant to anything in open source, so
it's safe to recommit this.
Includes lowering for tosa.concat, with index computation using subtensor
insert operations. Includes tests concatenating along two different axes.
Differential Revision: https://reviews.llvm.org/D98813
This is the alternative approach to D96931.
In LTO, for each module with an inline asm block, prepend the directive
`.lto_discard <sym>, <sym>*` to the beginning of the inline asm.
`.lto_discard` is both a marker for module inline asm blocks and (optionally)
provides a list of symbols to be discarded.
In MC, while emitting the inline asm, discard symbol bindings and symbol
definitions according to `.lto_discard`.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D98762
It is reported that, after enabling the hidden helper thread, the program can
sometimes hit the assertion `new_gtid < __kmp_threads_capacity`. The root cause
is as follows. Let's say the default `__kmp_threads_capacity` is `N`. If the
hidden helper thread is enabled, `__kmp_threads_capacity` is offset to `N+8` by
default. If the number of threads we need exceeds `N+8`, e.g. via the
`num_threads` clause, we need to expand `__kmp_threads`. In
`__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity` and
repeatedly doubles it until the new capacity meets the requirement. Let's
assume the new requirement is `Y`. If `Y` happens to satisfy `(N+8)*2^X=Y`,
where `X` is the number of doublings, the new capacity is exactly `Y`; since 8
of those slots are reserved for hidden helper threads, it is still not enough.
Here is an example.
```
#include <vector>

int main(int argc, char *argv[]) {
  constexpr const size_t N = 1344;
  std::vector<int> data(N);
#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }
#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }
  return 0;
}
```
My CPU is 20C40T, so `__kmp_threads_capacity` is 160. After the offset,
`__kmp_threads_capacity` becomes 168. Since `1344 = (160+8)*2^3`, the assertion
is hit.
Reviewed By: protze.joachim
Differential Revision: https://reviews.llvm.org/D98838
Suppresses an implicit TypeSize-to-uint64_t conversion warning. We might be
able to avoid the offset entirely since we're writing to a fixed stack object,
but I wasn't sure, so I just did what DAGTypeLegalizer::IncrementPointer does.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D98736
In the existing OrcLazy mode, modules go through partitioning and outgoing calls are replaced by reexport stubs that resolve on call-through. In the greedy mode that this patch unlocks for lli, modules materialize as a whole and trigger materialization for all required symbols recursively. This is useful for testing (e.g. D98785) and it's more similar to the way MCJIT works.
This reverts commit 32a744ab20.
CI is broken:
```
test/Dialect/Linalg/bufferize.mlir:274:12: error: CHECK: expected string not found in input
// CHECK: %[[MEMREF:.*]] = tensor_to_memref %[[IN]] : memref<?xf32>
^
```
`-mbranch-protection` protects the LR on the stack with a PAC. When the frames
are walked, the LR needs to be cleared of the PAC. This inline assembly will
later be replaced with a new builtin.
Test: build with `-DCMAKE_C_FLAGS="-mbranch-protection=standard"`.
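Something along these lines (a sketch assuming AArch64, not necessarily the exact code in this patch) is what the inline assembly does until the builtin exists: move the signed return address into LR, strip the PAC with `xpaclri` (encoded as `hint #7`, a NOP on cores without pointer authentication), and read the cleared value back.
```cpp
#include <cstdint>

// Sketch: strip the pointer authentication code from a saved LR value so the
// frame walker sees a plain return address.
static inline uintptr_t StripPACFromLR(uintptr_t lr) {
#if defined(__aarch64__)
  uintptr_t stripped;
  asm volatile(
      "mov x30, %1\n\t"
      "hint #7\n\t" // xpaclri: clears the PAC bits in x30 (LR)
      "mov %0, x30\n\t"
      : "=r"(stripped)
      : "r"(lr)
      : "x30");
  return stripped;
#else
  return lr;
#endif
}
```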
Reviewed By: kubamracek
Differential Revision: https://reviews.llvm.org/D98008
`BufferizeAnyLinalgOp` fails because `FillOp` is not a `LinalgGenericOp` and it fails while reading the operand sizes attribute.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D98671
The cause is the non-async-signal-safe printf function (et al.). If the test
managed to interrupt the process and inject a signal before the
printf("@started") call returned (but after it had actually written the
output), that string could end up being printed twice (presumably because the
function did not manage to clear the userspace buffer, so the print call in the
signal handler would print it once again).
This patch fixes the issue by replacing the printf call in the signal handler
with a sprintf+write combo, which should not suffer from that problem (though
I wouldn't go as far as to call it async-signal-safe).
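A minimal sketch of the sprintf+write pattern described above (the handler name and message are illustrative; snprintf is used here for bounds safety):
```cpp
#include <cstdio>
#include <unistd.h>

// Format into a local buffer and emit it with the write() syscall. Unlike
// printf(), this does not touch stdio's userspace buffering, so an
// interrupted printf() in the main program cannot be replayed from here.
static void SignalHandler(int signo) {
  char buf[64];
  int len = std::snprintf(buf, sizeof(buf), "@signal %d\n", signo);
  if (len <= 0)
    return;
  if (static_cast<size_t>(len) >= sizeof(buf))
    len = sizeof(buf) - 1; // snprintf reports the untruncated length
  write(STDERR_FILENO, buf, static_cast<size_t>(len));
}
```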
This reverts commit 6b053c9867.
The build is broken:
```
ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>>               LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a
```
The aim is to use the correct vasprintf implementation for z/OS libc++, where a copy of the `va_list` `ap` is needed. In particular, it avoids the possibility that the initial internal call to vsnprintf modifies `ap` and the subsequent call to vsnprintf then uses that modified `ap`.
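As a sketch of the pattern (not the actual libc++ code), the measuring call operates on a copy of `ap`, leaving the original intact for the second formatting call:
```cpp
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

// Sketch: vasprintf built on two vsnprintf calls. The first call only
// measures, and uses a va_copy so that it cannot consume or modify `ap`
// before the second call formats into the allocated buffer.
static int vasprintf_sketch(char **out, const char *fmt, va_list ap) {
  va_list ap_copy;
  va_copy(ap_copy, ap);
  int len = std::vsnprintf(nullptr, 0, fmt, ap_copy); // may clobber ap_copy only
  va_end(ap_copy);
  if (len < 0)
    return -1;
  *out = static_cast<char *>(std::malloc(static_cast<std::size_t>(len) + 1));
  if (*out == nullptr)
    return -1;
  return std::vsnprintf(*out, static_cast<std::size_t>(len) + 1, fmt, ap);
}
```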
Differential Revision: https://reviews.llvm.org/D97473
I foresee two uses for this:
1) It's easier to use these in a debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
LIT tests would become too obscure. I can imagine that we'd want to CHECK
against VPlan dumps after multiple transformations instead. That would be
easier with plain-text dumps than with the DOT format.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96628
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.
Depends on D98466.
Differential Revision: https://reviews.llvm.org/D98676
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.
Depends on D98457.
Differential Revision: https://reviews.llvm.org/D98466
Replace semantics::SymbolSet with alternatives that clarify
whether the set should order its contents by source position
or not. This matters because positionally-ordered sets must
not be used for Symbols that might be subjected to name
replacement during name resolution, and address-ordered
sets must not be used (without sorting) in circumstances
where the order of their contents affects the output of the
compiler.
All set<> and map<> instances in the compiler that are keyed
by Symbols now have explicit Compare types in their template
instantiations. Symbol::operator< is no more.
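The shape of the change, sketched with hypothetical stand-in types (Flang's actual Symbol and comparator names differ): each set spells out how it is ordered instead of relying on the element's operator<.
```cpp
#include <set>
#include <string_view>

// Stand-in for semantics::Symbol, for illustration only.
struct Sym {
  std::string_view name;
  const char *sourcePos; // position of the name in the source
};

// Address-ordered: fast and stable within a run, but the order must never
// leak into compiler output.
struct ByAddress {
  bool operator()(const Sym *x, const Sym *y) const { return x < y; }
};

// Source-position-ordered: suitable for output, but must not be used for
// symbols whose names may still be replaced during name resolution.
struct BySourcePosition {
  bool operator()(const Sym *x, const Sym *y) const {
    return x->sourcePos < y->sourcePos;
  }
};

using UnorderedSymSet = std::set<const Sym *, ByAddress>;
using SourceOrderedSymSet = std::set<const Sym *, BySourcePosition>;
```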
Differential Revision: https://reviews.llvm.org/D98878
Both libc++ and libc++abi have the option of merging with another archive: libunwind can be merged into libc++abi, and libc++abi can be merged into libc++.
This is realized using add_custom_command with POST_BUILD and the use of the CMake generator expression TARGET_LINKER_FILE in the arguments. For such generator expressions the CMake documentation states: "This target-level dependency does NOT add a file-level dependency that would cause the custom command to re-run whenever the executable is recompiled" [1]
This patch adds a DEPENDS argument to both add_custom_command invocations so that the archives also have a file-level dependency on the target they are merging with. That way, changes in, say, libunwind's source code will be reflected in the libc++abi and/or libc++ static libraries as well.
[1] https://cmake.org/cmake/help/v3.20/command/add_custom_command.html
Differential Revision: https://reviews.llvm.org/D98129
`__cpp_lib_default_template_type_for_algorithm_values` is 52 characters long,
which is long enough to reduce the padding multiplier to less than zero,
producing an empty string between the name of the macro and its numeric value.
Ensure there's always at least one space between the name of the macro and its
value.
Differential Revision: https://reviews.llvm.org/D98869
This documents current regular LLVM sync-ups that are happening in the
Getting Involved section.
I hope this gives a bit more visibility to regular sync-ups that are
happening in the LLVM community, documenting another way communication
in the community happens.
Of course the downside is that this is another location where sync-up
metadata needs to be maintained. That being said, the structure as proposed
means that no changes are needed once a new sync-up is added, apart from maybe
removing an entry once it becomes clear that a particular sync-up series has
been cancelled.
Documenting a few pointers on how current sync-ups happen may also
encourage others to organize useful sync-ups on specific topics.
I've started with adding the sync-ups I'm aware of. There's a good
chance I've missed some.
If most sync-ups end up having a public Google calendar, we could also create
and maintain a public Google calendar that shows all events happening in the
LLVM community, including dev meetings, sync-ups, socials, etc., assuming that
would be valuable.
Differential Revision: https://reviews.llvm.org/D98797
Now that the WebAssembly SIMD specification is finalized and engines are
generally up-to-date, there is no need for a separate target feature for gating
SIMD instructions that engines have not implemented. With this change,
v128.const is now enabled by default with the simd128 target feature.
Differential Revision: https://reviews.llvm.org/D98457
`--shuffle-sections=<seed>` applies to all sections. The new
`--shuffle-sections=<section-glob>=<seed>` makes shuffling selective. To the
best of my knowledge, the option is only used for debugging, so just drop the
original form.
`--shuffle-sections '.init_array*=-1' --shuffle-sections '.fini_array*=-1'`
reverses static constructors/destructors of the same priority. Useful to
detect some static initialization order fiascos.
`--shuffle-sections '.data*=-1'`
reverses `.data*` sections. Useful to detect unfounded reliance on the result
of comparing pointers to two unrelated objects.
If certain sections have an intrinsic order, the old form cannot be used.
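For example, here is the kind of latent bug that the reversed `.data*` layout can surface (illustrative program, not taken from the patch):
```cpp
#include <cstdio>

int a = 1;
int b = 2;

int main() {
  // The result of comparing the addresses of two unrelated globals is not
  // specified; code relying on it may silently change behavior when the
  // linker reverses the order of `.data*` input sections.
  std::printf("%d\n", &a < &b);
  return 0;
}
```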
Differential Revision: https://reviews.llvm.org/D98679
Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D98829
This fixes how the value profile is annotated after inlining.
In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we
use the magic number -1 in the value profile to avoid repeated indirect call
promotion to the same target for an indirect call. Function updateIDTMetaData
is used to mark a target as having been promoted in the value profile with the
magic number. updateIDTMetaData is also used to update the value profile
when an indirect call is inlined and the new inline instance profile should be
applied. For the second case, currently updateIDTMetaData mixes up the
existing value profile of the indirect call with the new profile, leading
to the problematic scenario where a target count is larger than the total
count in the value profile.
The patch fixes the problem. When updateIDTMetaData is used to update the
value profile after inlining, all the values in the existing value profile
will be dropped except the values with the magic number counts.
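A hypothetical sketch of the dropping rule described above (the helper, containers, and constant name are invented for illustration; the real code operates on value-profile metadata):
```cpp
#include <cstdint>
#include <vector>

// The magic number -1 described above, marking an already-promoted target.
static constexpr uint64_t kPromotedMagicCount = static_cast<uint64_t>(-1);

struct ValueCount {
  uint64_t Target;
  uint64_t Count;
};

// When applying the profile of a new inline instance, keep only the
// already-promoted markers from the old profile and take everything else
// from the new one, instead of mixing the two sets of counts.
static std::vector<ValueCount>
updateAfterInlining(const std::vector<ValueCount> &Existing,
                    const std::vector<ValueCount> &InlineInstance) {
  std::vector<ValueCount> Result;
  for (const ValueCount &VC : Existing)
    if (VC.Count == kPromotedMagicCount)
      Result.push_back(VC);
  Result.insert(Result.end(), InlineInstance.begin(), InlineInstance.end());
  return Result;
}
```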
Differential Revision: https://reviews.llvm.org/D98835
This reverts commit 11b70b9e3a.
The bot failure was due to ArgumentPromotion deleting functions
without deleting their analyses. This was separately fixed in 4b1c807.