This makes it simpler to determine when two registers are actually the
same vs. just partially aliasing.
The only real caveat is that it becomes impossible to know which name
was used for the register previously (i.e., parsing assembly and then
disassembling it can result in the register name changing).
Differential Revision: https://reviews.llvm.org/D98536
This is required to determine during disassembly whether a
Reg bead without an associated DA bead refers to a data register.
Differential Revision: https://reviews.llvm.org/D98534
This reverts commit d09adfd399.
That commit caused failures in
clang-tidy/infrastructure/validate-check-names.cpp on Windows
buildbots.
That change exposed a surprising issue, not directly related to
this change itself, but to how TestRunner quotes command-line
arguments that will later be interpreted by an msys-based
tool (like grep.exe, as provided by Git for Windows). This
worked accidentally before, when grep was invoked via not.exe,
which took a more conservative approach to Windows argument quoting.
TestExitDuringExpression test_exit_before_one_thread_unwind fails
sporadically on both Arm and AArch64 Linux buildbots.
This seems like a thread timing issue. I am marking it as skipped for now.
When running in a Windows Container, the Git for Windows Unix tools
(C:\Program Files\Git\usr\bin) just hang if this variable isn't
passed through.
Currently, running the LLVM/clang tests in a Windows Container fails
if that directory is added to the path, but succeeds after this change.
(After this change, the previously used GnuWin tools can also be left
out entirely, as lit automatically picks up the Git for Windows tools
if necessary.)
Differential Revision: https://reviews.llvm.org/D98858
Keep running "not --crash" via the external "not" executable, but
for plain negations, and for cases that use the shell "!" operator,
just skip that argument and invert the return code.
The libcxx tests only use the shell operator "!" for negations,
never the "not" executable, because libcxx tests can be run without
having a fully built llvm tree available that provides the "not"
executable.
This allows using the internal shell for libcxx tests.
Differential Revision: https://reviews.llvm.org/D98859
This change combines for ROCm what was done for CUDA in D97463, D98203, D98360, and D98396.
I did not try to compile SerializeToHsaco.cpp or test mlir/test/Integration/GPU/ROCM because I don't have an AMD card. I fixed the things that had obvious bit-rot though.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D98447
When deleting operations in DCE, the algorithm uses a post-order walk of
the IR to ensure that value uses are erased before value defs. Graph
regions do not have the same structural invariants as SSA CFG regions, and
this post-order walk could delete value defs before their uses. This problem
is guaranteed to occur when there is a cycle in the use-def graph.
This change stops DCE from visiting the operations and blocks in any
meaningful order. Instead, we rely on explicitly dropping all uses of a
value before deleting it.
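A minimal sketch of that drop-uses-then-erase approach (illustrative only;
the names and structure here are simplified and are not the actual pass code):
```
// Illustrative sketch: erase a set of dead operations from a graph region
// without depending on any visitation order. All def-use edges are severed
// first, so use-def cycles cannot leave dangling uses behind.
#include "mlir/IR/Operation.h"
#include "llvm/ADT/ArrayRef.h"

static void eraseDeadOps(llvm::ArrayRef<mlir::Operation *> deadOps) {
  // Drop every use of the dead ops' results before erasing anything.
  for (mlir::Operation *op : deadOps)
    op->dropAllUses();
  // Now the erase order no longer matters, even with cycles.
  for (mlir::Operation *op : deadOps)
    op->erase();
}
```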
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D98919
Add an extra `type.isa<FloatType>()` check to the `FloatAttr::get(Type, double)` method.
Otherwise it tries to call `type.cast<FloatType>()`, which fails with an assertion in Debug mode.
The `!type.isa<FloatType>()` case just redirects the call to `FloatAttr::get(Type, APFloat)`,
which will perform the actual check and emit an appropriate error.
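A rough sketch of the described guard, written as a hypothetical helper (the
actual builder code differs in its details):
```
// Hypothetical helper mirroring the described behavior, not the verbatim
// upstream code: guard with isa<FloatType>() before any cast.
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinTypes.h"
#include "llvm/ADT/APFloat.h"
using namespace mlir;
using llvm::APFloat;

static FloatAttr getFloatAttrChecked(Type type, double value) {
  if (!type.isa<FloatType>())
    // The APFloat overload performs the real check and emits a proper error
    // instead of tripping the Debug-mode cast assertion.
    return FloatAttr::get(type, APFloat(value));

  // Safe to cast now: convert the double into the type's semantics.
  bool losesInfo = false;
  APFloat floatValue(value);
  floatValue.convert(type.cast<FloatType>().getFloatSemantics(),
                     APFloat::rmNearestTiesToEven, &losesInfo);
  return FloatAttr::get(type, floatValue);
}
```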
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98764
We can prove more predicates when we have a context for the ICmp being eliminated.
As a first (and very obvious) approximation we can use the ICmp instruction itself,
though in the future we are going to use a common dominator of all its users.
Some refactoring is needed before that.
Observed ~0.5% negative compile-time impact.
Differential Revision: https://reviews.llvm.org/D98697
Reviewed By: lebedev.ri
C functions may be declared and defined with different prototypes, as in the example below. This patch unifies the checks for mangling names in symbol linkage name emission and debug linkage name emission so that the two names are consistent.
static int go(int);
static int go(a) int a;
{
return a;
}
Differential Revision: https://reviews.llvm.org/D98799
This change adds an attribute field to the metadata of a context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in the previous build.
This will be used to help estimate inlining and will be taken into account when trimming contexts. Changes for that in llvm-profgen will follow. It will also help tuning.
Differential Revision: https://reviews.llvm.org/D98823
By the definition of the implication operator, `false -> true` and `false -> false` both hold. It means that
`false` implies any predicate, whether true or false. We don't need to go any further
trying to prove the statement we need; we can always say that `false` implies it in this case.
In practice it means that we are trying to prove something guarded by a `false` condition,
which means that this code is unreachable, and we can safely prove any fact or perform any
transform in this code.
Differential Revision: https://reviews.llvm.org/D98706
Reviewed By: lebedev.ri
For instance, some recent clang emits this code on x86_64:
0x100002b99 <+57>: callq 0x100002b40 ; step_out_of_here at main.cpp:11
-> 0x100002b9e <+62>: xorl %eax, %eax
0x100002ba0 <+64>: popq %rbp
0x100002ba1 <+65>: retq
and the "xorl %eax, %eax" is attributed to the same line as the callq. Since
step out is supposed to stop just on returning from the function, you can't guarantee
it will end up on the next line. I changed the test to check that we were either
on the call line or on the next line, since either would be right depending on the
debug information.
These experimental builtin functions and the feature macro they were gated
behind have been removed.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D98907
For Zvlsseg, we create several tuple register classes. When spilling or
reloading these tuple register classes, we need to iterate NF times to
store/load the tuple registers.
Differential Revision: https://reviews.llvm.org/D98629
On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`,
`__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or
`comdat noduplicates`).
With `--gc-sections`, LLD since D96753 and GNU ld with `-z start-stop-gc` may
garbage collect such sections. If all `__sancov_bools` are discarded, LLD will
report `error: undefined hidden symbol: __start___sancov_cntrs` (the other
sections are similar).
```
% cat a.c
void discarded() {}
% clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections
...
ld.lld: error: undefined hidden symbol: __start___sancov_guards
>>> referenced by a.c
>>> /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard)
```
Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the
undefined error.
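For illustration, a hedged sketch of how such a section-bound symbol could be
declared with `extern_weak` linkage (the helper name here is made up; this is
not the actual SanitizerCoverage code):
```
// Illustrative only: declare a hidden extern_weak __start_/__stop_ section
// bound so that a fully GC'ed section resolves to null at link time instead
// of producing an undefined-symbol error.
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
using namespace llvm;

static GlobalVariable *declareSectionBound(Module &M, Type *Ty,
                                           StringRef Name) {
  auto *Bound = new GlobalVariable(M, Ty, /*isConstant=*/false,
                                   GlobalValue::ExternalWeakLinkage,
                                   /*Initializer=*/nullptr, Name);
  Bound->setVisibility(GlobalValue::HiddenVisibility);
  return Bound;
}
```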
Differential Revision: https://reviews.llvm.org/D98903
We returned the input chain instead of the output chain from the
new load. This bypasses the load in the chain. I haven't found a
good way to test this yet. IR order prevents my initial attempts
at causing reordering.
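For reference, the general shape of the chain plumbing in question (a
simplified sketch under assumed names, not the actual combine code):
```
// Simplified sketch: after creating a replacement load, later chained nodes
// must use the chain *produced by* the new load (its result 1), not the
// input chain, otherwise they can be reordered around the load.
#include "llvm/CodeGen/SelectionDAG.h"
#include <utility>
using namespace llvm;

static std::pair<SDValue, SDValue>
createReplacementLoad(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                      SDValue InChain, SDValue Ptr, MachineMemOperand *MMO) {
  SDValue NewLoad = DAG.getLoad(VT, DL, InChain, Ptr, MMO);
  // Result 0 is the loaded value; result 1 is the output chain to hand out.
  return {NewLoad, NewLoad.getValue(1)};
}
```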
This only adds support to the dfsan instrumentation pass, not to the
runtime.
Added more RUN lines for testing: for each instrumentation test that
had a -dfsan-fast-16-labels invocation, a new invocation using fast8
was added.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D98734
This adds a tosa.apply_scale operation that handles the scaling operation
common to quantized operations. This scalar operation is lowered
in TosaToStandard.
We use a separate ApplyScale factorization as this is a replicable pattern
within TOSA. ApplyScale can be reused within pool/convolution/mul/matmul
for their quantized variants.
Tests are added to both tosa-to-standard and tosa-to-linalg-on-tensors
that verify each pass is correct.
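For reference, the scalar computation apply_scale represents is roughly the
following fixed-point multiply-and-shift (a simplified model based on the TOSA
spec; the double-rounding option and saturation are omitted here):
```
// Rough scalar model of the quantized rescale: widen to 64 bits, multiply by
// a 32-bit multiplier, add a rounding term, and shift right.
#include <cstdint>

int32_t applyScale32(int32_t value, int32_t multiplier, int8_t shift) {
  int64_t round = int64_t{1} << (shift - 1);
  int64_t result = int64_t{value} * int64_t{multiplier} + round;
  return static_cast<int32_t>(result >> shift);
}
```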
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D98753
This reverts commit 962b73dd0f.
This commit was reverted because of some internal SPEC test failures.
It turns out that this wasn't actually relevant to anything in open source, so
it's safe to recommit this.
Includes lowering for tosa.concat with index computation using subtensor insert
operations. Includes tests along two different indices.
Differential Revision: https://reviews.llvm.org/D98813
This is the alternative approach to D96931.
In LTO, for each module with an inline asm block, prepend the directive ".lto_discard <sym>, <sym>*" to the beginning of the inline
asm. ".lto_discard" is both a marker for a module's inline asm block and (optionally) provides a list of symbols to be discarded.
In MC, while emitting the inline asm, discard symbol bindings and symbol
definitions according to ".lto_discard".
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D98762
It is reported that after enabling hidden helper threads, the program
can sometimes hit the assertion `new_gtid < __kmp_threads_capacity`. The root
cause is explained as follows. Let's say the default `__kmp_threads_capacity` is
`N`. If hidden helper threads are enabled, `__kmp_threads_capacity` will be offset
to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via the
`num_threads` clause, we need to expand `__kmp_threads`. In
`__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity` and
repeatedly doubles it until the new capacity meets the requirement. Let's
assume the new requirement is `Y`. If `Y` happens to meet the constraint
`(N+8)*2^X=Y` where `X` is the number of iterations, the new capacity is not
enough, because 8 of the slots are reserved for hidden helper threads.
Here is an example.
```
#include <vector>
int main(int argc, char *argv[]) {
  constexpr const size_t N = 1344;
  std::vector<int> data(N);
#pragma omp parallel for
  for (unsigned i = 0; i < N; ++i) {
    data[i] = i;
  }
#pragma omp parallel for num_threads(N)
  for (unsigned i = 0; i < N; ++i) {
    data[i] += i;
  }
  return 0;
}
```
My CPU is 20C40T, so `__kmp_threads_capacity` is 160. After the offset,
`__kmp_threads_capacity` becomes 168. Since `1344 = (160+8)*2^3`, the assertion
is hit.
Reviewed By: protze.joachim
Differential Revision: https://reviews.llvm.org/D98838
Suppresses an implicit TypeSize to uint64_t conversion warning.
We might be able to just not offset it since we're writing to a
Fixed stack object, but I wasn't sure, so I just did what
DAGTypeLegalizer::IncrementPointer does.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D98736
In the existing OrcLazy mode, modules go through partitioning and outgoing calls are replaced by reexport stubs that resolve on call-through. In the greedy mode that this patch unlocks for lli, modules materialize as a whole and trigger materialization for all required symbols recursively. This is useful for testing (e.g. D98785) and it's more similar to the way MCJIT works.