There is no need to check for an enabled pragma for core or optional
core features, so this check is removed.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D97058
Forward references to blocks lead to `Block`s being allocated in the
parser, but they are not necessarily included in a region if parsing
fails, leading to a leak. Clean them up in the parser destructor.
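A minimal sketch of the idea, with hypothetical stand-ins for the
parser and IR types (the real ones live in MLIR's IR library):
```cpp
#include <vector>

// Minimal stand-ins; names do not match the actual MLIR classes.
struct Region;
struct Block {
  Region *parent = nullptr;
  Region *getParent() const { return parent; }
};

// Hypothetical sketch: any block allocated for a forward reference that
// never made it into a region is still owned by the parser and is freed
// in the destructor.
struct Parser {
  std::vector<Block *> forwardRefBlocks;
  ~Parser() {
    for (Block *B : forwardRefBlocks)
      if (!B->getParent()) // never inserted into a region -> would leak
        delete B;
  }
};
```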
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D98403
Add simplification of smul.fix and smul.fix.sat according to
X * 0 -> 0
X * undef -> 0
X * (1 << scale) -> X
This includes the commuted patterns and splatted vectors.
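For reference, a C++ model of the smul.fix semantics that makes the
X * (1 << scale) -> X fold evident (illustrative only; saturation and
overflow are ignored):
```cpp
#include <cstdint>

// Model of smul.fix: multiply at double width, then drop `scale`
// fractional bits.
int32_t smul_fix(int32_t x, int32_t y, unsigned scale) {
  return static_cast<int32_t>((static_cast<int64_t>(x) * y) >> scale);
}
// With y == (1 << scale), the shift exactly undoes the multiply, so the
// result is x -- hence the X * (1 << scale) -> X fold. Likewise x == 0
// gives 0 for any scale.
```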
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D98299
Do constant folding according to
poison * C -> poison
C * poison -> poison
undef * C -> 0
C * undef -> 0
for the smul_fix and smul_fix_sat intrinsics (for any scale). The undef
cases fold to 0 because undef may be chosen to be 0, and a zero operand
makes the fixed-point product 0 regardless of the scale.
Reviewed By: nikic, aqjune, nagisa
Differential Revision: https://reviews.llvm.org/D98410
This restricts the attributes to integers for constants of type
IndexType. Previously, an attribute like StringAttr, as in
%c1 = constant "" : index
was accepted as valid.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D98216
Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can
Depends on D97443
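For illustration, a few standard concepts from P0898R3 as exercised
through the `<concepts>` header (this sketch does not restate exactly
which concepts this particular patch adds):
```cpp
#include <concepts>

static_assert(std::same_as<int, int>);
static_assert(std::convertible_to<int, long>);
static_assert(std::movable<int> && std::copyable<int>);

// Constraining a function template with a standard concept:
template <std::integral T>
T twice(T x) { return x + x; }
```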
Reviewed By: Quuxplusone, EricWF, #libc
Differential Revision: https://reviews.llvm.org/D97911
Since D86233 we have `mustprogress` which, in combination with
`readonly`, implies `willreturn`. The idea is that every side-effect
has to be modeled as a "write". Consequently, `readonly` means there
is no side-effect, and `mustprogress` guarantees that we cannot "loop"
forever without side-effect.
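As a C++-level illustration (the language's forward-progress guarantee
is what becomes `mustprogress` in IR; the function name is made up):
```cpp
// This function reads and writes no memory (readonly); an infinite loop
// without side effects is UB in C++, so a compiler may also infer
// `willreturn` for it.
int count_trailing_zeros(unsigned x) {
  int n = 0;
  while (!(x & 1u)) { // if x == 0 this would spin forever -> UB
    x >>= 1;
    ++n;
  }
  return n;
}
```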
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D94125
The update_test_checks script can now check for global symbols and is able
to handle them properly when they differ across prefixes, e.g.,
attribute #0 might be different in different runs.
This patch simply updates all the Attributor tests with the new script.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D97906
The shuffle idiom is implemented differently across our supported
targets. To shrink the "target_impl" file, we now move the shuffle
idiom into its own self-contained header that provides the
implementation for AMDGPU and NVPTX. A fallback can be added later on.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D95752
Nested `omp begin/end declare variant` constructs inherit the selectors
from surrounding `omp begin/end declare variant` constructs. To stop
such propagation, the user can add `disable_selector_propagation` to
the `extension` set of the `implementation` selector.
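A sketch of the intended usage (the function name is made up; the
selector spelling follows the description above):
```cpp
#pragma omp begin declare variant match(device = {arch(nvptx64)})
// Without disable_selector_propagation, this nested construct would
// also inherit the arch(nvptx64) selector from the enclosing one.
#pragma omp begin declare variant match( \
    implementation = {extension(disable_selector_propagation)})
void foo() {} // matches only its own selector set, not arch(nvptx64)
#pragma omp end declare variant
#pragma omp end declare variant
```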
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D95765
If we have a nested declare variant context, it doesn't make sense to
inherit the match extension from the parent. Instead, just skip it.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D95764
This allows checking for various globals (metadata/attributes/...) and
also resolves problems with globals (metadata/attributes/...) being
reused across different prefixes.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D94741
D96109 added support for unique internal linkage names for both
internal linkage functions and global variables. There was a lot of
discussion on how to get the demangling right for functions, but I
completely missed that demanglers do not support suffixes for global
variables. For example:
$ c++filt _ZL3foo
foo
$ c++filt _ZL3foo.uniq.123
_ZL3foo.uniq.123
The demangling for functions works as expected.
I am not sure of the impact of this. I don't understand how debuggers
and other tools depend on the correctness of global variable
demangling, so I am pre-emptively disabling it until we can get the
demangling support added. Importantly, uniquifying global variables is
not needed right now, as we do not do profile attribution to global
variables based on sampling. It was added for completeness, so this
feature will not be particularly missed.
Differential Revision: https://reviews.llvm.org/D98392
Summary: This is a minor patch to add names for the debug line prologue, as a follow-up to D95998.
Reviewed By: dblaikie, ikudrin, shchenz
Differential Revision: https://reviews.llvm.org/D98383
We don't support any other shuffles currently.
This changes the bswap/bitreverse tests that check for this in
their expansion code. Previously we expanded a byte-swapping
shuffle through memory. Now we scalarize and do bit operations
on scalars to swap bytes.
In the future we can probably use vrgather.vx to do a byte swap
shuffle.
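For reference, the scalar expansion amounts to the classic
shift-and-mask byte swap; a 32-bit C++ illustration of the kind of bit
operations involved (not the backend's literal output):
```cpp
#include <cstdint>

// Classic shift-and-mask byte swap; the scalarized expansion performs
// equivalent bit operations on each extracted element.
uint32_t bswap32(uint32_t x) {
  return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
         ((x & 0x00FF0000u) >> 8) | ((x & 0xFF000000u) >> 24);
}
```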
Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can
Depends on D97359
Reviewed By: EricWF, #libc, Quuxplusone, zoecarver
Differential Revision: https://reviews.llvm.org/D97443
Implements parts of:
- P0898R3 Standard Library Concepts
- P1754 Rename concepts to standard_case for C++20, while we still can
Depends on D97162
Reviewed By: EricWF, #libc, Quuxplusone
Differential Revision: https://reviews.llvm.org/D97359
Currently, the BPF backend does not support all variants of
atomic_load_{add,and,or,xor}, atomic_swap and atomic_cmp_swap.
For example, it only supports 32-bit (with alu32 mode) and 64-bit
operations for atomic_load_{and,or,xor}, atomic_swap and
atomic_cmp_swap. For historical reasons, atomic_load_add has always
been supported for both 32-bit and 64-bit.
If a user uses an unsupported atomic operation, SelectionDAG codegen
currently cannot find a matching BPF pattern and issues a fatal
error. This is not user friendly, as the user may mistakenly think
it is a compiler bug.
This patch adds a Custom rule for unsupported atomic operations so
that a better error message is emitted during the
ReplaceNodeResults() callback. The following is an example output.
$ cat t.c
short sync(short *p) {
  return __sync_val_compare_and_swap (p, 2, 3);
}
$ clang -target bpf -O2 -g -c t.c
t.c:2:11: error: Unsupported atomic operations, please use 64 bit version
  return __sync_val_compare_and_swap (p, 2, 3);
          ^
fatal error: error in backend: Cannot select: t19: i64,ch =
AtomicCmpSwap<(load store seq_cst seq_cst 2 on %ir.p)> t0, t2,
Constant:i64<2>, Constant:i64<3>, t.c:2:11
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t1: i64 = Register %0
  t11: i64 = Constant<2>
  t10: i64 = Constant<3>
In function: sync
PLEASE submit a bug report ...
A fatal error will still happen, since we do not actually do proper
lowering for these unsupported atomic operations, but we do get a
much better error message first.
Differential Revision: https://reviews.llvm.org/D98471
The inlining of this function needs to be disabled, as it is part of
the inspected stack traces. Its string representation will look
different depending on whether it was inlined or not, which will cause
its string comparison to fail.
When it was inlined in only one of the two execution stacks,
minimize_two_crashes.test failed on SystemZ. For details see
https://bugs.llvm.org/show_bug.cgi?id=49152.
Reviewers: Ulrich Weigand, Matt Morehouse, Arthur Eubanks
Differential Revision: https://reviews.llvm.org/D97975
As llvm.amdgcn.kill is lowered to a terminator, it can cause
else-branch annotations to end up in the wrong block. To avoid this,
do not annotate conditionals as else branches where there is a kill.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D97427
Right now, when you have an invalid memory address, ASan just crashes
and does not offer much useful information. This patch attempts to
give a bit more detail about the access.
Differential Revision: https://reviews.llvm.org/D98280
Some Linux distributions produce versioned llvm-symbolizer binaries,
e.g. my llvm-11 installation puts the symbolizer binary at
/usr/bin/llvm-symbolizer-11.0.0. However, if you then try to run an
ASan-instrumented binary with
ASAN_SYMBOLIZER_PATH=..../llvm-symbolizer-FOO, it will fail on startup
with "isn't a known symbolizer".
Although it is possible to work around this by setting up symlinks,
that's kind of ugly - supporting versioned binaries is a nicer solution.
(There are now multiple stack overflow and blog posts talking about
this exact issue :) .)
Originally added in:
https://reviews.llvm.org/D8285
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D97682
This patch introduces progressive lowering patterns for rewriting
vector.transfer_read/write to vector.load/store and vector.broadcast
in certain supported cases.
Reviewed By: dcaballe, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97822
The "Inputs" subdirectory is used for all files read by the test, not
only those used as input to the execution - so even though this file is
used as a golden reference for the output of the test, it's still an
input to the test execution (it is read in the process of executing the
test).
From what I can tell, the loop inside applyExternalSymbolRelocations()
used to call getSymbolAddress(). After the JITSymbolResolver interface
redesign, the functionality has changed, and the loop should no longer
trigger repopulation of ExternalSymbolRelocations. If that's the case,
there is no need to update the loop iterator manually, and
ExternalSymbolRelocations can be cleared at once. This way, when there
are many external symbols in the program, the function runs much faster.
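A minimal sketch of the resulting shape, with hypothetical stand-ins
for RuntimeDyld's types and helpers:
```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real types live in RuntimeDyld.
using RelocationList = std::vector<int>;
void resolveRelocations(const std::string &Symbol, const RelocationList &RL);

void applyExternalSymbolRelocations(
    std::map<std::string, RelocationList> &ExternalSymbolRelocations) {
  // Resolution no longer inserts into the map, so no manual iterator
  // fix-up per entry: walk it plainly, then clear everything at once.
  for (const auto &Entry : ExternalSymbolRelocations)
    resolveRelocations(Entry.first, Entry.second);
  ExternalSymbolRelocations.clear();
}
```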
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97531
If a log message is triggered between execv and the child process,
this test fails. In the meantime, disable the test to unblock CI.
rdar://74992832
Reviewed By: delcypher
Differential Revision: https://reviews.llvm.org/D98453
This instruction is only valid on 2D MSAA and 2D MSAA Array
surfaces. Remove intrinsic support for other dimension types,
and block assembly for unsupported dimensions.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D98397
We have amdgpu_gfx functions that have high register pressure. If we
do not reserve a VGPR for SGPR spills, we will fall into the path that
spills SGPRs to memory, which not only has a correctness issue but
also really bad performance.
I don't know why there is a check for hasStackObjects(); in our case,
we don't have stack objects at the time of finalizeLowering(). So just
remove the check, so that we always reserve a VGPR for possible SGPR
spills in non-entry functions.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D98345
I noticed a code generation change when I tried to remove the
hasStackObjects() check when reserving a VGPR for SGPR spills.
For example, the function `callee_no_stack_no_fp_elim_all` in the lit
test file `callee-frame-setup.ll`.
The generated code changed from:
```
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
s_mov_b32 s4, s33
s_mov_b32 s33, s32
s_mov_b32 s33, s4
s_setpc_b64 s[30:31]
```
into something like:
```
s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
v_writelane_b32 v63, s33, 0
s_mov_b32 s33, s32
v_readlane_b32 s33, v63, 0
s_setpc_b64 s[30:31]
```
I think we still prefer the old version, where only scalar
instructions are needed.
The idea here is to free the reserved VGPR if there are no SGPR
spills, so we will very likely be able to use a free SGPR for the
FP/SP spill.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D98344
Attempting to build a standalone libFuzzer in Fuchsia's default toolchain for the purpose of cross-compiling the unit tests revealed a number of not-quite-proper type conversions. Fuchsia's toolchain includes `-std=c++17` and `-Werror`, among others, leading to many errors like `-Wshorten-64-to-32`, `-Wimplicit-float-conversion`, etc.
Most of these have been addressed by simply making the conversion explicit with a `static_cast`. These typically fell into one of two categories: 1) conversions between types where high precision isn't critical, e.g. the "energy" calculations for `InputInfo`, and 2) conversions where the values will never reach the bits being truncated, e.g. `DftTimeInSeconds` is not going to exceed 136 years.
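For illustration, the two categories as minimal C++ (variable names
echo the description but are otherwise made up):
```cpp
#include <cstdint>

int main() {
  // Category 1: precision is not critical, so narrow explicitly.
  double Energy = 0.123456789; // e.g. an InputInfo "energy" value
  float StoredEnergy = static_cast<float>(Energy);

  // Category 2: the truncated bits are unreachable in practice; 2^32
  // seconds is roughly 136 years, so elapsed seconds fit in 32 bits.
  double DftTimeInSeconds = 42.5;
  uint32_t Secs = static_cast<uint32_t>(DftTimeInSeconds);

  (void)StoredEnergy;
  (void)Secs;
}
```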
The major exception to this is the number of features: there are several places that treat features as `size_t`, and others as `uint32_t`. This change makes the decision to cap the features at 32 bits. The maximum value of a feature as produced by `TracePC::CollectFeatures` is roughly:
(NumPCsInPCTables + ValueBitMap::kMapSizeInBits + ExtraCountersBegin() - ExtraCountersEnd() + log2(SIZE_MAX)) * 8
It's conceivable that, for extremely large targets and/or extra counters, this limit could be reached. This shouldn't break fuzzing, but it will cause certain features to collide and lower the fuzzer's overall precision. To address this, this change adds a warning to TracePC::PrintModuleInfo about excessive feature size if it is detected, and recommends refactoring the fuzzer into several smaller ones.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D97992
This patch adds support for vectorizing loops with 'iter_args' when those loops
are not a vector dimension. This allows vectorizing outer loops with an inner
'iter_args' loop (e.g., reductions). Vectorizing scenarios where 'iter_args'
loops are vector dimensions would require more work (e.g., analysis,
generating horizontal reduction, etc.) not included in this patch.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97892
The S_[GL]PROC32_ID symbol records are supposed to point to function ID
records. If they don't, they are corrupt. The warning message here was
very technical, but a user has encountered it in the wild. Add some more
information and some more testing.
This patch extends the matrix spec to allow matrix-by-scalar division.
Originally support for `/` was left out to avoid ambiguity for the
matrix-matrix version of `/`, which could either be elementwise or
specified as matrix multiplication M1 * (1/M2).
For the matrix-scalar version, no ambiguity exists; `*` is also an
elementwise operation in that case. Matrix-by-scalar division is
commonly supported by systems including MATLAB, Mathematica, and
NumPy.
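For illustration, with Clang's matrix extension enabled
(-fenable-matrix), the new elementwise matrix-by-scalar division looks
like this (the typedef and function are made up):
```cpp
// Requires Clang's matrix extension: clang -fenable-matrix ...
typedef float m4x4_t __attribute__((matrix_type(4, 4)));

m4x4_t scale_down(m4x4_t M, float S) {
  return M / S; // elementwise: each element of M is divided by S
}
```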
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97857
This patch replaces the root-terminal vectorization approach implemented in the
Affine vectorizer with a topological order approach that vectorizes all the
operations within the target loop nest. These are the most important changes
introduced by the new algorithm:
* Removed tracking of root and terminal ops. Existing vectorization
functionality is preserved and extended so that loop nests without
root-terminal chains can be vectorized.
* Vectorizing a loop nest now only requires a single topological traversal.
* A new vector loop nest is incrementally built along the vectorization
process. The original scalar loop is kept intact. No cloning guard is needed
to recover the scalar loop if vectorization fails. This approach also
simplifies the challenging task of replacing a loop operation amid the
vectorization process without invalidating the analysis information that
depends on the original loop.
* Vectorization of specific operations has been implemented as
  independent units, preparing them to be moved to a potential
  vectorization interface.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D97442
These patterns are obviously dead: they use a format operand which is
never selected, and we have no corresponding SelectMUBUF() function.
Differential Revision: https://reviews.llvm.org/D98451