The instruction is used to modify wave priority with the intent
to affect VALU execution and currently we can reschedule VALU
around it since that VALU does not have side effects.
Differential Revision: https://reviews.llvm.org/D130654
This patch adds canonicalization conditions for omp.atomic.update thus
eliminating it when it becomes just a write or a no-op due to other
changes during canonicalization.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D126531
Teach libDebugInfo (llvm-dwarfdump) and lldb about DWARF tags and
attributes for pointer authentication. These values have been emitted by
Apple clang for several releases. Although upstream LLVM doesn't emit
these values yet, we hope to upstream that part sometime soon.
Differential revision: https://reviews.llvm.org/D130215
The problem here is target independent, but particularly painful on RISCV. If we chose to vectorize such that vscale x 2 x i32 is our widest type and fits in a register, a naive expansion of i64 comparisons results in comparisons and index types at <scalabe x 2 x i64>. This requires both an LMUL of 2, and a VSETVLI toggle in the loop. Note that we could have used <vscale x 2 x i32> for the compairons legally given the range of the trip count.
Use a delegating constructor to remove the last use of the deprecated
ctor of `TypeErasedDataflowAnalysis`, and then delete it.
Differential Revision: https://reviews.llvm.org/D130653
Without this, the intrinsic will be expanded to an integer; thereby an
explicit copy (from GPR to SIMD register) will be codegen'd. This matches the
general convention of using "v1" types to represent scalar integer operations in
vector registers.
The similar approach is observed in D56616, and the pattern likely applies on
other intrinsic that accepts integer scalars (e.g.,
int_aarch64_neon_sqdmulls_scalar)
Differential Revision: https://reviews.llvm.org/D130548
This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of
materializing a specific constant in code size. Doing so prevents us from
sinking constants which require multiple instructions to generate into
use blocks.
Code size savings on CTMark -Os:
Program size.__text
before after diff
ClamAV/clamscan 381940.00 382052.00 0.0%
lencod/lencod 428408.00 428428.00 0.0%
SPASS/SPASS 411868.00 411876.00 0.0%
kimwitu++/kc 449944.00 449944.00 0.0%
Bullet/bullet 463588.00 463556.00 -0.0%
sqlite3/sqlite3 284696.00 284668.00 -0.0%
consumer-typeset/consumer-typeset 414492.00 414424.00 -0.0%
7zip/7zip-benchmark 595244.00 594972.00 -0.0%
mafft/pairlocalalign 247512.00 247368.00 -0.1%
tramp3d-v4/tramp3d-v4 372884.00 372044.00 -0.2%
Geomean difference -0.0%
Differential Revision: https://reviews.llvm.org/D130554
I am playing with the LoopDataPrefetch pass and found out that it
bails to work with a pointer in a non-zero address space. This
patch adds the target callback to check if an address space is to
be considered for prefetching. Default implementation still only
allows address space 0, so this is NFCI.
This does not currently affect any known targets, but seems to be
generally useful for the future.
Differential Revision: https://reviews.llvm.org/D129795
D98179 added a mechanism to sort tests by test time to run slow tests
early, increasing potential parallelism. It also added a feature where
negative tests would be marked as negative, allowing subsequent test
runs to run them earlier. Unfortunately it never actually stored the
negative time, even if all the other code seemed to be inplace to sort
them early. Luckily the fix seems simple.
Differential Revision: https://reviews.llvm.org/D130570
I don't have any evidence these particular uses are actually causing any
issues, but we should avoid accidentally truncating immediate values
depending on the host.
We can use slli.uw by C followed by sh1add. Similar can be done
for multiples of 5 and 9. We need to make sure that C is less than
32 to stay in bounds of the 5-bit immediate for slli.uw.
We have existing patterns for (mul X, 3<<C) that use sh1add
followed by slli. That order doesn't allow the and to be folded.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130146
A mul by a negated power of 2 is a slli followed by neg. This doesn't
require any constant materialization and may be lower latency than mul.
The neg may also be foldable into other arithmetic.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130047
We can't guarantee the long always 64 bits like WINDOWS or LLP64 data
model (rare but we should consider).
So use int64_t from inttypes.h and safe in this case.
Fixes https://github.com/llvm/llvm-project/issues/55911 .
Change `sinf` range reduction to mod pi/16 to be shared with `cosf`.
Previously, `sinf` used range reduction `mod pi`, but this cannot be used to implement `cosf` since the minimax algorithm for `cosf` does not converge due to critical points at `pi/2`. In order to be able to share the same range reduction functions for both `sinf` and `cosf`, we change the range reduction to `mod pi/16` for the following reasons:
- The table size is sufficiently small: 32 entries for `sin(k * pi/16)` with `k = 0..31`. It could be reduced to 16 entries if we treat the final sign separately, with an extra multiplication at the end.
- The polynomials' degrees are reduced to 7/8 from 15, with extra computations to combine `sin` and `cos` with trig sum equality.
- The number of exceptional cases reduced to 2 (with FMA) and 3 (without FMA).
- The latency is reduced while maintaining similar throughput as before.
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D130629
After evaluating the new style I noticed inner namespaces are now
indented. I am not fond of that style and I've seen some other review
comment in this regard so I propose we remove this option and use the
LLVM default not to indent it.
Reviewed By: ldionne, philnik, var-const, #libc
Differential Revision: https://reviews.llvm.org/D129441
All of our other tests are functionality tests constrained to some
specific configuration. This one is intended to float with the
default configuration so that changes in that default are visible
in reviews. Note that our current default does not enable
vectorization at all; thus the current output is unvectorized.
Sparse compiler failed on the provided test (when the sparse kernel is nested in a scf structrual operator).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D130609
The options -f{no-}color-diagnostics have been supported in driver. This
supports the behaviors in scanning, parsing, and semantics, and the
behaviors are exactly the same as the driver.
To illustrate the added behaviour, consider the following input file:
```! file.f90
program m
integer :: i = k
end
```
In the following invocations, "error: Must be a constant value" _will be_
formatted:
```
$ flang-new file.f90
error: Semantic errors in file.f90
./file.f90:2:18: error: Must be a constant value
integer :: i = k
```
Note that "error: Semantic errors in file.f90" is also formatted, which
is supported in https://reviews.llvm.org/D126164.
Also note that only "error", "warning" and "portability" are formatted.
Check the following input file:
```! file2.f90
program m
integer :: i =
end
```
```
$ flang-new test2.f90
error: Could not parse test2.f90
./test2.f90:2:11: error: expected '('
integer :: i =
^
./test2.f90:2:3: in the context: statement function definition
integer :: i =
^
...
```
The "error: Could not parse test2.f90" and "error: expected '('" are
formatted. Others such as "in the context" are not formatted yet, which
may or may not be supported.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D126166
There is post comment of adding TODO/FIXME for privatization of loop
bounds in D127137. D127137 fixes the bug in OpenMP firstprivate clause,
which should be refactored later according to the post comment. Add
FIXME for it.
Differential Revision: https://reviews.llvm.org/D130625
Adds the papers and LWG issues voted in during the July 2022 plenary.
Note the updating of the project based statuses is left to the active
contributors of these projects.
Reviewed By: #libc, huixie90, philnik
Differential Revision: https://reviews.llvm.org/D130595
It errors out in the Bazel CI:
AMDGPULowerModuleLDSPass.cpp:384:12: error: chosen constructor is
explicit in copy-initialization
return {SGV, std::move(Map)};
Reviewed By: rupprecht
Differential Revision: https://reviews.llvm.org/D130623
The header file OpenMPDialect.h is added in Bridge.cpp in D130027,
but it is unused. Remove it.
Differential Revision: https://reviews.llvm.org/D130625
The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that when we use
`-mlink-builtin-bitcode` we only import used symbols from the runtime.
Then OpenMPOpt will insert calls to symbols that were not previously
included. This patch removed this implicit behaviour as these functions
were kept alive by the `noinline` simply because it kept calls to them
in the module. This caused regression in some tests that relied on some
OpenMPOpt passes without using LTO. Reverting for the LLVM15 release but
will try to fix it more correctly on main.
This reverts commit d61d72dae6.
Fixes#56752
This patch changes legacy LTO to set data-sections by default. The user can
explicitly unset data-sections. The reason for this patch is to match the
behaviour of lld and gold plugin. Both lld and gold plugin have data-sections on
by default.
This patch also fixes the forwarding of the clang options -fno-data-sections and
-fno-function-sections to libLTO. Now, when -fno-data/function-sections are
specified in clang, -data/function-sections=0 will be passed to libLTO to
explicitly unset data/function-sections.
Reviewed By: w2yehia, MaskRay
Differential Revision: https://reviews.llvm.org/D129401
This patch changes legacy LTO to set data-sections by default. The user can
explicitly unset data-sections. The reason for this patch is to match the
behaviour of lld and gold plugin. Both lld and gold plugin have data-sections on
by default.
This patch also fixes the forwarding of the clang options -fno-data-sections and
-fno-function-sections to libLTO. Now, when -fno-data/function-sections are
specified in clang, -data/function-sections=0 will be passed to libLTO to
explicitly unset data/function-sections.
Reviewed By: w2yehia, MaskRay
Differential Revision: https://reviews.llvm.org/D129401
Add custom attribute for complex dialect. Although this commit does not have significant impact on the conversion framework, it will lead us to construct complex numbers in a readable and tidy manner.
Related discussion: https://reviews.llvm.org/D127476
Reviewed By: pifon2a, akuegel
Differential Revision: https://reviews.llvm.org/D130149
The code relied on ManagedStatic.h being included indirectly. This is
about to change as uses of ManagedStatic are removed throughout the
codebase.
Differential Revision: https://reviews.llvm.org/D130575
The fold in it's current state only checks whether the amount of dynamic indices is 1. This does however not check for the presence of any struct indices, leading to an incorrect fold.
This patch fixes that issue by checking that struct indices are 1, which in addition to the pre-existing check that dynamic indices are 1, guarantees that the single index is a dynamic one.
Differential Revision: https://reviews.llvm.org/D129374
The DumpDataExtractor function had two branches for printing floating
point values. One branch (APFloat) was used if we had a Target object
around and could query it for the appropriate semantics. If we didn't
have a Target, we used host operations to read and format the value.
This patch changes second path to use APFloat as well. To make it work,
I pick reasonable defaults for different byte size. Notably, I did not
include x87 long double in that list (as it is ambibuous and
architecture-specific). This exposed a bug where we were printing
register values using the target-less branch, even though the registers
definitely belong to a target, and we had it available. Fixing this
prompted the update of several tests for register values due to slightly
different floating point outputs.
The most dubious aspect of this patch is the change in
TypeSystemClang::GetFloatTypeSemantics to recognize `10` as a valid size
for x87 long double. This was necessary because because sizeof(long
double) on x86_64 is 16 even though it only holds 10 bytes of useful
data. This generalizes the hackaround present in the target-free branch
of the dumping function.
Differential Revision: https://reviews.llvm.org/D129750