The loop flattening pass requires loops to be in simplified form. If the
loops are not in simplified form, the pass cannot operate. This patch
simplifies all loops before flattening. As a result, all loops will be
simplified regardless of whether anything ends up being flattened.
This change was inspired by observing a certain loop that was not flatten
because the loops were not in simplified form. This loop is added as a
test to verify that it is now flattened.
Differential Revision: https://reviews.llvm.org/D102249
Change-Id: I45bcabe70fb99b0d89f0effafc82eb9e0585ec30
LLVM's build system contains support for configuring a distribution, but
it can often be useful to be able to configure multiple distributions
(e.g. if you want separate distributions for the tools and the
libraries). Add this support to the build system, along with
documentation and usage examples.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D89177
This patch begins to translate acc.enter_data operation to call to tgt runtime call.
It currently only translate create/copyin operands of memref type. This acts as a basis to add support
for FIR types in the Flang/OpenACC support. It follows more or less a similar path than clang
with `omp target enter data map` directives.
This patch is taking a different approach than D100678 and perform a translation to LLVM IR
and make use of the OpenMPIRBuilder instead of doing a conversion to the LLVMIR dialect.
OpenACC support in Flang will rely on the current OpenMP runtime where 1:1 lowering can be
applied. Some extension will be added where features are not available yet.
Big part of this code will be shared for other standalone data operations in the OpenACC
dialect such as acc.exit_data and acc.update.
It is likely that parts of the lowering can also be shared later with the ops for
standalone data directives in the OpenMP dialect when they are introduced.
This is an initial translation and it probably needs more work.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D101504
llvm-dev message: https://lists.llvm.org/pipermail/llvm-dev/2021-May/150465.html
In an ELF shared object, a default visibility defined symbol is preemptible by default.
This creates some missed optimization opportunities. -fno-semantic-interposition can optimize -fPIC:
* in Clang: avoid GOT/PLT cost for variable access/function calls to external linkage definition in the same TU
* in GCC: enable interprocedural optimizations (including inlining) and avoid PLT
See https://gist.github.com/MaskRay/2d4dfcfc897341163f734afb59f689c6 for more information.
-Bsymbolic-functions is more aggressive than -fvisibility-inlines-hidden (present since 2012) as it applies
to all function definitions. It can
* avoid PLT for cross-TU function calls && reduce dynamic symbol lookup
* reduce dynamic symbol lookup for taking function addresses and optimize out GOT/TOC on x86-64/ppc64
With both options, the libLLVM.so and libclang-cpp.so performance should
be closer to PIE binary linking against `libLLVM*.a` and `libclang*.a`
(In a -DLLVM_TARGETS_TO_BUILD=X86 build, the number of JUMP_SLOT decreases from 12716 to 1628, and the number of GLOB_DAT decreases from 1918 to 1313
The built clang with `-DLLVM_LINK_LLVM_DYLIB=on -DCLANG_LINK_CLANG_DYLIB=on` is significantly faster.
See the Linux kernel build result https://bugs.archlinux.org/task/70697
)
Some implication:
Interposing a subset of functions is no longer supported.
(This is fragile anyway and cannot really be supported. For Mach-O we don't use
`ld -interpose`, so interposition is not supported on Mach-O at all.)
Compiling a program which takes the address of any LLVM function with
`{gcc,clang} -fno-pic` and expects the address to equal to the address taken
from libLLVM.so or libclang-cpp.so is unsupported. I am fairly confident that
llvm-project shouldn't have different behaviors depending on such pointer
equality (as we've been using -fvisibility-inlines-hidden which applies to
inline functions for a long time), but if we accidentally do, users should be
aware that they should not make assumption on pointer equality in `-fno-pic`
mode.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D102090
The current static checker for linalg does not work on the decreasing
index cases well. So, this is to Update the current static bound checker
for linalg to cover decreasing index cases.
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D102302
On AVX1 targets we can handle v4i64 logical shifts by 32 bits as a pair of v8f32 shuffles with zero.
I was hoping to put this in LowerScalarImmediateShift, but performing that early causes regressions where other instructions were respliting the subvectors.
Extend the range of calls beyond an architecture's limited branch range by first calling a thunk, which loads the far address into a scratch register (x16 on ARM64) and branches through it.
Other ports (COFF, ELF) use multiple passes with successively-refined guesses regarding the expansion of text-space imposed by thunk-space overhead. This MachO algorithm places thunks during MergedOutputSection::finalize() in a single pass using exact thunk-space overheads. Thunks are kept in a separate vector to avoid the overhead of inserting into the `inputs` vector of `MergedOutputSection`.
FIXME:
* arm64-stubs.s test is broken
* add thunk tests
* Handle thunks to DylibSymbol in MergedOutputSection::finalize()
Differential Revision: https://reviews.llvm.org/D100818
[libomptarget][amdgpu][nfc] Expand errorcheck macros
These macros expand to continue, which is confusing, or exit,
which is incompatible with continuing execution on offloading fail.
Expanding the macros in place makes the code look untidy but the
control flow obvious and amenable to improving. In particular, exit
becomes easier to eliminate.
Reviewed By: pdhaliwal
Differential Revision: https://reviews.llvm.org/D102230
On z/OS, umask() returns an int because mode_t is type int, however it is being compared to an unsigned int. This patch fixes the following warning we see when compiling Path.cpp.
```
comparison of integers of different signs: 'const int' and 'const unsigned int'
```
Reviewed By: muiez
Differential Revision: https://reviews.llvm.org/D102326
This factors out the pass timing code into a separate `TimingManager`
that can be plugged into the `PassManager` from the outside. Users are
able to provide their own implementation of this manager, and use it to
time additional code paths outside of the pass manager. Also allows for
multiple `PassManager`s to run and contribute to a single timing report.
More specifically, moves most of the existing infrastructure in
`Pass/PassTiming.cpp` into a new `Support/Timing.cpp` file and adds a
public interface in `Support/Timing.h`. The `PassTiming` instrumentation
becomes a wrapper around the new timing infrastructure which adapts the
instrumentation callbacks to the new timers.
Reviewed By: rriddle, lattner
Differential Revision: https://reviews.llvm.org/D100647
This patch disables the SIFormMemoryClauses pass at -O1. This pass has a
significant impact on compilation time, so we only want it to be enabled
starting from -O2.
Differential Revision: https://reviews.llvm.org/D101939
This patch extends the vector type-conversion and legalization capabilities of
scalable vector types.
Firstly, `vscale x 1` types now behave more like the corresponding `vscale x
2+` types. This enables the integer promotion legalization of extended scalable
types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`.
These `vscale x 1` types are also now better handled by
`getVectorTypeBreakdown`, where what looks like older handling for 1-element
fixed-length vector types was spuriously updated to include scalable types.
Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR`
to insert the smaller scalable vector "value" type into the wider scalable
vector "part" type. This allows AArch64 to pass and return `vscale x 1` types
by value by widening.
There are still cases where we are unable to legalize `vscale x 1` types, such
as where expansion would require splitting the vector in two.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D102073
Add a conversion pass to convert higher-level type before translation.
This conversion extract meangingful information and pack it into a struct that
the translation (D101504) will be able to understand.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D102170
This removed the pointless need for extension pragma since
it doesn't disable anything properly and it doesn't need to
enable anything that is not possible to disable.
The change doesn't break existing kernels since it allows to
compile more cases i.e. without pragma statements but the
pragma continues to be accepted.
Differential Revision: https://reviews.llvm.org/D100985
Much like other LLVM binary utilities, `llvm-cov` has a symlink compatibility feature where it runs in `gcov` compatibility mode if the binary name ends in `gcov`. This is identical to invoking `llvm-cov gcov ...`.
Differential Revision: https://reviews.llvm.org/D102299
Currently clang does not emit device template variables
instantiated only in host functions, however, nvcc is
able to do that:
https://godbolt.org/z/fneEfferY
This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
var ODR-used by host only. Basically clang records
device variables ODR-used by host code and force
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, therefore
they are guaranteed not to be deleted.
It also fixes non-ODR-use of static device variable by host code
causing static device variable to be emitted and registered,
which should not.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D102237
getVectorNumElements() returns a value for scalable vectors
without any warning so it is effectively getVectorMinNumElements().
By renaming it and making getVectorNumElements() forward to
it, we can insert a check for scalable vectors into getVectorNumElements()
similar to EVT. I didn't do that in this patch because there are still more
fixes needed, but I was able to temporarily do it and passed the RISCV
lit tests with these changes.
The changes to isPow2VectorType and getPow2VectorType are copied from EVT.
The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table.
We're now considering SameNumElts to require the scalable property to match which
removes some unneeded type checks.
This was motivated by the bug I fixed yesterday in 80b9510806
Reviewed By: frasercrmck, sdesmalen
Differential Revision: https://reviews.llvm.org/D102262
Fixes a bug in the DAG combiner that eliminates the stores because it missed
to inspect the address space of the pointers.
%v = load %ptr_as1
// no chain side effect
store %v, %ptr_as2
As well as
store %v, %ptr_as1
store %v, %ptr_as2
Fixes a test for above in X86.
Differential Revision: https://reviews.llvm.org/D102096
Adds a test in X86, exposing a bug in DAG combine eliminating stores that
are the same value but no the same address space.
Differential Revision: https://reviews.llvm.org/D102243
When a ptest is used to set flags from the output of rdffr, the ptest
can be eliminated, using a flags-setting rdffrs instead.
Additionally, check that nothing consumes flags between rdffr and ptest;
this case appears to have been missed previously.
* There is no unpredicated RDFFRS instruction.
* If substituting RDFFR_PP, require that the mask argument of the
PTEST matches that of the RDFFR_PP.
* Move some precondition code up inside optimizePTestInstr, so that it
covers the new code paths for RDFFR which return earlier.
* Only consider RDFFR, PTEST in same basic block.
* Check for other flag setting instructions between the two, abort if
found.
* Drop an old TODO comment about removing dead PTEST instructions.
RDFFR_P to follow in later patch.
Differential Revision: https://reviews.llvm.org/D101357
I've changed a test in each of these files:
Transforms/InstCombine/vec_demanded_elts.ll
Transforms/InstCombine/vec_demanded_elts-inseltpoison.ll
to use a variable GEP index instead of a constant value so that
we're testing the more general case.
The bug (PR50227, affecting COFF) that caused the revert in
6f5670a4c3 has been fixed in
382c505d9c now, so it should be safe
to reenable the pass for that target (and ELF).
In PR50227 it's also mentioned that the same pass seems to cause
problems on aarch64 on darwin, so leaving it disabled there for now.
`__mh_(execute|dylib|dylinker|bundle|preload|object)_header` are special symbols whose values hold the VMA of the Mach header to support introspection. They are attached to the first section in `__TEXT`, even though their addresses are outside `__TEXT`, and they do not refer to code.
It is normally harmless, but when the first section of `__TEXT` has no other symbols, `__mh_*_header` is considered by the disassembler when determing function boundaries. Since `__mh_*_header` refers to an address outside `__TEXT`, the boundary determination fails and disassembly quits.
Since `__TEXT,__text` normally has symbols, this bug is obscured. Experiments placing `__stubs` and `__stub_helper` first exposed the bug, since neither has symbols.
Differential Revision: https://reviews.llvm.org/D101786
Improve the code generation of build_vector.
Use the v_pack_b32_f16 instruction instead of
v_and_b32 + v_lshl_or_b32
Differential Revision: https://reviews.llvm.org/D98081
Patch by Julien Pagès!
We can not rely on (C+X)-->(X+C) already happening,
because we might not have visited that `add` yet.
The added testcase would get stuck in an endless combine loop.
MachineRegisterInfo caches the reserved register set that is computed by
by TargetRegisterInfo::getReservedRegs, so call into MRI to get the
reserved regs to avoid recomputing them.
In particular this speeds up AMDGPU's SIFormMemoryClauses pass because
AMDGPU has a particularly complicated reserved set that is expensive to
compute.
Differential Revision: https://reviews.llvm.org/D102318
There should be a follow up to this for changing the traversal mode, but some of the tests don't like that.
Reviewed By: steveire
Differential Revision: https://reviews.llvm.org/D101614