It is better to set all barrier patterns to use "dist" when at least
one environment variable specifies "dist". Otherwise if only one
environment is set to "dist" and others left blank inadvertently,
it would result in mixing dist barrier with default hyper barrier
pattern.
Differential Revision: https://reviews.llvm.org/D112597
This patch extends the `FIRToLLVMLowering` pass in Flang by adding a
hook to transform `fir.call` to `llvm.call`.
This is part of the upstreaming effort from the `fir-dev` branch in [1].
[1] https://github.com/flang-compiler/f18-llvm-project
Patch originally written by:
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Differential Revision: https://reviews.llvm.org/D113278
VE integrated asm has been the default in Clang. Also use the default setting for integrated asm in the backend.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D113384
The revision updates the packing loop search in hoist padding. Instead of considering all loops in the backward slice, we now compute a separate backward slice containing the index computations only. This modification ensures we do not add packing loops that are not used to index the packed buffer due to spurious dependencies. One instance where such spurious dependencies can appear is the extract slice operation introduced between the tile loops of a double tiling.
Depends On D112412
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D112713
We already have patterns for fptosi and fptoui plus fmul to fixed point
convert, this adds equivalent patterns for fptosi.sat and fptoui.sat,
which should apply equally well for the legal saturating variants.
Differential Revision: https://reviews.llvm.org/D113199
If the source has an addendum, the descriptor that is being established
to describe a section over the source needs to copy the addendum so that
derived type information is correctly set in the descriptor being
established.
This allows namelist IO with derived type to work correctly.
Differential Revision: https://reviews.llvm.org/D113258
At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor
we bail out if the main vector loop uses a scalable VF. This patch adds
support for generating epilogue vector loops using a fixed-width VF when the
main vector loop uses a scalable VF.
I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor
so that we convert the scalable VF into a fixed-width VF and do profitability
checks on that instead. In addition, since the scalable and fixed-width VFs
live in different VPlans that means I had to change the calls to
LVP.hasPlanWithVFs so that we only pass in the fixed-width VF.
New tests added here:
Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll
Differential Revision: https://reviews.llvm.org/D109432
Performing the rearrangement for add/sub and mul instructions to match the madd/msub pattern
Reviewed By: dmgreen, sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D111862
`__vector_base` exists for historical reasons and cannot be eliminated
entirely without breaking the ABI. Member variables are left
untouched -- this patch only does changes that clearly cannot affect the
ABI.
Differential Revision: https://reviews.llvm.org/D112976
Add a separate file to test FIR types conversion to LLVM types.
Conversion comes from `flang/lib/Optimizer/CodeGen/TypeConverter.h`
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: kiranchandramohan, awarzynski
Differential Revision: https://reviews.llvm.org/D113283
However, whether applications rely on the std::bad_function_call vtable
being in the dylib is still controlled by the ABI macro, since changing
that would be an ABI break.
Differential Revision: https://reviews.llvm.org/D92397
Add comments to explain why XXPERMDIs and XXPERMDI have different input register
classes, vsfrc for XXPERMDIs and vsrc for XXPERMDI.
This addresses the comments in abandoned patch D113178, we keep using `f0` instead
of using `vs0` for XXPERMDIs on purpose.
CSKY is a ARCH which supports mixture of 16-bit and 32-bit instructions natively,
and there is not an indivual predictor or feature to enable/disable 16-bit instruction.
So I think it's better to add 16-bit instruction early, and naturally to use 16-bit and 32-bit instructions.
Differential Revision: https://reviews.llvm.org/D112919
For v8i16 shuffle patterns that are lowered with AND+PACKUS, check to see if the sources are from a 256-bit vector and perform the masking using BLENDW at the 256-bit level.
With the test changes we can see more examples of duplicate XMM/YMM zero vectors (PR26018) :(
This patch add the conversion pattern for fir.extract_value
and fir.insert_value. fir.extract_value is lowered to llvm.extractvalue
anf fir.insert_value is lowered to llvm.insertvalue.
This patch also adds the type conversion for the BoxType and RecordType
needed to have some comprehensive tests.
This patch is part of the upstreaming effort from fir-dev branch.
This patch was landed and reverted once.
TypeBuilderFunc getModel<Fortran::ISO::CFI_index_t>() was clashing
with getModel<long long> on windows since they both are 64 bits
signed interger. On linux CFI_index_t is long. Change CFI_index_t
to getModel<long>.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D112961
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
For some optimizations on comparisons it's necessary that the
union/intersect is exact and not a superset. Add methods that
return Optional<ConstantRange> only if the result is exact.
For the sake of simplicity this is implemented by comparing
the subset and superset approximations for now, but it should be
possible to do this more directly, as unionWith() and intersectWith()
already distinguish the cases where the result is imprecise for the
preferred range type functionality.
This rewrites the fcvt-fixed.ll test case to be separate functions, not
one large function with volatile global stores. It also adds fp16 and
fptoi.sat testing at the same time.
When accumulating the GEP offset in BasicAA, we should use the
pointer index size rather than the pointer size.
Differential Revision: https://reviews.llvm.org/D112370
D98452 introduced a mismatch between clang expectations for
builtin name for baremetal targets on arm. Fix it by
adding a case for baremetal. This now matches the output of
"clang -target armv7m-none-eabi -print-libgcc-file-name \
-rtlib=compiler-rt"
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D113357
Make test_allocator etc. constexpr-friendly so they can be used to test constexpr string and possibly constexpr vector
Reviewed By: Quuxplusone, #libc, ldionne
Differential Revision: https://reviews.llvm.org/D110994
Be more consistent in the naming convention for the various RET instructions to specify in terms of bitwidth.
Helps prevent future scheduler model mismatches like those that were only addressed in D44687.
Differential Revision: https://reviews.llvm.org/D113302
Currently FoldConstantArithmetic only handles binops, so replacing other uses of FoldConstantVectorArithmetic (in particular for SETCC nodes), still require more work.
The LLVM-C API is relatively small so we've previously added doxygen tags
so it's easier to navigate the LLVM-C web docs. Over the years, more
headers were added without proper doxygen tags, effectively hiding them
from the main LLVM-C doxygen page. This patch adds comments to headers
which did not have them.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D112474
https://sourceware.org/bugzilla/show_bug.cgi?id=22742
uc_mcontext.__reserved probably should not be considered user visible API but
unfortunate it is: it is the only way to access cpu states of some Linux
asm/sigcontext.h extensions. That said, the declaration may be
long double __reserved[256]; (used by musl)
instead of
unsigned char __reserved[4096] __attribute__((__aligned__(16))); (glibc)
to avoid dependency on a GNU variable attribute.
GCC introduced `__attribute__((mode(unwind_word)))` to work around
Cell Broadband Engine SPU (which was removed from GCC in 2019-09),
which is irrelevant to hwasan.
_Unwind_GetGR/_Unwind_GetCFA from llvm-project/libunwind don't use unwind_word.
Using _Unwind_Word can lead to build failures if libunwind's unwind.h is
preferred over unwind.h in the Clang resource directory (e.g. built with GCC).
Nathan Chancellor reported a crash due to commit
3466e00716 (Reland "[Attr] support btf_type_tag attribute").
The following test can reproduce the crash:
$ cat efi.i
typedef unsigned long efi_query_variable_info_t(int);
typedef struct {
struct {
efi_query_variable_info_t __attribute__((regparm(0))) * query_variable_info;
};
} efi_runtime_services_t;
efi_runtime_services_t efi_0;
$ clang -m32 -O2 -g -c -o /dev/null efi.i
The reason is that FunctionTypeLoc.getParam(Idx) may return a
nullptr which should be checked before dereferencing the
result pointer. This patch fixed this issue.