Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE
data registers (in bits) to be specified. This allows the code
generator to make assumptions it normally couldn't. As a starting
point this information is used to mark fixed length vector types
that can fit within the specified size as legal.
Reviewers: rengolin, efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80384
For a loop, a join block is a block that is reachable along multiple
disjoint paths from the exiting block of a loop. If the exit condition
of the loop is divergent, then such join blocks must also be marked
divergent. This currently fails in some cases because not all join
blocks are identified correctly.
The workaround is to conservatively mark every join block of any
branch (not necessarily the exiting block of a loop) as divergent.
https://bugs.llvm.org/show_bug.cgi?id=46372
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D81806
Summary:
This is to provide a utility to remove unsupported constraints or for
pipelines that happen to receive these but cannot lower them due to not
supporting assertions.
Differential Revision: https://reviews.llvm.org/D81560
Currently the matrix lowering turns volatile loads/stores into
non-volatile ones. This patch updates the lowering to preserve the
volatile bit.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D81498
This patch add __builtin_matrix_column_major_store to Clang,
as described in clang/docs/MatrixTypes.rst. In the initial version,
the stride is not optional yet.
Reviewers: rjmccall, jfb, rsmith, Bigcheese
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D72782
For example:
svint32_t svget4(svint32x4_t tuple, uint64_t imm_index)
returns the subvector at `index`, which must be in range `0..3`.
svint32x3_t svset3(svint32x3_t tuple, uint64_t index, svint32_t vec)
returns a tuple vector with `vec` inserted into `tuple` at `index`,
which must be in range `0..2`.
Reviewers: c-rhodes, efriedma
Reviewed By: c-rhodes
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81464
We're missing a plain English explanation of how this pass is supposed
to operate -- add one to the file comment.
Differential Revision: https://reviews.llvm.org/D80929
This patch add __builtin_matrix_column_major_load to Clang,
as described in clang/docs/MatrixTypes.rst. In the initial version,
the stride is not optional yet.
Reviewers: rjmccall, rsmith, jfb, Bigcheese
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D72781
The ScopedBuilder class in EDSC is being gradually phased out in favor of core
OpBuilder-based helpers with callbacks. Provide helper functions that are
compatible with `edsc::ScopedContext` and can be used to create and populate
blocks using callbacks that take block arguments as callback arguments. This
removes the need for `edsc::BlockHandle`, forward-declaration of `Value`s used
for block arguments and the tag `edsc::Append` class, leading to noticable
reduction in the verbosity of the code using helper functions.
Remove "eager mode" construction tests that are only relevant to the
`BlockBuilder`-based approach.
`edsc::BlockHandle` and `edsc::BlockBuilder` are now deprecated and will be
removed soon.
Differential Revision: https://reviews.llvm.org/D82008
This needed two fixes:
* 32-bit instructions were read in the wrong order. The machine code
swaps the two 16-bit instruction words, which wasn't undone when
decoding instructions.
* Jump and call instructions don't encode the lowest address bit,
which is always zero. Therefore, the address needed to be shifted by
one to fix that.
Differential Revision: https://reviews.llvm.org/D81961
Added NextPowerOf2() routine to TypeSize and rewritten the code
in getVectorTypeBreakdown to avoid warnings being generated.
Differential Revision: https://reviews.llvm.org/D81578
This patch adjust the load/store matrix intrinsics, formerly known as
llvm.matrix.columnwise.load/store, to improve the naming and allow
passing of extra information (volatile).
The patch performs the following changes:
* Rename columnwise.load/store to column.major.load/store. This is more
expressive and also more in line with the naming in Clang.
* Changes the stride arguments from i32 to i64. The stride can be
larger than i32 and this makes things more uniform with the way
things are handled in Clang.
* A new boolean argument is added to indicate whether the load/store
is volatile. The lowering respects that when emitting vector
load/store instructions
* MatrixBuilder is updated to require both Alignment and IsVolatile
arguments, which are passed through to the generated intrinsic. The
alignment is set using the `align` attribute.
The changes are grouped together in a single patch, to have a single
commit that breaks the compatibility. We probably should be fine with
updating the intrinsics, as we did not yet officially support them in
the last stable release. If there are any concerns, we can add
auto-upgrade rules for the columnwise intrinsics though.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse
Reviewed By: anemet, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D81472
Instead of asserting the number of elements is the same, we should be
comparing the element counts instead. In addition, when looking at
concats of extract_subvectors it's fine to use getVectorMinNumElements()
for scalable vectors.
I discovered these warnings when compiling the structured loads tests in
this file:
test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D81936
Setup declarative rewrite rules to lower the `shape` dialect to the `std`
dialect with two exemplary rules for `from/to_extent_tensor`.
Differential Revision: https://reviews.llvm.org/D82022
The rearranges PerformANDCombine and PerformORCombine to try and make
sure we don't call isConstantSplat on any i1 vectors. As pointed out in
D81860 it may not be very well defined in those cases.
In EVT::getVectorElementCount() when the type is not simple we
should return getExtendedVectorElementCount() from the function
instead of constructing the ElementCount object manually.
I discovered this warning in an existing test:
test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D81927
There are now quite a few SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.
Differential Revision: https://reviews.llvm.org/D80712
Summary:
This patch changes speficic extremum functions rewrite to generic MIN/MAX.
It applies to AMAX0, AMIN0, AMAX1, AMIN1, MAX0, MIN0, MAX1, MIN1, DMAX1,
and DMIN1.
- Do not re-write specific extremums to MAX/MIN in intrinsic Probe and let
folding rewrite it and introduc the conversion on the MIN/MAX result.
- Also make operand promotion explicit in MIN/MAX folding.
For instance, after this patch:
AMAX0(int8, int4) is rewritten to REAL(MAX(int8, INT(int4, 8)))
All this care is to avoid rewritting it to MAX(REAL(int8), REAL(int4))
that may not always be numerically equivalent to the first rewrite.
Reviewers: klausler, schweitz, sscalpone, jdoerfert, DavidTruby
Reviewed By: klausler, schweitz
Subscribers: llvm-commits, flang-commits
Tags: #flang, #llvm
Differential Revision: https://reviews.llvm.org/D81940
This also enables running the AArch64 SLSHardening pass with GlobalISel,
so add a test for that.
Differential Revision: https://reviews.llvm.org/D81403
The enum values for AArch64 registers are not all consecutive.
Therefore, the computation
"__llvm_slsblr_thunk_x" + utostr(Reg - AArch64::X0)
is not always correct. utostr(Reg - AArch64::X0) will not generate the
expected string for the registers that do not have consecutive values in
the enum.
This happened to work for most registers, but does not for AArch64::FP
(i.e. register X29).
This can get triggered when the X29 is not used as a frame pointer.
Differential Revision: https://reviews.llvm.org/D81997
- This will allow calling these functions from Op's that support this interface (like FuncOp) directly:
```
FuncOp func = ...
func.isPrivate()
```
Differential Revision: https://reviews.llvm.org/D82060
Summary:
llvm-mc emits `__bss` sections with an offset of zero, but we weren't expecting
that in our input, so we were copying non-zero data from the start of the file and
putting it in `__bss`, with obviously undesirable runtime results. (It appears that
the kernel will copy those nonzero bytes as long as the offset is nonzero, regardless
of whether S_ZERO_FILL is set.)
I debated on whether to make a special ZeroFillSection -- separate from a
regular InputSection -- but it seemed like too much work for now. But I'm happy
to refactor if anyone feels strongly about having it as a separate class.
Depends on D80857.
Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee
Reviewed By: smeenai
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80859
Summary:
Turns out this case is actually really common -- it happens whenever there's
a reference to an `extern` variable that ends up statically linked.
Depends on D80856.
Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee
Reviewed By: smeenai
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80857
Summary:
As far as I can tell, it's identical to _GOT_LOAD. llvm-mc has the following
comment explaining why _GOT exists:
```
// x86_64 distinguishes movq foo@GOTPCREL so that the linker can
// rewrite the movq to an leaq at link time if the symbol ends up in
// the same linkage unit.
```
Depends on D80855.
Reviewers: ruiu, pcc, MaskRay, smeenai, alexshap, gkm, Ktwu, christylee
Reviewed By: MaskRay, smeenai
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80856
Summary:
As mentioned in https://reviews.llvm.org/D81326#2093931, I'm not sure it
makes sense to use the default target triple to determine -arch.
Long-term we should probably detect it from the input object files, but
in the meantime it would be nice not to have to add it to all our tests
by using a convenient default.
Reviewers: #lld-macho
Subscribers: arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81983
This is fixing warning from clang:
warning: private field 'ModuleSlice' is not used [-Wunused-private-field]
SmallPtrSetImpl<Function *> &ModuleSlice;
^
Differential Revision: https://reviews.llvm.org/D82027
Summary:
For PPC BinaryOperator of fp128 will become libcall, we shouldn't
convert loop to CTR loop if the loop contain libCall.
But currently, in the PPCTTIImpl::mightUseCTR() function, we only deal
with BinaryOperator for ppc_fp128, don't deal with the fp128.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D81353
This patch enables yaml2elf emit the .debug_abbrev section.
The generated .debug_abbrev is verified using `llvm-dwarfdump`.
Known issues that will be addressed later:
- Current implementation doesn't support generating multiple abbreviation tables in one .debug_abbrev section.
Reviewed By: jhenderson, grimar
Differential Revision: https://reviews.llvm.org/D81820
Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected.
In fact the case matches the idiom of rotate. We have MatchRotate to handle an ‘or’ of two operands and generate a rot[lr] if the case matches the idiom of rotate. While PPC doesn’t support ROTL v1i128. We can custom lower ROTL v1i128 to the vector_shuffle. The vector_shuffle will be matched to a single HW instruction during the phase of instruction selection.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81076
A hasWildcard pattern iterates over symVector, which can be slow when there
are many --export-dynamic-symbol. In optimistic cases, most patterns don't use
a wildcard character. hasWildcard: false can avoid a symbol table iteration.
While here, add two tests using `[` and `?`, respectively.