- Currently, the "." (Dot) character, when not identifying an Identifier or a Constant, refers to the current PC (Program Counter)
- However, in z/OS, for the HLASM dialect, it strictly accepts only the "*" as the current PC (Support for this will be put up in a follow-up patch)
- The changes in this patch allow individual platforms to choose whether they would like to use the "." (Dot) character as a marker for the current PC or not.
- It is achieved by introducing a new field in MCAsmInfo.h called `DotIsPC` (similar to `DollarIsPC`)
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D100975
GNU ld -r can create .rela.eh_frame with unordered r_offset values.
(With LLD, we can craft such a case by reordering sections in .eh_frame.)
This is currently unsupported and will trigger
`assert(pieces[i].inputOff <= off ...` in `OffsetGetter::get`
(the content is corrupted in a -DLLVM_ENABLE_ASSERTIONS=off build).
This patch supports this case.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D101116
This replaces D98479.
This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.
I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.
I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100803
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.
This allows us to remove sext and zext if user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument which is size_t.
Reviewed By: frasercrmck, rogfer01, arcbbb
Differential Revision: https://reviews.llvm.org/D101472
This patch is child of D89671, contains the clang
implementation to use the OpenMP IRBuilder's section
construct.
Co-author: @anchu-rajendran
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91054
This patch adds section support in the OpenMP IRBuilder module, along with a test for the same.
Reviewed By: fghanim
Differential Revision: https://reviews.llvm.org/D89671
This extension is primarily targeting SPIR-V compilations flow
as the IR translation is the same between 1.x and 2.x atomics.
Differential Revision: https://reviews.llvm.org/D101089
As suggested in D99294, this adds a getVPSingleValue helper to use for
recipes that are guaranteed to define a single value. This replaces uses
of getVPValue() which used to default to I = 0.
At the intrinsic layer the sve.insr operation takes a scalar. When this
scalar is an integer we are forcing a data transition between GPRs and
ZPRs that is potentially costly.
Often the integer scalar is the result of a vector extract, when
performing a reduction for example. In such cases we should keep all
data within the ZPRs.
Co-authored-by: Paul Walker <paul.walker@arm.com>
Differential Revision: https://reviews.llvm.org/D101169
By converting the SVE intrinsic to a normal LLVM insertelement we give
the code generator a better chance to remove transitions between GPRs
and VPRs
Co-authored-by: Paul Walker <paul.walker@arm.com>
Depends on D101302
Differential Revision: https://reviews.llvm.org/D101167
As part of this the ptrue coalescing done in SVEIntrinsicOpts has been
modified to not introduce redundant converts, since the convert removal
will no longer run after that optimisation to clean up.
Differential Revision: https://reviews.llvm.org/D101302
This enables to express more complex parallel loops in the affine framework,
for example, in cases of tiling by sizes not dividing loop trip counts perfectly
or inner wavefront parallelism, among others. One can't use affine.max/min
and supply values to the nested loop bounds since the results of such
affine.max/min operations aren't valid symbols. Making them valid symbols
isn't an option since they would introduce selection trees into memref
subscript arithmetic as an unintended and undesired consequence. Also
add support for converting such loops to SCF. Drop some API that isn't used in
the core repo from AffineParallelOp since its semantics becomes ambiguous in
presence of max/min bounds. Loop normalization is currently unavailable for
such loops.
Depends On D101171
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D101172
Introduce a basic support for parallelizing affine loops with reductions
expressed using iteration arguments. Affine parallelism detector now has a flag
to assume such reductions are parallel. The transformation handles a subset of
parallel reductions that are can be expressed using affine.parallel:
integer/float addition and multiplication. This requires to detect the
reduction operation since affine.parallel only supports a fixed set of
reduction operators.
Reviewed By: chelini, kumasento, bondhugula
Differential Revision: https://reviews.llvm.org/D101171
This libstc++ hack isn't ready for removal. Updating the comment to
note what I found. While I have not proven Ville's
__is_throw_swappable patch made this go away, that patch did remove
the use of noexcept(noexcept(swap(....))). I'm not sure when gcc grew
deferred noexcept parsing.
Differential Revision: https://reviews.llvm.org/D101441
This allows calling buildSpillLoadStore for an empty basic block, where
MI points at the end of the block instead of to an instruction.
This only happens with downstream CFI changes, so I was not able to
create a testcase that works with upstream LLVM.
Differential Revision: https://reviews.llvm.org/D101356
Otherwise the CMP glue may be used in multiple nodes, needing to be
emitted multiple times. Currently this either increases instruction
count or fails as it attempt to insert the same node multiple times.
FillOp allows complex ops, and filling a properly sized buffer with
a default zero complex number is implemented.
Differential Revision: https://reviews.llvm.org/D99939
This will allow the bindings to be built as a library and reused in out-of-tree
projects that want to provide bindings on top of MLIR bindings.
Reviewed By: stellaraccident, mikeurbach
Differential Revision: https://reviews.llvm.org/D101075
verifyFunctionAttrs has a comment that the value V is printed in error messages. The recently added errors for attributes didn't print V. Make them print V.
Change the stringification of AttributeList. Firstly they started with 'PAL[' which stood for ParamAttrsList. Change that to 'AttributeList[' matching its current name AttributeList. Print out semantic meaning of the index instead of the raw index value (i.e. 'return', 'function' or 'arg(n)').
Differential revision: https://reviews.llvm.org/D101484
This patch enables support on SPE for constrained arithmetic and
comparison operations. This fixes bugzilla 50070.
One thing not covered is fcmp vs. fcmps on SPE. Some condition code
generates singaling comparison while some not. In this patch, all are
considered as singaling. So there might be still some issue when
compiling from C code.
Reviewed By: jhibbits
Differential Revision: https://reviews.llvm.org/D101282
AArch64 kernel builds default to having /smaps and
the "VmFlags" line was added in 3.8. Long before MTE
was supported.
So we can assume that if you're AArch64 with MTE,
you can run this test.
The previous method of checking had a race condition
where the process we read smaps for, could finish before
we get to read the file.
I explored some alternatives but in the end I think
it's fine to just assume we have what we need.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D100493
This is an complementary/alternative fix for D99068. It takes a slightly
different approach by explicitly summing up all of the required split
part type sizes and ensuring we allocate enough space for them. It also
takes the maximum alignment of each part.
Compared with D99068 there are fewer changes to the stack objects in
existing tests. However, @luismarques has shown in that patch that there
are opportunities to reduce our stack usage in the future.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D99087
This fixes another bogus build error on gcc, e.g. https://lab.llvm.org/buildbot/#/builders/110/builds/2974.
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/clang/lib/Format/TokenAnnotator.cpp:3412:34: error: binding ‘const clang::format::FormatStyle’ to reference of type ‘clang::format::FormatStyle&’ discards qualifiers
auto ShouldAddSpacesInAngles = [&Style = this->Style,
^
We spend some time during sqlite3 compilation regrowing this vector,
bump it up to avoid this.
Gives around 1-2% improvement in codegen-only time for sqlite3 at -O0.
This fixes a bogus build error on gcc, e.g. https://lab.llvm.org/buildbot/#/builders/110/builds/2973.
/home/ssglocal/clang-cmake-x86_64-avx2-linux/clang-cmake-x86_64-avx2-linux/llvm/clang/lib/Format/TokenAnnotator.cpp:3097:53: error: binding ‘const clang::SourceRange’ to reference of type ‘clang::SourceRange&’ discards qualifiers
auto HasExistingWhitespace = [&Whitespace = Right.WhitespaceRange]() {
^
This revision adds support for vectorizing more general linalg operations with projected permutation maps.
This is achieved by eagerly broadcasting the intermediate vector to the common size
of the iteration domain of the linalg op. This allows a much more natural expression of
generalized vectorization but may introduce additional computations until all the
proper canonicalizations are implemented.
This generalization modifies the vector.transfer_read/write permutation logic and
exposes the fact that the logic employed in vector.contract was too ad-hoc.
As a consequence, changes occur in the permutation / transposition logic for contraction. In turn this prompts supporting more cases in the lowering of contract
to matrix intrinsics, which is required to make the corresponding tests pass.
Differential revision: https://reviews.llvm.org/D101165
As X32 uses 32-bit pointers without having 32-bit indirect branch
instructions, we need to fix up indirect branches by extending the
branch targets to 64 bits. This was already done for BRIND but not yet
for NT_BRIND. The same logic works for both, so this applies that
existing logic to NT_BRIND as well.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101499
The patch extends the OpDSL with support for:
- Constant values
- Capture scalar parameters
- Access the iteration indices using the index operation
- Provide predefined floating point and integer types.
Up to now the patch only supports emitting the new nodes. The C++/yaml path is not fully implemented. The fill_rng_2d operation defined in emit_structured_generic.py makes use of the new DSL constructs.
Differential Revision: https://reviews.llvm.org/D101364
A need for such an option came up in a few libc++ reviews. That's because libc++ has both code in C++03 and newer standards.
Currently, it uses `Standard: C++03` setting for clang-format, but this breaks e.g. u8"string" literals.
Also, angle brackets are the only place where C++03-specific formatting needs to be applied.
Reviewed By: MyDeveloperDay, HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D101344
This is a compile time optimization. DILocation:get() is expensive to call, and
we were calling it to create a line zero debug loc for *every* instruction we
translated. We only really need to do this just before we build constants in the
entry block, so I moved this code there. This reduces the LLVM -O0 codegen time
of sqlite3 IR by around 0.7% instructions executed and by about ~2% in CPU time.
We can probably do better with a more involved change, since the reason we need
to create one for each new constant is because we're using the debug scope and
inlined-at loc. If we just use a single instruction's scope and drop the
inlined-at, we can just cache these and have them be free.
The ARMConstantIsland pass will convert any t2B to tB if they are within
range after it has added or moved any constant pools. They don't need to
be deliberately converted beforehand, and it doesn't deal with needing
to convert tB to t2B very well.
Fix format specifier.
Fix warnings about non-standard attribute placement.
Make free_race2.c test a bit more interesting:
test access with/without an offset.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D101424
Pointers in non-zero address spaces need to be address space
casted before appending to the used list.
Reviewed by: vitalybuka
Differential Revision: https://reviews.llvm.org/D101363