This is a preparation to look at possible performance improvements.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D129421
In our implementation the year is always less than or equal to the
class' `max()`. It's unlikely this ever changes since changing the
year's range will be an ABI break. A static_assert is added as a
guard.
This was reported by @philnik.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D129442
There were two assertions in DefaultArgStorage::setInherited previously.
It requires the DefaultArgument is either empty or an argument value. It
would crash if it has a pointer refers to the previous declaration or
contains a chain to the previous declaration.
But there are edge cases could hit them actually. One is
InheritDefaultArguments.cppm that I found recently. Another one is pr31469.cpp,
which was created fives years ago.
This patch tries to fix the two failures by handling full cases in
DefaultArgStorage::setInherited.
This is guaranteed to not introduce any breaking change since it lives
in the path we wouldn't touch before. And the added assertions for
sameness should keep the correctness.
Reviewed By: v.g.vassilev
Differential Revision: https://reviews.llvm.org/D128974
This is to finish the change started by D125816 , D126263 and D126577 (replace SANITIZER_MAC by SANITIZER_APPLE).
Dropping definition of SANITIZER_MAC completely, to remove any possible confusion.
Differential Revision: https://reviews.llvm.org/D129502
parameters.
The current implementation to judge the similarity of TypeConstraint in
ASTContext::isSameTemplateParameter is problematic, it couldn't handle
the following case:
```C++
template <__integer_like _Tp, C<_Tp> Sentinel>
constexpr _Tp operator()(_Tp &&__t, Sentinel &&last) const {
return __t;
}
```
When we see 2 such declarations from different modules, we would judge
their similarity by `ASTContext::isSame*` methods. But problems come for
the TypeConstraint. Originally, we would profile each argument one by
one. But it is not right. Since the profiling result of `_Tp` would
refer to two different template type declarations. So it would get
different results. It is right since the `_Tp` in different modules
refers to different declarations indeed. So the same declaration in
different modules would meet incorrect our-checking results.
It is not the thing we want. We want to know if the TypeConstraint have
the same expression.
Reviewer: vsapsai, ilya-biryukov
Differential Revision: https://reviews.llvm.org/D129068
The availability of LiveIntervals affects kill flags in the output, so
declare the use to avoid strange effects where the output of this pass
is different depending on what other passes are scheduled after it.
Differential Revision: https://reviews.llvm.org/D129555
Summary:
Previous assumptions held that the LTO stage would only have a single
output. This is incorrect when using thinLTO which outputs multiple
files. Additionally there were some bugs with how we hanlded input that
cause problems when performing thinLTO. This patch addresses these
issues.
The performance of Thin-LTO is currently pretty bad. But I am content to
leave it that way as long as it compiles.
When lowering llvm::stackprotect intrinsic, the SDAG implementation
checks useLoadStackGuardNode() to either create a LOAD_STACK_GUARD or use
the first argument of the intrinsic. This check is not present in the
IRTranslator, which results in always generating a LOAD_STACK_GUARD even
if the target does not support it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D129505
Calling runtime TRANSPOSE requires a temporary array for the result,
and, sometimes, a temporary array for the argument. Lowering it inline
should provide faster code.
I added -opt-transpose control just for debugging purposes temporary.
I am going to make driver changes that will disable inline lowering
for -O0. For the time being I would like to enable it by default
to expose the code to more tests.
Differential Revision: https://reviews.llvm.org/D129497
Between issues such as
https://github.com/llvm/llvm-project/issues/56323, the fact that this
lowering (unlike the code in amdgpu-to-rocdl) does not correctly set
up bounds checks (and thus will cause page faults on reads that might
need to be padded instead), and that fixing these problems would,
essentially, involve replicating amdgpu-to-rocdl, remove
--vector-to-rocdl for being broken. In addition, the lowering does not
support many aspects of transfer_{read,write}, like supervectors, and
may not work correctly in their presence.
We (the MLIR-based convolution generator at AMD) do not use this
conversion pass, nor are we aware of any other clients.
Migration strategies:
- Use VectorToLLVM
- If buffer ops are particularly needed in your application, use
amdgpu.raw_buffer_{load,store}
A VectorToAMDGPU pass may be introduced in the future.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D129308
When building modules, override secondary outputs (dependency file,
dependency targets, serialized diagnostic file) in addition to the pcm
file path. This avoids inheriting per-TU command-line options that
cause non-determinism in the results (non-deterministic command-line for
the module build, non-determinism in which TU's .diag and .d files will
contain the module outputs). In clang-scan-deps we infer whether to
generate dependency or serialized diagnostic files based on an original
command-line. In a real build system this should be modeled explicitly.
Differential Revision: https://reviews.llvm.org/D129389
Existing implementation of structured op splitting creates several
affine.apply and affine.min operations in its subshape computation.
As these shapes are further used in data slice extraction, this may lead
to slice shapes being dynamic even when the original shapes and the
splitting point are static. This is particularly visible when splitting
is combined with further subsetting transformations such as tiling. Use
composition and folding more aggressively in splitting to avoid this.
In particular, introduce a `createComposedAffineMin` function that the
affine map used in "min" with the maps used by any `affine.apply` that
may be feeding the operands to the "min". This enables production of
more static shapes. Also introduce a `createComposedFoldedAffineApply`
function that combines the existing `createComposedAffineApply` with
in-place folding to propagate constants produced by zero-input affine
maps. Using these when splitting allows the subsequent canonicalizer
pass to recover static shapes for structured ops.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129379
This was promised 5 years ago in https://reviews.llvm.org/D32796,
let's do it.
Both flags are still accepted. No behavior change except for which
form shows up in --help output and in dumps of internal state
(such as with RC_DEBUG_OPTIONS).
Differential Revision: https://reviews.llvm.org/D129226
This allows vectorizing linalg reductions without changing the operation
order. Therefore this produce a valid vectorization even if operations
are not associative.
Differential Revision: https://reviews.llvm.org/D129535
When calculating the cost of Instruction::Br in getInstructionCost
we query PredicatedBBsAfterVectorization to see if there is a
scalar predicated block. However, this meant that the decisions
being made for a given fixed-width VF were affecting the cost for a
scalable VF. As a result we were returning InstructionCost::Invalid
pointlessly for a scalable VF that should have a low cost. I
encountered this for some loops when enabling tail-folding for
scalable VFs.
Test added here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-cost.ll
Differential Revision: https://reviews.llvm.org/D128272
Noticed while investigating the SystemZ regressions in D77804, prefer handling the knownbits analysis/simplification in the bitop nodes directly before falling back to SimplifyMultipleUseDemandedBits
The goal of this change is fixing most of compile time slowdown seen after a630ea3003 commit on lencod and sqlite3 benchmarks.
There are 3 improvements included in this patch:
1. In getNumOperands when possible get value directly from SmallNumOps.
2. Inline getLargePtr by moving its definition to header.
3. In TBAAStructTypeNode::getField get all operands once instead taking operands in loop one after one.
Differential Revision: https://reviews.llvm.org/D129468
The existing implementation of the TilingInterface for Linalg ops was not
modifying the `linalg.index` ops contained within other Linalg ops (they need
to be summed up with the values of respective tile loop induction variables),
which led to the interface-based tiling being incorrect for any Linalg op with
index semantics.
In the process, fix the function performing the index offsetting to use the
pattern rewriter API instead of RAUW as it is being called from patterns and
may mess up the internal state of the rewriter. Also rename the function to
clearly catch all uses.
Depends On D129365
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D129366
A recent commit introduced helper functions with semantically meaningful names
to populate the lists of memory effects in transform ops, use them whenever
possible.
Depends On D129287
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D129365
Introduce a structured transform op that emits IR computing the multi-tile
sizes with requested parameters (target size and divisor) for the given
structured op. The sizes may fold to arithmetic constant operations when the
shape is constant. These operations may then be used to call the existing
tiling transformation with a single non-zero dynamic size (i.e. perform
strip-mining) for each of the dimensions separately, thus achieving multi-size
tiling with optional loop interchange. A separate test exercises the entire
script.
Depends On D129217
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129287
Introduce an off-by default `-Winvalid-utf8` warning
that detects invalid UTF-8 code units sequences in comments.
Invalid UTF-8 in other places is already diagnosed,
as that cannot appear in identifiers and other grammar constructs.
The warning is off by default as its likely to be somewhat disruptive
otherwise.
This warning allows clang to conform to the yet-to be approved WG21
"P2295R5 Support for UTF-8 as a portable source file encoding"
paper.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D128059
Extend the definition of the Tile structured transform op to enable it
accepting handles to operations that produce tile sizes at runtime. This is
useful by itself and prepares for more advanced tiling strategies. Note that
the changes are relevant only to the transform dialect, the tiling
transformation itself already supports dynamic sizes.
Depends On D129216
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D129217
Normally we do not link the device libraries if the user passed
`nogpulib` we do this for the standard bitcode library. This behaviour
was not added when using the static library for LTO, causing it to
always be linked in. This patch fixes that.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D129534
Summary:
We use a static assert to make sure that someone doesn't change the size
of an argument struct without properly updating all the other logic.
This originally only checked the size on a 64-bit system with 8-byte
pointers, causing builds on 32-bit systems to fail. This patch allows
either pointer size to work.
Fixes#56486
This patch fixes NativePDB/local-variables.cpp test for AArch64 Windows.
There are two changes:
1) Replace function breakpoint with line breakpoint required due to pr56288
2) Adjust "target modules dump ast" test as the output was slightly different
on AArch64/Windows.
When performing a !nonnull load from uninitialized memory, we
should preserve the nonnull assume just like in all other cases.
We already do this correctly in the generic mem2reg code, but
don't handle this case when using the optimized single-block
implementation.
Make sure that the optimized implementation exhibits the same
behavior as the generic implementation.
C++20 deprecated ATOMIC_FLAG_INIT thinking it was deprecated in C when it
wasn't. It is expected to be undeprecated in C++23 as part of LWG3659
(https://wg21.link/LWG3659), which is currently Tentatively Ready.
This handles the case where the user includes <stdatomic.h> in C++ code in a
freestanding compile mode. The corollary libc++ changes are in
1544d1f9fd.
commit: 51d3f421f4
failed due to missing python-pip om machine.
Now the ompd gdb-plugin code will be skipped with a warning
if pip is not available in the machine.
This commit adds SBSection.GetAlignment(), and SBSection.alignment as a python property to lldb.
Reviewed By: clayborg, JDevlieghere, labath
Differential Revision: https://reviews.llvm.org/D128069
This commit adds SBSection.GetAlignment(), and SBSection.alignment as a python property to lldb.
Reviewed By: clayborg, JDevlieghere, labath
Differential Revision: https://reviews.llvm.org/D128069
- create the headers (but not include them from `<algorithm>`);
- define the niebloid and its member functions with the right signatures
(as no-ops);
- make sure all the right headers are included that are required by each
algorithm's signature;
- update `CMakeLists.txt` and the module map;
- create the test files with the appropriate synopses.
The synopsis in `<algorithm>` is deliberately not updated because that
could be taken as a readiness signal. The new headers aren't included
from `<algorithm>` for the same reason.
Differential Revision: https://reviews.llvm.org/D129549
InlineAsm constraint string verification can fail for many reasons,
but used to always print a generic "invalid type for inline asm
constraint string" message -- which is especially confusing if
the actual error is unrelated to the type, e.g. a failure to parse
the constraint string.
Change the verify API to return an Error with a more specific
error message, and print that in the IR parser.
This member variable was removed a while ago in
443427357f. It was previously used in
materialization code paths that have since been removed. Nowadays,
`m_object_pointer_type` gets set but not used anywhere.
This patch simply removes all remaining instances of it and any
supporting code.
**Testing**
* API tests pass
Differential Revision: https://reviews.llvm.org/D129367