The zip files were too large to be practical, so they were never
shipped. Reverting to reduce build time and complexity of the script.
This reverts commit 4486aa03c5.
Test that chunk size is passed to the static init function.
Using three different variations:
1. Single constant.
2. Expression with constants.
3. Variable value.
Reviewed By: peixin, shraiysh
Differential Revision: https://reviews.llvm.org/D126383
VP intrinsics show UB if the %evl parameter is out of bounds - they must
not carry the speculatable attribute. The out-of-bounds UB disappears
when the %evl parameter is expanded into the mask or expansion replaces
the entire VP intrinsic with non-VP code.
This patch
- Removes the speculatable attribute on all VP intrinsics.
- Generalizes the isSafeToSpeculativelyExecute function to let VP
expansion know whether the VP intrinsic replacement will be
speculatable. VP expansion may only discard %evl where this is the
case.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D125296
Check in updates based on how the latest release was built [0] and add
the bug fix from [1] which allows LLDB to start.
Other changes which had accumulated in the local release script:
- Don't build the clang format plugin (VS has the functionality built
in now)
- Disable tests that have been failing (I'll try to follow up and
re-enable them)
- Switch to Python 3.10
- Jump through more hoops to make LLDB pick the right Python.
0. https://discourse.llvm.org/t/14-0-4-final-has-been-tagged/62750/3
1. https://github.com/llvm/llvm-project/issues/54589
It appears that float support is complete, or at least, the stackmap records
emitted are not inconceivable (I must admit that I don't know about many of the
architectures under test here).
One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect)
quirk of the stackmap format: in the case of a Register record, the Offset or
SmallConstant field can encode a sub-register index! I've only ever seen this
field zero for Register entries up until now.
Since the SPIR/SPIR-V targets enable all known features, we must
ensure the Work-group Collective Functions feature macro is set for
OpenCL 3.0.
Fixes https://github.com/llvm/llvm-project/issues/55770
Add ops to the structured transform extension of the transform dialect that
perform interchange, padding and scalarization on structured ops. Along with
tiling that is already defined, this provides a minimal set of transformations
necessary to build vectorizable code for a single structured op.
Define two helper traits: one that implements TransformOpInterface by applying
a function to each payload op independently and another that provides a simple
"functional-style" producer/consumer list of memory effects for the transform
ops.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D126374
znver1 ymm variants of VPMOVSX**/VPMOVZX** instructions require double pumping.
Now matches AMD SoG, Agner and instlatx64 numbers.
Thanks to @fabian-r for the report
This option was added in D89854. It prevents GVN from performing
load PRE in a loop, if doing so would require critical edge
splitting on the backedge. From the review:
> I know that GVN Load PRE negatively impacts peeling,
> loop predication, so the passes expecting that latch has
> a conditional branch.
In the PhaseOrdering test in this patch, splitting the backedge
negatively affects vectorization: After critical edge splitting,
the loop gets rotated, effectively peeling off the first loop
iteration. The effect is that the first element is handled
separately, then the bulk of the elements use a vectorized
reduction (but using unaligned, off-by-one memory accesses) and
then a tail of 15 elements is handled separately again.
It's probably worth noting that the loop load PRE from D99926 is
not affected by this change (as it does not need backedge
splitting). This is about normal load PRE that happens to occur
inside a loop.
Differential Revision: https://reviews.llvm.org/D126382
I don't see a point here in the lit tests here since sqrt, mul and other ops
expand as well. I just added "smoke" tests to verify that the conversion works
and does not create any illegal ops.
I will create a patch that adds a simple integration test to
mlir/test/Integration/Dialect/ComplexOps/ that will compare the values.
Differential Revision: https://reviews.llvm.org/D126539
Code beads is useless since the only user, M68k, has moved on to
a new encoding/decoding infrastructure.
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D126349
This patch allows user to use C++20 module by -fcxx-modules. Previously,
we could only use it under -std=c++20. Given that user could use C++20
coroutine standalonel by -fcoroutines-ts. It makes sense to offer an
option to use C++20 modules without enabling C++20.
Reviewed By: iains, MaskRay
Differential Revision: https://reviews.llvm.org/D120540
This whole part with recomputation of BPI and BFI looks redundant,
and we tried to get rid of it in D124439. Unfortunately, it causes
some hard-to-reproduce failures due to invalid state of analysis.
Until this is investigated and fixed, let's try to reuse at least
part of available analyzes.
DT is available at this point, and there is no need to recompute it.
Please revert if you see it causing *any* behavior changes.
If two different texts are inserted at the same offset, clang-apply-replacements prints the conflict error and discards all fixes. This patch adds support for adjusting conflict offset and keeps running to continually fix them.
https://godbolt.org/z/P938EGoxj doesn't have any fixes when I run run-clang-tidy.py to generate a YAML file with clang-tidy and fix them with clang-apply-replacements. The YAML file has two different header texts insertions at the same offset, unlike clang-tidy with '-fix', clang-apply-replacements does not adjust for this conflict.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D123924
This patch allows user to use C++20 module by -fcxx-modules. Previously,
we could only use it under -std=c++20. Given that user could use C++20
coroutine standalonel by -fcoroutines-ts. It makes sense to offer an
option to use C++20 modules without enabling C++20.
Reviewed By: iains, MaskRay
Differential Revision: https://reviews.llvm.org/D120540
The checker missed a check for parameter type of primary template of specialization template and this could cause build breakages.
Reviewed By: aaron.ballman, flx
Differential Revision: https://reviews.llvm.org/D116593
This pattern is what we get after DAG combine for C code like this.
short *ptr1, *ptr2, *ptr3;
unsigned diff = ptr1 - ptr2;
return ptr3[diff];
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D126588
The tests here show the codegen for something like this C code.
unsigned diff = ptr1 - ptr2;
return ptr3[diff];
The pointer difference is truncated to 32-bits before being used
again as an index. In SelectionDAG this appears as an AND between
a SRL and a SHL. DAGCombiner will remove the shifts leaving only
an AND. The Mask now has 1,2, or 3 trailing zeros and 31, 30, or 29
leading zeros. We end up falling back to constant materialization
to create this mask.
We could instead use srli followed by slli.uw. Or since
we have an add, we can use srli followed by shXadd.uw.
Differential Revision: https://reviews.llvm.org/D126589
This reverts the revert commit ad95255b92.
The updated version also creates a load when the store may not execute.
In those cases, we still need to introduce a load in a function where
there may not have been one before, so this doesn't completely resolve
issue #51248.
Original message:
When only a store is sunk, there is no need to create a load in the
pre-header, as the result of the load will never get used.
The dead load can can introduce UB, if the function is marked as
writeonly.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123473
If both a v2i32 DUP(x) and a v4i32 DUP(x) node exists, we can re-use the
larger node using a vector extract to obtain the smaller. This comes up
in the smull/smlal code, but needs a small fixup to allow the smull2
code in tryExtendDUPToExtractHigh/performAddSubLongCombine to still
match smull2 extracts.
Differential Revision: https://reviews.llvm.org/D126449
Clang 3.7 and below is not actively used or supported in the test suite now, so
remove the workaround in the test.
Differential Revision: https://reviews.llvm.org/D126603
znver1/2 models were incorrectly modelling the fpupipe (should be pipe2 for shift-by-scalar-amount and pipe1 for shift-by-element-amount) and znver1 ymm variants also require double pumping.
Now matches AMD SoG, Agner and instlatx64 numbers.
Thanks to @fabian-r for the report
musl-libc doesn't support dladdr in statically linked binaries:
> Are you using static or dynamic linking? If static, dladdr is just a
> stub that always fails. It could be implemented to work under some
> conditions, but it would be highly dependent on what options you
> compile the binary with, since by default static binaries do not
> contain the bloat that would be needed to perform introspection.
Source: https://www.openwall.com/lists/musl/2013/01/15/25 (in response
to a bug report).
Libclang unfortunately uses dladdr to find the ResourcesPath so will
fail if it is linked statically on Alpine Linux. This patch fixes this
issue by falling back to getMainExecutable if dladdr returns an error.
Reference: https://github.com/llvm/llvm-project/issues/40641#issuecomment-981011427
Differential Revision: https://reviews.llvm.org/D124815