If the vector is a splat of some scalar value, findScalarElement()
can simply return the scalar value if it knows the requested lane
is in the vector.
This is only needed for scalable vectors, because the InsertElement/ShuffleVector
case is already handled explicitly for the fixed-width case.
This helps to recognize an InstCombine fold like:
extractelt(bitcast(splat(%v))) -> bitcast(%v)
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D107254
d8faf03807 implemented general-regs-only for X86 by disabling all features
with vector instructions. But the CRC32 instruction in SSE4.2 ISA, which uses
only GPRs, also becomes unavailable. This patch adds a CRC32 feature for this
instruction and allows it to be used with general-regs-only.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D105462
Adds support for a feature macro `__opencl_c_generic_adress_space`
in C++ for OpenCL 2021 enabling a respective optional core feature
from OpenCL 3.0. Testing is only performed in SemaOpenCL because
generic address space functionality is yet to be implemented in
C++ for OpenCL 2021.
This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.
Differential Revision: https://reviews.llvm.org/D108461
It looks like it was a typo. Instead of `*maybeConstantIndex`,
`initTensorOp.getStaticSize(*maybeConstantIndex)` should be used to access the
dim size of the tensor. There is a test for that in `canonicalize.mlir`, but it
was working correctly because `ReplaceStaticShapeDims` was canonicalizing DimOp
before `FoldInitTensorWithDimOp`. So, to make the patterns more "orthogonal",
this case is disabled.
Differential Revision: https://reviews.llvm.org/D109247
To enable Flang testing on Windows, shell scripts have to be ported to Python. In this patch the "test_errors.sh" script is ported to python ("test_errors.py"). The RUN line of existing tests was changed to make use of the python script.
Used python regex in place of awk/sed.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D107575
Add documentation of clang-nvlink-wrapper tool in clang.
Add it to the release notes of clang. Fix a small MSVC
warning.
Differential Revision: https://reviews.llvm.org/D109225
Currently, we use SExtValue to decide whether to invert tbz or tbnz.
However, for the case zext (xor x, c), we should use ZExt rather
than SExt otherwise we will generate totally opposite branches.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D108755
Implementation of `three_way_comparable` and `three_way_comparable_with` concepts from <compare> header.
Please note that I have temporarily removed `<compare>` header from `<utility>` due to cyclic dependency that prevents using `<concepts>` header in `<compare>` one.
I tried to quickly resolve those issues including applying suggestions from @cjdb and dive deeper by myself but the problem seems more complicated that we thought initially.
I am in progress to prepare the patch with resolving this cyclic dependency between headers but for now I decided to put all that I have to the review to unblock people that depend on that functionality. At first glance the patch with resolving cyclic dependency is not so small (unless I find the way to make it smaller and cleaner) so I don't want to mix everything to one review.
Reviewed By: ldionne, cjdb, #libc, Quuxplusone
Differential Revision: https://reviews.llvm.org/D103478
Given a select_cc producing a constant and a invertion of the constant
for a comparison more than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clear all bits depending on if the value is negative. This is then xor'd
with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ
This includes a OneUseCheck on the Cmp, which seems to make thinks a
little worse and will be removed in a followup.
Differential Revision: https://reviews.llvm.org/D109149
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Differential Revision: https://reviews.llvm.org/D109214
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:
* `unrollLoopFull`
* `unrollLoopPartial`
* `unrollLoopHeuristic`
`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.
With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.
Reviewed By: jdoerfert, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107764
Migrate the tests regarding fixup and relaxation on branch and call
targets.
This patch wraps up the migration from `test/CodeGen/M68k/Encoding` to
`test/MC/M68k`.
Add tests monitoring issues fix. They should be fixed when
https://reviews.llvm.org/D57059 ("Initial support for the vectorization
of the non-power-of-2 vectors") is landed.
Failing to do so results in `std::bad_function_call` being
thrown when a pass tries to emit a diagnostic.
I've copied the relevant test over from LLD-ELF's test suite.
Reviewed By: #lld-macho, thevinster
Differential Revision: https://reviews.llvm.org/D109274
The temporary object was used as a workaround when the target parser may
change STI. D14346 made the MCSubtargetInfo argument to
createMCAsmParser const, so we no longer need the temporary object.
As part of the nontrivial unswitching we could end up removing child
loops. This patch add a notification to the pass manager when
that happens (using the markLoopAsDeleted callback).
Without this there could be stale LoopAccessAnalysis results cached
in the analysis manager. Those analysis results are cached based on
a Loop* as key. Since the BumpPtrAllocator used to allocate
Loop objects could be resetted between different runs of for
example the loop-distribute pass (running on different functions),
a new Loop object could be created using the same Loop pointer.
And then when requiring the LoopAccessAnalysis for the loop we
got the stale (corrupt) result from the destroyed loop.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D109257
The current inconsistency confuse contributors which coding guidlines to follow.
It would be better to have it consistent using clang-format tool.
Reviewed By: mhjacobson
Differential Revision: https://reviews.llvm.org/D109270
The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop.
I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD.
But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.