Introduce V_MOV_B32_indirect_read for indexed vgpr reads
(and rename the old V_MOV_B32_indirect to
V_MOV_B32_indirect_write) so they can be unambiguously
distinguished from regular V_MOV_B32_e32. Previously they
were distinguished by looking for extra implicit operands
but this is fragile because regular moves sometimes have
extra implicit operands too:
- either by accident, when instructions end up with
duplicate implicit operands (see e.g. D100939)
- or by design, when SIInstrInfo::copyPhysReg breaks a
multi-dword copy into individual subreg mov instructions
and adds implicit operands for the super-register.
The effect of this is that SIInstrInfo::isFoldableCopy can
be simplified and identifies more foldable copies. The test
diffs show that more immediate 0 values have been folded as
inline operands.
SIInstrInfo::isReallyTriviallyReMaterializable could
probably be simplified too but that is not part of this
patch.
Differential Revision: https://reviews.llvm.org/D114230
If in addition to AVX512BW (that provides `{k}<->{i8,i16}` casts and i16 shuffles),
we have AVX512VBMI, which provides i8 shuffles, we are in an optimal situation.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114071
Note that there are many other missing costs, i'm *only* adding the ones that are queried
from `getReplicationShuffleCost()` for the existing (quite exhaustive) test coverage.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114070
Here we get pretty lucky. AVX512F does not provide any instructions
to convert between a `k` vector mask and a vector,
but AVX512BW adds `{k}<->nX{i8,i16}`conversions,
and just as it happens, with AVX512BW we have a i16 shuffle.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D113915
Check for a hidden ISD::ROTR (rotl(sub(0,x))) - vXi8 lowering can handle both (its always beneficial for splats, but otherwise only if we have VPTERNLOG).
We currently hit infinite loops in TargetLowering::expandROT if we set ISD::ROTR to custom, which needs addressing before we extend this much further.
Patch to fix some of the regressions in D77804.
By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443.
Differential Revision: https://reviews.llvm.org/D113192
Let's describe accurately what the users can expect from the checker in
a direct way.
Also, add an example warning message.
Reviewed By: martong, Szelethus
Differential Revision: https://reviews.llvm.org/D113401
Currently `fir.no_reassoc` is just removed in the conversion.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D114154
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Sometimes it would be useful to see which Decl kind caused some issue,
along with the name of the concrete instance of the Decl in the source
code.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D113668
When aligning the start address of an output section introduces a gap between the current dot pointer
and the new aligned address, we were already properly expanding the memory region, if available.
D74286 introduced a new behavior to also align the LMA address if an LMA region is specified.
However, this did not expand the corresponding LMA region.
Now, we also expand the LMA region if it is set.
This fixes PR52510.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114166
Overriding methods should not get a readability-identifier-naming
warning because the issue can only be fixed in the base class; but the
current check for whether a method is overriding does not take the
override attribute into account.
Differential Revision: https://reviews.llvm.org/D113830
Printing and parsing of constc didn't agree with each other. This patch
treats the parsing of constc as the final word and fixes the printing
accordingly.
More concretely, this patch prints the RealAttrs that make up the
ConstcOp directly instead of casting to mlir::FloatAttr (which blows
up). It also fixes parseFirRealAttr to invoke APFloat's method for
getting the size of a floating point type instead of computing it as
8 * kind (which blows up for BFloat, with kind == 3 and size == 16).
Kudos to Kiran Chandramohan <kiran.chandramohan@arm.com> for noticing
that we were missing tests for constc in fir-ops.fir.
Differential Revision: https://reviews.llvm.org/D114081
The clang-tidy/infrastructure/pr37091.cpp test inherits the top-level .clang-tidy configuration because it doesn't specify its own checks. It'd be a more stable test if it operates independently of the top-level .clang-tidy settings.
I've made the clang-tidy/infrastructure/pr37091.cpp test independent of the top-level .clang-tidy (picked an arbitrary check that I saw another clang-tidy/infrastructure test was also using: clang-tidy/infrastructure/temporaries.cpp)
Reviewed By: kbobyrev
Differential Revision: https://reviews.llvm.org/D114034
As discussed in D109579, this patch exposes `runRegionDCE` and
`eraseUnreachableBlocks` so they can be used as separate utilities in
other passes.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D114160
Patch [1] added further InstCombine trunc+icmp => mask+icmp
optimization and this caused a couple of bpf selftest failure.
Previous llvm BPF backend patch [2] introduced llvm.bpf.compare
builtin to handle such situations.
This patch further added support ">" and ">=" icmp opcodes.
Tested with bpf selftests and all tests are passed including two
previously failed ones.
Note Patch [1] also added optimization if the to-be-compared
constant is negative-power-of-2 (-C) or not-of-power-of-2 (~C).
This patch didn't implement these two cases as typical bpf
program compares a scalar to a positive length or boundary value,
and this scalar later is used as a index into an array buffer
or packet buffer.
[1] https://reviews.llvm.org/D112634
[2] https://reviews.llvm.org/D112938
Differential Revision: https://reviews.llvm.org/D114215
Mark [cmp.concept] implementation as completed in our documentation.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D114203
Do more efforts to use sp if it is possible to lower a frame index.
Reviewers: reames, loicottet, ostannard, t.p.northover
Reviewed By: reames
Subscribers: arphaman, danilaml, hiraditya, kristof.beyls, llvm-commits, Matt, yrouban
Differential Revision: https://reviews.llvm.org/D111133
This test case had been missing when the original code
was introduced by 2fcb863b2b.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D114207
There are still another 2 uses of PointerType::getElementType in
Evaluator when evaluating BitCast's on pointers. BitCast's on pointers
should be removed when opaque ptr is ready, so I just keep them as is.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D114131
We need libfuzzer libraries on Arm32 so that we can fuzz
Arm32 binaries on Linux (Chrome OS). Android already
allows Arm32 for libfuzzer.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D112091
This is a second speculative compile time optimization to address a reported regression. My actual suspicion is that availability of no-self-wrap is making some *other* bit of code trigger, but let's rule this out.
Recently we started generate DBG_VALUEs with $noreg operands.
This crashes SIPostRABundler, and it should not iterate these
registers anyway.
Fixes: SWDEV-311733
Differential Revision: https://reviews.llvm.org/D114202
The @llvm.ptrauth.sign/sign.generic intrinsics map cleanly to
the various AArch64 PAC[IDG][Z][AB] instructions. Select them.
Differential Revision: https://reviews.llvm.org/D91087
`vector::InsertElementOp` and `vector::ExtractElementOp` have had their `position`
operand changed to accept `AnySignlessIntegerOrIndex` for better operability with
operations that use `index`, such as affine loops.
LLVM's `extractelement` and `insertelement` can also accept `i64`, so lowering
directly to these operations without explicitly inserting casts is allowed. SPIRV's
equivalent ops can also accept `i64`.
Reviewed By: nicolasvasilache, jpienaar
Differential Revision: https://reviews.llvm.org/D114139
This patch resolves many of the failures in the `filesystems/` buckets in the libc++ tests. It adds the correct flag to `fopen` and marks a test case as unsupported. In particular, that test assumes time is stored as a 64 bit value when on MVS it is stored as 32 bit.
Differential Revision: https://reviews.llvm.org/D113298
The aim of this patch is to resolve the missing `table_size` symbol (see reduced test case). That const variable is declared and defined in //libcxx/include/locale//; however, the test case suggests that the symbol is missing. This is due to a C++ pitfall (highlighted [[ https://quuxplusone.github.io/blog/2020/09/19/value-or-pitfall/ | here ]]). In summary, assigning the reference of `table_size` doesn't enforce the const-ness and expects to find `table_size` in the DLL. The fix is to use `constexpr` or have an out-of-line definition in the src (for consistency).
Differential Revision: https://reviews.llvm.org/D110647