Summary:
This change permits scalar bfloats to be loaded, stored, moved and
used as function call arguments and return values, whenever the bf16
feature is supported by the subtarget.
Previously that was only supported in the presence of the fullfp16
feature, because the code generation strategy depended on instructions
from that extension. This change adds alternative code generation
strategies so that those operations can be done even without fullfp16.
The strategy for loads and stores is to replace VLDRH/VSTRH with
integer LDRH/STRH plus a move between register classes. I've written
isel patterns for those, conditional on //not// having the fullfp16
feature (so that in the fullfp16 case, the existing patterns will
still be used).
For function arguments and returns, instead of writing isel patterns
to match `VMOVhr` and `VMOVrh`, I've avoided generating those SDNodes
in the first place, by factoring out the code that constructs them
into helper functions `MoveToHPR` and `MoveFromHPR` which have a
fallback for non-fullfp16 subtargets.
The current output code is not especially pretty: in the new test file
you can see unnecessary store/load pairs implementing no-op bitcasts,
and lots of pointless moves back and forth between FP registers and
GPRs. But it at least works, which is an improvement on the previous
situation.
Reviewers: dmgreen, SjoerdMeijer, stuij, chill, miyuki, labrinea
Reviewed By: dmgreen, labrinea
Subscribers: labrinea, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82372
The indexing was messed up, so the result was completely broken.
Shuffle constant exprs are rare in practice; without vscale types,
constant folding generally elminates them. So sort of hard to trip over.
Fixes regression from D72467.
Differential Revision: https://reviews.llvm.org/D80330
There's more smarts in AArch64ISelLowering that we don't have yet, but this
change incrementally improves some of the more common patterns. I think future
iterations will want to use some combination of PostLegalizerCombiner and the
selector to catch the other cases.
Differential Revision: https://reviews.llvm.org/D82340
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
Differential Revision: https://reviews.llvm.org/D80368
Summary:
According to HowToUpdateDebugInfo.rst:
```
Preserving the debug locations of speculated instructions can make
it seem like a condition is true when it's not (or vice versa), which
leads to a confusing single-stepping experience
```
This patch follows the recommendation to drop debug locations on
speculated instructions.
Reviewers: aprantl, davide
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82420
I forgot to copy the new fixed function ABI into GlobalISel, so this
was mismatched with the DAG compiled calling function. This was
allocating part of the argument list to v31, which was supposed to be
reserved for the workitem IDs.
Implement them on top of sdiv/udiv, similar to what we do for integer
types.
Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.
Differential Revision: https://reviews.llvm.org/D81511
This has two advantages: one, it's simpler, and two, it doesn't require
heroic pattern matching with scalable vectors.
Also includes a small fix to DataLayout to allow the scalable vector
testcase to work correctly.
Differential Revision: https://reviews.llvm.org/D82061
This patch adds tests for folds of ADDIs into load/stores, focusing on
load/stores with nonzero offsets. When the offset is nonzero we currently
don't do the fold. A follow-up patch will improve on that.
Differential Revision: https://reviews.llvm.org/D79689
LDRD and STRD along with UBFX and SBFX are selected from DAGToDAG
transforms, so do not have tblgen patterns. They don't get marked as
having side effects so cannot be scheduled as efficiently as you would
like.
This specifically marks then as not having side effects.
Differential Revision: https://reviews.llvm.org/D82358
Summary: `nomerge` attribute was added at D78659. So, we can remove the EmptyAsm workaround in ASan the MSan and use this attribute.
Reviewers: vitalybuka
Reviewed By: vitalybuka
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82322
This patch extends storeIsNoop to also detect stores of 0 to an calloced
object. This basically ports the logic from legacy DSE to the MemorySSA
backed version.
It triggers in a few cases on MultiSource, SPEC2000, SPEC2006 with -O3
LTO:
Same hash: 218 (filtered out)
Remaining: 19
Metric: dse.NumNoopStores
Program base patch2 diff
test-suite...CFP2000/177.mesa/177.mesa.test 1.00 15.00 1400.0%
test-suite...6/482.sphinx3/482.sphinx3.test 1.00 14.00 1300.0%
test-suite...lications/ClamAV/clamscan.test 2.00 28.00 1300.0%
test-suite...CFP2006/433.milc/433.milc.test 1.00 8.00 700.0%
test-suite...pplications/oggenc/oggenc.test 2.00 9.00 350.0%
test-suite.../CINT2000/176.gcc/176.gcc.test 6.00 6.00 0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test NaN 137.00 nan%
test-suite...libquantum/462.libquantum.test NaN 3.00 nan%
test-suite...6/464.h264ref/464.h264ref.test NaN 7.00 nan%
test-suite...decode/alacconvert-decode.test NaN 2.00 nan%
test-suite...encode/alacconvert-encode.test NaN 2.00 nan%
test-suite...ications/JM/ldecod/ldecod.test NaN 9.00 nan%
test-suite...ications/JM/lencod/lencod.test NaN 39.00 nan%
test-suite.../Applications/lemon/lemon.test NaN 2.00 nan%
test-suite...pplications/treecc/treecc.test NaN 4.00 nan%
test-suite...hmarks/McCat/08-main/main.test NaN 4.00 nan%
test-suite...nsumer-lame/consumer-lame.test NaN 3.00 nan%
test-suite.../Prolangs-C/bison/mybison.test NaN 1.00 nan%
test-suite...arks/mafft/pairlocalalign.test NaN 30.00 nan%
Reviewers: efriedma, zoecarver, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D82204
Summary:
Make use of both the - (1) clustered bytes and (2) cluster length, to decide on
the max number of mem ops that can be clustered. On an average, when loads
are dword or smaller, consider `5` as max threshold, otherwise `4`. This
heuristic is purely based on different experimentation conducted, and there is
no analytical logic here.
Reviewers: foad, rampitec, arsenm, vpykhtin
Reviewed By: rampitec
Subscribers: llvm-commits, kerbowa, hiraditya, t-tye, Anastasia, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl, thakis
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82393
Adding `default_target` fixed the build by excluding these tests... but
this excluded these tests from ever running!
The correct feature check is `default_triple`
Summary:
In order to enable mass testing of opt under NPM, specifically passes
specified via -foo-pass.
This is gated under a new opt flag -enable-new-pm. Currently
the pass flag parser looks for legacy PM passes with the name "foo" (for
opt arg "-foo") and creates a PassInfo for each one. Here we take the
(legacy PM) pass name and try to match it with one defined in (NPM)
PassRegistry.def. Ultimately if we want all tests to pass like this,
we'll need to port all passes to NPM and register them in
PassRegistry.def under the same name as they were reigstered in the
legacy PM.
Maybe at some point we'll migrate all -foo to --passes=foo, but that
would be after the NPM switch.
Flipping on the flag causes 2XXX failures under check-llvm. By far most
of them are passes either not ported to NPM or don't have the same name
in PassRegistry.def as their old name.
Reviewers: hans, echristo, asbirlea, leonardchan
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82320
The VLLDM and VLSTM instructions are incompletely specified. They
(potentially) write (or read, respectively) registers Q0-Q7, VPR, and
FPSCR, but the compiler is unaware of it.
In the new test case `cmse-vlldm-no-reorder.ll` case the compiler
missed an anti-dependency and reordered a `VLLDM` ahead of the
instruction, which stashed the return value from the non-secure call,
effectively clobbering said value.
This test case does not fail with upstream LLVM, because of scheduling
differences and I couldn't find a test case for the VLSTM either.
Differential Revision: https://reviews.llvm.org/D81586
Summary:
As discussed previously when landing patch for OpenMP in Flang, the idea is
to share common part of the OpenMP declaration between the different Frontend.
While doing this it was thought that moving to tablegen instead of Macros will also
give a cleaner and more powerful way of generating these declaration.
This first part of a future series of patches is setting up the base .td file for
DirectiveLanguage as well as the OpenMP version of it. The base file is meant to
be used by other directive language such as OpenACC.
In this first patch, the Directive and Clause enums are generated with tablegen
instead of the macros on OMPConstants.h. The next pacth will extend this
to other enum and move the Flang frontend to use it.
Reviewers: jdoerfert, DavidTruby, fghanim, ABataev, jdenny, hfinkel, jhuber6, kiranchandramohan, kiranktp
Reviewed By: jdoerfert, jdenny
Subscribers: arphaman, martong, cfe-commits, mgorny, yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm, #openmp, #clang
Differential Revision: https://reviews.llvm.org/D81736
This patch helps add support for emitting the .debug_pubnames section to yaml2elf.
Known issues:
- Current implementation doesn't support emitting multiple sets of entries.
- Doesn't support DWARF64.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D82296
This patch contains:
- Support in LLVM CodeGen for bfloat16 types for ld2/3/4 and st2/3/4.
- New bfloat16 ACLE builtins for svld(2|3|4)[_vnum] and svst(2|3|4)[_vnum]
Reviewers: stuij, efriedma, c-rhodes, fpetrogalli
Reviewed By: fpetrogalli
Tags: #clang, #lldb, #llvm
Differential Revision: https://reviews.llvm.org/D82187
Summary:
As [[ https://bugs.llvm.org/show_bug.cgi?id=45360 | PR45360 ]] reports,
with new cost-model we can sometimes end up being able to expand `udiv`/`urem` instructions.
And that exposes at least one instance of when we do that
regardless of whether or not it is safe to do.
In this particular case, it's `SimplifyIndvar::replaceIVUserWithLoopInvariant()`.
It seems to me, we simply need to check with `isSafeToExpandAt()` first.
The test isn't great. I'm not sure how to make it only run `-indvars`.
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=45360 | PR45360 ]].
Reviewers: mkazantsev, reames, helloqirun
Reviewed By: mkazantsev
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82108
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.
Specifically this patch implements load and store operations, which
get lowered to their masked counterparts thusly:
V = load(Addr) =>
V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))
store(V, (Addr)) =>
masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr))
Reviewers: rengolin, efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80385
Summary:
The previous version relied on the standard calling convention using
std::reverse() to try to force the AVR ABI. But this only works for
simple cases, it fails for example with aggregate types.
This patch rewrites the calling convention with custom C++ code, that
implements the ABI defined in https://gcc.gnu.org/wiki/avr-gcc.
To do that it adds a few 16-bit pseudo registers for unaligned argument
passing, such as R24R23. For example this function:
define void @fun({ i8, i16 } %a)
will pass %a.0 in R22 and %a.1 in R24R23.
There are no instructions that can use these pseudo registers, so a new
register class, DREGSMOVW, is defined to make them apart.
Also the ArgCC_AVR_BUILTIN_DIV is no longer necessary, as it is
identical to the C++ behavior (actually the clobber list is more strict
for __div* functions, but that is currently unimplemented).
Reviewers: dylanmckay
Subscribers: Gaelan, Sh4rK, indirect, jwagen, efriedma, dsprenkels, hiraditya, Jim, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68524
Patch by Rodrigo Rivas Costa.
Previously, if there was an error whilst parsing the operands of an
extended opcode, the operands would be treated as zero and printed. This
could potentially be slightly confusing. This patch changes the
behaviour to print the raw bytes instead.
Reviewed by: ikudrin
Differential Revision: https://reviews.llvm.org/D81570
Summary:
- When promoting a pointer from memory to register, SROA skips pointers
from different address spaces. However, as `ptrtoint` and `inttoptr`
are defined as no-op casts if that integer type has the same as the
pointer value, generate the pair of `ptrtoint`/`inttoptr` (no-op cast)
sequence to convert pointers from different address spaces if they
have the same size.
Reviewers: arsenm, chandlerc, lebedev.ri
Subscribers:
Differential Revision: https://reviews.llvm.org/D81943
We can sometimes replace a select with a Phi node if all of its values
are available on respective incoming edges.
Differential Revision: https://reviews.llvm.org/D82005
Reviewed By: nikic
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
where these alignments are retrieved from alignment attributes in LLVM
IR. These tracked alignments could help DAG combining and lowering
generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
we only generate AssertAlign nodes on return values from intrinsic
calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
new (base + offset) patterns.
Reviewers: arsenm, bogner
Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81711
This patch implements builtins for the following prototypes for the VSX Permute
Control Vector Generate with Mask Instructions:
vector unsigned char vec_genpcvm (vector unsigned char, const int);
vector unsigned short vec_genpcvm (vector unsigned short, const int);
vector unsigned int vec_genpcvm (vector unsigned int, const int);
vector unsigned long long vec_genpcvm (vector unsigned long long, const int);
Differential Revision: https://reviews.llvm.org/D81774
This is the integer sibling to D81491.
(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])
Removing the "experimental" from these intrinsics is likely
not too far away.
Add disassembly support for the movw, adiw, and sbiw instructions.
I had previously committed test cases for the adiw and sbiw
instructions, but had accidentally made them not runnable so they were
skipped all this time. Oops. This patch fixes that by adding support for
disassembling those instructions.
Differential Revision: https://reviews.llvm.org/D82093
Some instructions have a fixed Z register and don't have an explicit
register operand. This can be worked around by simply printing the
operand directly if the particular register class is detected.
The LPM and ELPM instructions also needed a custom decoder, which is
also included in this patch.
Differential Revision: https://reviews.llvm.org/D82088
These can often only use a limited range of registers, and apparently
need special decoding support.
Differential Revision: https://reviews.llvm.org/D81971
This is a set of instructions that take just a single register as an
operand, with no immediates. Because all instructions share the same
format, I haven't added exhaustive bit testing to all instructions but
just to the inc instruction.
Differential Revision: https://reviews.llvm.org/D81968
I'm not entirely sure why this was ever needed, but when I remove both
adjustments all tests still pass.
This fixes a bug where a long branch (using the `jmp` instead of the
`rjmp` instruction) was incorrectly adjusted by 2 because it jumps to an
absolute address instead of a PC-relative address. I could have added
AVR::fixup_call to the list of exceptions, but it seemed more sensible
to me to just remove this code.
Differential Revision: https://reviews.llvm.org/D78459
This diff adds support for deleting an rpath from a Mach-O binary.
Patch by Sameer Arora!
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D81527
Summary: Previous code would try to verify DW_AT_ranges and if any ranges would overlap, it would stop attributing any ranges after this to the DIE which caused incorrect errors to be reported that a DIE's address ranges were not contained in the parent DIE's ranges. Added a fix and a test.
Reviewers: aprantl, labath, probinson, JDevlieghere, jhenderson
Subscribers: hiraditya, MaskRay, cmtice, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79962
This caused a Chromium test to miscompile. See discussion on the Phabricator
review.
> This patch extends MatchVectorAllZeroTest to handle OR vector reduction patterns where the result is compared against zero.
>
> Fixes PR45378
>
> Differential Revision: https://reviews.llvm.org/D81547
This reverts 057c9c7ee0
Add a new builtin-function __builtin_expect_with_probability and
intrinsic llvm.expect.with.probability.
The interface is __builtin_expect_with_probability(long expr, long
expected, double probability).
It is mainly the same as __builtin_expect besides one more argument
indicating the probability of expression equal to expected value. The
probability should be a constant floating-point expression and be in
range [0.0, 1.0] inclusive.
It is similar to builtin-expect-with-probability function in GCC
built-in functions.
Differential Revision: https://reviews.llvm.org/D79830
Currently we stop exploring candidates too early in some cases.
In particular, we can continue checking the defining accesses of
non-removable MemoryDefs and defs without analyzable write location
(read clobbers are already ruled out using MemorySSA at this point).
This features matches ELFAsmParser and makes it possible to use `.section ".llvm.call-graph-profile","n"`
Reviewed By: zequanwu
Differential Revision: https://reviews.llvm.org/D82240
Summary:
Currently when --passes is used, any passes specified via -foo are
ignored. Explicitly bail out when that happens.
This requires changing some tests. Most were straightforward, but
codegenprepare-produced-address-math.ll is tricky. One of its RUNs runs
CodeGenPrepare. I tried porting CodeGenPrepare to the NPM, but ended up
getting stuck when I needed a TargetMachine. NPM doesn't have support
for MachineFunctions yet. So I just deleted that RUN line, since it was
mass-added in https://reviews.llvm.org/D54848 and is likely not that
useful.
Reviewers: echristo, hans
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82271
Summary:
As discussed previously when landing patch for OpenMP in Flang, the idea is
to share common part of the OpenMP declaration between the different Frontend.
While doing this it was thought that moving to tablegen instead of Macros will also
give a cleaner and more powerful way of generating these declaration.
This first part of a future series of patches is setting up the base .td file for
DirectiveLanguage as well as the OpenMP version of it. The base file is meant to
be used by other directive language such as OpenACC.
In this first patch, the Directive and Clause enums are generated with tablegen
instead of the macros on OMPConstants.h. The next pacth will extend this
to other enum and move the Flang frontend to use it.
Reviewers: jdoerfert, DavidTruby, fghanim, ABataev, jdenny, hfinkel, jhuber6, kiranchandramohan, kiranktp
Reviewed By: jdoerfert, jdenny
Subscribers: cfe-commits, mgorny, yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits
Tags: #llvm, #openmp, #clang
Differential Revision: https://reviews.llvm.org/D81736
This patch helps add support for error handling in `DWARFYAML::emitDebugInfo()`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D82275
Before this patch, we use `(uint32_t)AbbrCode - (uint32_t)FirstAbbrCode` to index the abbreviation. It's impossible for we to use the preceeding abbreviation of the previous one (e.g., if the previous DIE's `AbbrCode` is 2, we are unable to use the abbreviation with index 1). In this patch, we use `AbbrCode` to index the abbreviation directly.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D82173
This saves creating/destroying a builder every time we
perform some transform.
The tests show instruction ordering diffs resulting from
always inserting at the root instruction now, but those
should be benign.
We have a division by zero crash currently when
the sh_entzize of the dynamic symbol table is 0.
Differential revision: https://reviews.llvm.org/D82180
It is possible to trigger a crash when a dynamic symbol has a
broken (too large) st_name and the DT_STRSZ is also broken.
We have the following code in the `Elf_Sym_Impl<ELFT>::getName`:
```
template <class ELFT>
Expected<StringRef> Elf_Sym_Impl<ELFT>::getName(StringRef StrTab) const {
uint32_t Offset = this->st_name;
if (Offset >= StrTab.size())
return createStringError(object_error::parse_failed,
"st_name (0x%" PRIx32
") is past the end of the string table"
" of size 0x%zx",
Offset, StrTab.size());
...
```
The problem is that `StrTab` here is a `ELFDumper::DynamicStringTab` member
which is not validated properly on initialization. So it is possible to bypass the
`if` even when the `st_name` is huge.
This patch fixes the issue.
Differential revision: https://reviews.llvm.org/D82201
For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly.
We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.
This reverts commit 29b2c1ca72.
The patch causes the DT verifier failure like:
DominatorTree is different than a freshly computed one!
Not sure the patch itself it wrong but revert to investigate the failure.
We can't consider variable safe if out-of-lifetime access is possible.
So if StackLifetime can't prove that the instruction always uses
the variable when it's still alive, we consider it unsafe.
Usually DominatorTree provides this info, but here we use
StackLifetime. The reason is that in the next patch StackLifetime
will be used for actual lifetime checks and we can avoid
forwarding the DominatorTree into this code.
When writing a unit test on replacing standard epilogue sequences with `BR __mspabi_func_epilog_<N>`, by manually asm-clobbering `rN` - `r10` for N = 4..10, everything worked well except for seeming inability to clobber r4.
The problem was that MSP430 code generator of LLVM used an obsolete name FP for that register. Things were worse because when `llc` read an unknown register name, it silently ignored it.
Differential Revision: https://reviews.llvm.org/D82184
This commit technically permits LLVM to emit the debug information for ELF files for MSP430 architecture. Aside from this, it only defines the register numbers as defined by part 10.1 of MSP430 EABI specification (assuming the 1-byte subregisters share the register numbers with corresponding full-size registers).
This commit was basically tested by me with TI-provided GCC 8.3.1 toolchain by compiling an example program with `clang` (please note manual linking may be required due to upstream `clang` not yet handling the `-msim` option necessary to run binaries on the GDB-provided simulator) and then running it and single-stepping with `msp430-elf-gdb` like this:
```
$sysroot/bin/msp430-elf-gdb ./test -ex "target sim" -ex "load ./test"
(gdb) ... traditional GDB commands follow ...
```
While this implementation is most probably far from completeness and is considered experimental, it can already help with debugging MSP430 programs as well as finding issues in LLVM debug info support for MSP430 itself.
One of the use cases includes trying to find a point where UBSan check in a trap-on-error mode was triggered.
The expected debug information format is described in the [MSP430 Embedded Application Binary Interface](http://www.ti.com/lit/an/slaa534/slaa534.pdf) specification, part 10.
Differential Revision: https://reviews.llvm.org/D81488
Current LLVM implementation uses `MCAsmInfo::CodePointerSize` as addr_size when emitting the DWARF data. llvm-dwarfdump, on the other hand, handles `addr_size`s of 4 and 8 properly and considers all other sizes as an error. This works for most of mainline targets except for MSP430 and AVR.
msp430-gcc v8.3.1 emits DWARF32 with addr_size = 4 (DWARF32 does not imply addr_size = 4, 32 refers to internal offset width of 4 bytes) that is handled by llvm-dwarfdump already. Still, emitting 2-byte target pointers on MSP430 seems correct as well (but not for MSP430X that is supported by msp430-gcc but not by LLVM and has 20-bit address space).
This patch make it possible for MSP430 debug info support to be tested with llvm-dwarfdump.
Differential Revision: https://reviews.llvm.org/D82055
When describing parameter value loaded by a COPY instruction, consider
case where needed Reg value is a sub- or super- register of the COPY
instruction's destination register. Without this patch, compile process
will crash with the assertion "TargetInstrInfo::describeLoadedValue
can't describe super- or sub-regs for copy instructions".
Patch by Nikola Tesic
Differential revision: https://reviews.llvm.org/D82000
Currently we allow peeling of the loops if there is a exiting latch block
and all other exits are blocks ending with deopt.
Actually we want that exit would end up with deopt unconditionally but
it is not required that exit itself ends with deopt.
Reviewers: reames, ashlykov, fhahn, apilipenko, fedor.sergeev
Reviewed By: apilipenko
Subscribers: hiraditya, zzheng, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D81140
As we traverse the CFG backwards, we could end up reaching unreachable
blocks. For unreachable blocks, we won't have computed post order
numbers and because DomAccess is reachable, unreachable blocks cannot be
on any path from it.
This fixes a crash with unreachable blocks.
If a collection of interconnected phi nodes is only ever loaded, stored
or bitcast then we can convert the whole set to the bitcast type,
potentially helping to reduce the number of register moves needed as the
phi's are passed across basic block boundaries. This has to be done in
CodegenPrepare as it naturally straddles basic blocks.
The alorithm just looks from phi nodes, looking at uses and operands for
a collection of nodes that all together are bitcast between float and
integer types. We record visited phi nodes to not have to process them
more than once. The whole subgraph is then replaced with a new type.
Loads and Stores are bitcast to the correct type, which should then be
folded into the load/store, changing it's type.
This comes up in the biquad testcase due to the way MVE needs to keep
values in integer registers. I have also seen it come up from aarch64
partner example code, where a complicated set of sroa/inlining produced
integer phis, where float would have been a better choice.
I also added undef and extract element handling which increased the
potency in some cases.
This adds it with an option that defaults to off, and disabled for 32bit
X86 due to potential issues around canonicalizing NaNs.
Differential Revision: https://reviews.llvm.org/D81827
GetUnderlyingObject() (and by required symmetry
DecomposeGEPExpression()) will call SimplifyInstruction() on the
passed value if other checks fail. This simplification is very
expensive, but has little effect in practice. This patch removes
the SimplifyInstruction call(), and replaces it with a check for
single-argument phis (which can occur in canonical IR in LCSSA
form), which is the only useful simplification case I was able to
identify.
At O3 the geomean CTMark improvement is -1.7%. The largest
improvement is SPASS with ThinLTO at -6%.
In test-suite, I see only two tests with a hash difference and
no code size difference (PAQ8p, Ptrdist), which indicates that
the simplification only ends up being useful very rarely. (I would
have liked to figure out which simplification is responsible here,
but wasn't able to spot it looking at transformation logs.)
The AMDGPU test case that is update was using two selects with
undef condition, in which case GetUnderlyingObject will return
the first select operand as the underlying object. This will of
course not happen with non-undef conditions, so this was not
testing anything realistic. Additionally this illustrates potential
unsoundness: While GetUnderlyingObject will pick the first operand,
the select might be later replaced by the second operand, resulting
in inconsistent assumptions about the undef value.
Differential Revision: https://reviews.llvm.org/D82261