These two intrinsics are lowered to calls and so should prevent the formation of
CTR loops. In a subsequent patch, we will handle all currently known intrinsics
and prevent the formation of HW loops if any unknown intrinsics are encountered.
Differential revision: https://reviews.llvm.org/D68841
This is a fix for D69112, where we commoned up the logic for writing each CsectGroup.
However, that patch forgot to skip the Sections that are empty.
Reviewed by: daltenty, xingxue
Differential Revision: https://reviews.llvm.org/D69447
Summary:
(Split off of D67120)
SizeOpts/MachineSizeOpts changes for profile guided size optimization.
(A second try, after this was previously committed as r375254 and reverted in r375375.)
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69409
Emit a remarks section by default for the following formats:
* bitstream
* yaml-strtab
while still providing -remarks-section=<bool> to override the defaults.
I want to add the ability to rerun the outliner in certain cases, and I
thought this NFC change would make a subsequent change that allows
rerunning the outliner a cleaner diff.
Differential Revision: https://reviews.llvm.org/D69482
The static functions `positiveOffsetOpcode`, `negativeOffsetOpcode` and
`immediateOffsetOpcode` (lib/Target/ARM/Thumb2InstrInfo.cpp) can currently
return `0` as a default opcode, which is meaningless in this situation.
This patch replaces that default value with llvm_unreachable.
Reviewers: t.p.northover, tellenbach
Reviewed By: tellenbach
Subscribers: tellenbach, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69432
Patch By: Lorenzo Casalino <lorenzo.casalino93@gmail.com>
Summary:
Getelementptr has vector type if any of its operands are vectors
(the scalar operands being implicitly broadcast to all vector elements).
Extractelement applied to a vector getelementptr can be folded by
applying the extractelement in turn to all of the vector operands.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69379
The legalization of v2i1->i2 or v4i1->i4 bitcasts followed by a setcc can create an AND after the bitcast. If we're lucky enough that the input to the bitcast is a concat_vectors whose first operand is a setcc that can natively zero all the upper bits of a k-register, then we should replace the other operands of the concat_vectors with zero in order to remove the AND.
With the AND removed we might be able to use a kortest on the result.
Differential Revision: https://reviews.llvm.org/D69205
Currently we may interleave by more than the estimated trip count
coming from the profile or the computed maximum trip count. The solution is to
use the "best known" trip count, instead of only the exact one, in the interleaving analysis.
Patch by Evgeniy Brevnov.
Differential Revision: https://reviews.llvm.org/D67948
Summary:
Debug info affects the output from "opt -inline"; InlineFunction could
not handle llvm.dbg.value when it exists between alloca
instructions.
The problem was that the first alloca in a sequence of allocas was
handled differently from the subsequent alloca instructions. Now
all static alloca instructions are treated the same (being removed
if they have no uses), so it does not matter if there are dbg
instructions (or any other instructions) in between.
Fixes the issue: https://bugs.llvm.org/show_bug.cgi?id=43291
Patch by: yechunliang (Chris Ye)
Reviewers: bjope, jmorse, vsk, probinson, jdoerfert, mtrofin, aprantl, fhahn
Reviewed By: bjope
Subscribers: uabelho, ormris, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68633
Summary:
An outstanding load with the same destination SGPR as the call could cause the
PC to be updated with a junk value on return.
Reviewers: arsenm, rampitec
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69474
This patch reworks the AIX call lowering to use CCState. Some defensive errors
are added in this patch to protect from emitting bad code for calling convention
logic that has not been implemented by design. The use of CCState follows the
precedent of other targets and enables the reuse of calling convention logic in
LowerFormalArguments, which will be rewritten to also use CCState in a later
patch.
Patch by Chris Bowler.
Differential Revision: https://reviews.llvm.org/D69101
This is an extra fold for a canonical form of uadd_sat, as shown in
D68651. It essentially selects uadd_sat from an add and a select.
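For reference, a minimal C++ sketch of the source-level shape that produces this add + select pattern (the function name is illustrative):

  #include <cstdint>
  #include <limits>

  // Saturating unsigned add written as add + compare + select; this is the
  // shape that can now be selected as a single uadd_sat node.
  uint32_t saturating_add(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;
    // If the add wrapped, sum < a; clamp to the maximum value.
    return sum < a ? std::numeric_limits<uint32_t>::max() : sum;
  }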
Differential Revision: https://reviews.llvm.org/D69244
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64)
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.
Reviewers: thakis, rnk, theraven, pcc
Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65761
In the Pre-RA machine sinker, previously we were relying on all DBG_VALUEs
being immediately after the instruction that defined their operands. This
isn't a valid assumption, as a variable location change doesn't
necessarily correspond to where the value is computed. In this patch, we
collect DBG_VALUEs that might need sinking as we walk through a block,
and sink all of them if their defining instruction is sunk.
This patch adds some copy propagation too, so that if we sink a copy inst,
the now non-dominated paths can use the copy source for the variable
location.
Differential Revision: https://reviews.llvm.org/D58386
This enhances D69127 (rGe6c145e0548e3b3de6eab27e44e1504387cf6b53)
to handle the looser "any_extend" cast in addition to zext.
This is a prerequisite step for canonicalizing in the other direction
(narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688
This phi simplification transform was added with:
D45448
However, as shown in PR43802:
https://bugs.llvm.org/show_bug.cgi?id=43802
...we must be careful not to propagate poison when we do the substitution.
There might be some more complicated analysis possible to retain the overflow flag,
but it should always be safe and easy to drop flags (we have similar behavior in
instcombine and other passes).
Differential Revision: https://reviews.llvm.org/D69442
When we sink DBG_VALUEs between blocks, we simply move the DBG_VALUE
instruction to below the sunk instruction. However, we should also mark
the variable as being undef at the original location, to terminate any
earlier variable location. This patch does that -- plus, if the
instruction being sunk is a copy, it attempts to propagate the copy
through the DBG_VALUE, replacing the destination with the source.
Differential Revision: https://reviews.llvm.org/D58238
We would previously have no soft-float softening for cbrt, so we could hit
a crash failing to select. This fills in what appears to be missing.
Differential Revision: https://reviews.llvm.org/D69345
Summary:
Writing support for three ACLE functions:
unsigned int __cls(uint32_t x)
unsigned int __clsl(unsigned long x)
unsigned int __clsll(uint64_t x)
CLS stands for "Count number of leading sign bits".
In AArch64, these intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).
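As an illustration, one common way of expressing cls in terms of clz (a sketch only; not necessarily the exact sequence the backend emits):

  #include <cstdint>

  // Count leading sign bits, excluding the sign bit itself.
  // x ^ (x >> 1) clears every leading bit that matches the sign bit, so the
  // clz of that value is cls + 1. Assumes >> on int32_t is an arithmetic shift.
  unsigned cls32(int32_t x) {
    uint32_t folded = (uint32_t)x ^ (uint32_t)(x >> 1);
    return (folded == 0 ? 32 : __builtin_clz(folded)) - 1;
  }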
Reviewers: compnerd
Reviewed By: compnerd
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69250
Summary:
Until this commit, these have lowered to a call to abort().
`llvm.trap()` now lowers to `unimp`, which should trap on all systems.
`llvm.debugtrap()` now lowers to `ebreak`, which is exactly what this
instruction is for.
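For example, with the usual clang builtins that produce these intrinsics:

  // __builtin_trap() emits llvm.trap, which now lowers to `unimp`.
  void hard_stop() { __builtin_trap(); }

  // __builtin_debugtrap() emits llvm.debugtrap, which now lowers to `ebreak`.
  void break_here() { __builtin_debugtrap(); }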
Reviewers: asb, luismarques
Reviewed By: asb
Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69390
Summary:
The PATCHABLE_EVENT_CALL uses i32 in the intrinsic. This
results in the register allocator picking a 32-bit register. We
need to use the 64-bit register when forming the MOV64rr
instructions. Otherwise we print illegal assembly in the text
output.
I think prior to this it was impossible for SrcReg to be equal
to DstReg, so the NOP code was not reachable.
While there, use Register instead of unsigned.
Also add a FIXME for what looks like a bug.
Reviewers: dberris
Reviewed By: dberris
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69365
Similar to:
rG4c47617627fb
This makes the DAG behavior consistent with IR's insertelement.
https://bugs.llvm.org/show_bug.cgi?id=42689
I've tried to maintain test intent for AArch64 and WebAssembly
by replacing undef index operands with something else.
If the target's preferred shift amount VT can't hold any shift
amount for the promoted VT, we should use i32. The specific shift
amount shouldn't matter. The type will be adjusted later when the
shift itself is type legalized. This avoids an assert in getNode.
Fixes PR43820.
This combine is only valid if the inner setcc produces a 0/1 result
or the inner type is MVT::i1.
I haven't seen this cause any issues, just happened to notice it
while reviewing combines in this function.
While there, also fix another call to use the value type from the
SDValue for the operand instead of calling SDNode::getValueType(0).
Though it's likely the use is result 0, it's not guaranteed.
Summary:
We don't pattern-match pairwise shuffles in SelectionDAG, so we
should only return the optimized costs if it's not a pairwise
shuffle.
I think the SLP vectorizer gives priority to a non-pairwise shuffle if
the cost is the same, and the lookup for reduction intrinsics
passes false for the pairwise flag. So this probably has no real
effect today.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69083
Summary:
Compare two values, and if they are different, return the position of the
most significant bit that is different in the values.
Needed for D69387.
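A usage sketch (assuming the helper lands in APIntOps next to the existing utilities and returns an Optional that is None when the values are equal):

  #include "llvm/ADT/APInt.h"
  #include "llvm/ADT/Optional.h"
  using namespace llvm;

  void example() {
    APInt A(8, 0x90); // 0b10010000
    APInt B(8, 0x8F); // 0b10001111
    // A and B first disagree at bit 4, so *Bit == 4 here.
    if (Optional<unsigned> Bit = APIntOps::GetMostSignificantDifferentBit(A, B))
      (void)*Bit;
    // For equal values there is no such bit, and None is returned.
  }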
Reviewers: nikic, spatel, sanjoy, RKSimon
Reviewed By: nikic
Subscribers: xbolva00, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69439
PTEST and especially the MOVMSK instructions are slow on Knights Landing
and later. As a bonus, this patch increases instruction parallelism by
emitting:
KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0
Instead of:
KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0
https://reviews.llvm.org/D69157
There is a minor flaw in the implementation of the function lowerPhis.
This function replaces values of regclass Vreg_1 (boolean values)
involved in PHIs with an SGPR. Currently it iterates over the MBBs
and performs an in-place lowering of PHIs, but fails to lower any
incoming value that is itself another PHI of the Vreg_1 regclass.
The failure occurs only when the MBB to which the incoming PHI value
belongs has not been visited/lowered yet.
To fix this problem, collect all Vreg_1 PHIs upfront and then
perform the lowering.
Differential Revision: https://reviews.llvm.org/D69182
Using `?` as an optional marker is very useful in Clang's AST-node
emitters because otherwise we need a separate class just to encode
the presence or absence of a base node reference.
This makes the DAG behavior consistent with IR's extractelement after:
rGb32e4664a715
https://bugs.llvm.org/show_bug.cgi?id=42689
I've tried to maintain test intent for WebAssembly.
The AMDGPU test is trying to test for crashing or other bad behavior,
but I'm not sure if that's possible after this change.
That used to fail in the last testcase function because, after
%0:sreg_64.sub0 was folded into the %3:sreg_32_xm0_xexec COPY, it
was further folded into S_STORE_DWORD_IMM. Its legal effective
subreg class is SReg_32, while the instruction expects the more restricted
SReg_32_XM0_XEXEC. However, SIInstrInfo::isLegalRegOperand()
passed the legality check, and the error was only caught in the verifier.
Borrowed code from the verifier to check for RC legality.
Differential Revision: https://reviews.llvm.org/D69445
Ilya Leoshkevich (<iii@linux.ibm.com>) reported an issue where, with
-mattr=+alu32, CO-RE has a segfault in the BPF MISimplifyPatchable
pass.
The pattern to be transformed by the MISimplifyPatchable
pass looks like this:
r5 = ld_imm64 @"b:0:0$0:0"
r2 = ldw r5, 0
... r2 ... // use r2
The pass will remove the intermediate 'ldw' instruction
and replace all uses of r2 with r5, like this:
r5 = ld_imm64 @"b:0:0$0:0"
... r5 ... // use r5
Later, the ld_imm64 insn will be replaced with
r5 = <patched immediate>
for field relocation purpose.
With -mattr=+alu32, the input code may become
r5 = ld_imm64 @"b:0:0$0:0"
w2 = ldw32 r5, 0
... w2 ... // use w2
Replacing "w2" with "r5" is incorrect and will
trigger compiler internal errors.
To fix the problem, if the register class of the ldw* destination
register is sub_32, we just replace the original ldw*
instruction with:
w2 = w5
Directly replacing all uses of w2 with an in-place
constructed w5 for the use operand does not seem to work in all cases.
The latest kernel will have -mattr=+alu32 on by default,
so this flag has been added to all CO-RE tests.
This patch was also tested against the latest kernel bpf-next branch.
Differential Revision: https://reviews.llvm.org/D69438
Both tryFoldOMod() and tryFoldClamp() remove the original instruction,
so the MI.modifiesRegister() check may use a deleted MI.
Differential Revision: https://reviews.llvm.org/D69448
Custom lower this to a target instruction with the merge operands. I
think it might be better to directly select this and emit a
REG_SEQUENCE, but this would be more work since it would require
splitting the tablegen patterns for these cases from the other
atomics.
Summary:
In loadSRsrcFromVGPR, if MBB is the same as Succ, Remainder is not the immediate dominator of Succ.
Reviewers: arsenm
Differential Revision: https://reviews.llvm.org/D69358
Summary:
If there is a GUID collision between two globals, checking the
summary list from the import index to make assumptions can be dangerous.
Do not assume that a GlobalValue that has a GlobalVarSummary
actually is a GlobalVariable, as it can be another GlobalValue with
the same GUID to which the summary is connected.
Patch by Joel Klinghed (the_jk@opera.com)
Reviewers: evgeny777, tejohnson
Reviewed By: tejohnson
Subscribers: tejohnson, dblaikie, MaskRay, mehdi_amini, inglorion, hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67322
zext (ctpop X) --> ctpop (zext X)
This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688
I'm not sure if any other targets are affected, but I found a missing fold for PPC, so added tests based on that.
The reason we widen all the way to 64-bit in these tests is because the initial DAG looks something like this:
t5: i8 = ctpop t4
t6: i32 = zero_extend t5 <-- created based on IR, but unused node?
t7: i64 = zero_extend t5
Differential Revision: https://reviews.llvm.org/D69127
Summary:
Add instruction marker to MachineInstr ExtraInfo. This does almost the
same thing as Pre/PostInstrSymbols, except that it doesn't create a label until
printing instructions. This allows for labels to be put around instructions that
are deleted/duplicated somewhere.
Also undo the workaround in r375137.
Reviewers: rnk
Subscribers: MatzeB, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69136
Summary:
There are `*_ov()` functions already, so at least for consistency it may be good to also have saturating variants.
These may or may not be needed for `ConstantRange`'s `shlWithNoWrap()`.
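A minimal usage sketch, assuming the new members follow the existing APInt `*_sat` naming (`sshl_sat`/`ushl_sat`):

  #include "llvm/ADT/APInt.h"
  using namespace llvm;

  void shl_sat_example() {
    APInt X(8, 100);
    APInt Three(8, 3);
    APInt S = X.sshl_sat(Three); // 100 << 3 = 800 overflows signed i8: clamps to 127
    APInt U = X.ushl_sat(Three); // 800 > 255, so the unsigned result clamps to 255
  }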
Reviewers: spatel, nikic
Reviewed By: nikic
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69398
Summary:
There are `*_ov()` functions already, so at least for consistency it may be good to also have saturating variants.
These may or may not be needed for `ConstantRange`'s `mulWithNoWrap()`.
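Likewise a sketch, assuming `smul_sat`/`umul_sat` naming:

  #include "llvm/ADT/APInt.h"
  using namespace llvm;

  void mul_sat_example() {
    APInt X(8, 100), Y(8, 5);
    APInt S = X.smul_sat(Y); // 500 overflows signed i8: clamps to 127
    APInt U = X.umul_sat(Y); // 500 overflows unsigned i8: clamps to 255
  }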
Reviewers: spatel, nikic
Reviewed By: nikic
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69397
Summary:
The ternary expression checks for ISD::ADD instead of ISD::UADDO inside DAGTypeLegalizer::ExpandIntRes_UADDSUBO.
This means the ternary expression will evaluate to ISD::SUBCARRY for both ISD::UADDO and ISD::USUBO nodes.
Targets are likely to implement both, so the impact will be very limited in practice.
Reviewers: bogner, lebedev.ri
Reviewed By: lebedev.ri
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68123
Complete fp16 support by ensuring that load extension / truncate store
operations are properly expanded.
Reviewers: asb, lenary
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D69246
selectImpl is able to select G_FSQRT when we set bank for vector
operands to fprb. Add detailed tests.
Note: G_FSQRT is generated from the llvm-ir intrinsic llvm.sqrt.*,
and at the moment MIPS is not able to generate this intrinsic for
vector types (some targets generate vector llvm.sqrt.* from calls
to a builtin function).
__builtin_msa_fsqrt_<format> will be transformed into G_FSQRT
in legalizeIntrinsic and selected in the same way.
Differential Revision: https://reviews.llvm.org/D69376
SHT_NOTE is a section consisting of entries, each of which holds
namesz, descsz, and type words followed by name + padding and desc + padding data.
This patch teaches yaml2obj and obj2yaml to parse and dump them.
This patch implements the section as it is described here:
https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-18048.html
Which says: "For 64-bit objects and 32-bit objects, each entry is an array of 4-byte words in
the format of the target processor"
The official specification is different
http://www.sco.com/developers/gabi/latest/ch5.pheader.html#note_section
And says: "In 64-bit objects (files with e_ident[EI_CLASS] equal to ELFCLASS64), each entry is an array
of 8-byte words in the format of the target processor. In 32-bit objects (files with e_ident[EI_CLASS]
equal to ELFCLASS32), each entry is an array of 4-byte words in the format of the target processor"
Since LLVM uses the first, 32-bit way, this patch follows it.
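For reference, the entry layout described above, written as a C++ struct (this matches the generic ELF note header; the name and desc bytes follow it, each padded to a 4-byte boundary):

  #include <cstdint>

  // One SHT_NOTE entry, using the 4-byte-word layout LLVM follows for both
  // 32-bit and 64-bit objects.
  struct NoteEntryHeader {
    uint32_t n_namesz; // size of the name field, including the terminating NUL
    uint32_t n_descsz; // size of the descriptor field
    uint32_t n_type;   // producer-specific meaning of the descriptor
    // Followed by: char name[n_namesz] padded to 4 bytes,
    //              char desc[n_descsz] padded to 4 bytes.
  };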
Differential revision: https://reviews.llvm.org/D68983
We were already going to all of the trouble of computing maximum constant exit counts for each loop exit, so we might as well expose them through the API. The change in IndVars is mostly to demonstrate that the wired-up code works, but it also very slightly strengthens the transform. The strengthened case is rather narrow though: it requires one exactly analyzeable exit, one imprecisely analyzeable exit (with the upper bound less than the precise one), and one unanalyzeable exit. I couldn't construct a reasonably stable test case.
This does increase the memory usage of the BackedgeTakenCount by a factor of 2 in the worst case.
I also noticed the loop in IndVars is O(#Exits ^ 2). This doesn't change with this patch. A future patch will cache this result inside of SCEV to avoid re-querying.
This is a first step in figuring out a proper API for maximum (non-constant) exit counts. This may evolve a bit as we get experience with the API needs; suggestions very welcome. This patch just tries to provide a framework to which we can later add maximums in a clean and obvious way.
(This time verified locally.)
It was failing with:
llvm/lib/MC/XCOFFObjectWriter.cpp:168:56: error: array must be initialized with a brace-enclosed initializer
std::array<Section *const, 2> Sections = {&Text, &BSS};
^
This reverts commit 32ce14e55e.
In post-commit review, Pavel pointed out that there's a simpler way to
ignore SIGPIPE in lldb that doesn't rely on llvm's handlers.
Check that the return and parameter types of the old and
new functions are compatible before upgrading a function call to an
intrinsic call.
Sometimes users insert calls to ARC runtime functions that are not
compatible with the corresponding intrinsic functions (for example,
'i8* @objc_storeStrong' instead of 'void @objc_storeStrong'). Don't
upgrade those calls.
rdar://problem/56447127
An SUnit can have neither an instruction nor an SDNode: both are
null if it represents a nop. Fixed a crash on using SU->getInstr().
Differential Revision: https://reviews.llvm.org/D69395
It was failing with:
llvm/lib/MC/XCOFFObjectWriter.cpp:168:53: error: array must be initialized with a brace-enclosed initializer
std::array<Section *const, 2> Sections{&Text, &BSS};
^
Summary:
Right now we handle each CsectGroup (ProgramCodeCsects, BSSCsects)
individually when assigning indices, writing the symbol table, and
writing section raw data. However, there is already a pattern there,
and we could common up those actions for every CsectGroup. This will
make adding new CsectGroups (read/write data, read-only data, TC/TOC,
mergeable strings) easier and less error-prone.
Reviewed by: sfertile, daltenty, DiggerLin
Approved by: daltenty
Differential Revision: https://reviews.llvm.org/D69112
The MVE VADC instruction reads and writes the carry bit at bit 29 of
the FPSCR register. The corresponding ACLE intrinsic is specified to
work with an integer in which the carry bit is stored at bit 0. So if
a user writes a code sequence in C that passes the carry from one VADC
to the next, like this,
s0 = vadcq_u32(a0, b0, &carry);
s1 = vadcq_u32(a1, b1, &carry);
then clang will generate IR for each of those operations that shifts
the carry bit up into bit 29 before the VADC, and after it, shifts it
back down and masks off all but the low bit. But in this situation
what you really wanted was two consecutive VADC instructions, so that
the second one directly reads the value left in FPSCR by the first,
without wasting several instructions on pointlessly clearing the other
flag bits in between.
This commit explains to InstCombine that the other bits of the flags
operand don't matter, and adds a test that demonstrates that all the
code between the two VADC instructions can be optimized away as a
result.
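In scalar terms, the wrapper around each intrinsic call looks roughly like this (a sketch; the actual transformation happens on the IR):

  #include <cstdint>

  // The carry is presented to the user in bit 0 but lives in FPSCR bit 29.
  uint32_t carry_in_to_fpscr(uint32_t carry) {
    return (carry & 1u) << 29; // shift the carry up to bit 29
  }
  uint32_t fpscr_to_carry_out(uint32_t fpscr) {
    return (fpscr >> 29) & 1u; // shift back down, mask off all but the low bit
  }
  // With this patch, InstCombine treats the other flag bits as don't-cares, so
  // between two back-to-back VADCs these round-trips fold away.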
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67162
The VST2 and VST4 instructions take two or four vector registers as
input, and store part of each register to memory in an interleaved
pattern. They come in variants indicating which part of each register
they store (VST20 and VST21; VST40 to VST43 inclusive); the intention
is that issuing each of those variants in turn has the combined effect
of loading or storing the whole set of registers to a memory block of
equal size. The corresponding VLD2 and VLD4 instructions load from
memory in the same interleaved format: each one overwrites only part
of its output register set, and again, the idea is that if you use
VLD4{0,1,2,3} or VLD2{0,1} together, you end up having written to the
whole of each register.
I've implemented the stores and loads quite differently. The loads
were easiest to implement as a single intrinsic that expands to all
four VLD4x instructions or both VLD2x, delivering four complete output
registers. (Implementing each individual load as a separate
instruction taking four input registers to partially overwrite is
possible in theory, but pointless, and when I tried it, I found it
would need extra work to get the register allocation not to be
horrible.) Since that intrinsic delivers multiple outputs, it has to
be instruction-selected in custom C++.
But the store instructions are easier to model individually, because
they don't overwrite any register at all and you can write a DAG Isel
pattern in Tablegen for each one.
Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load
instructions, delivers four full output vectors, and is handled by C++
code, whereas `int_arm_mve_vst4q` expands to just one store
instruction, takes four input vectors and a constant indicating which
lanes to store, and is handled entirely in Tablegen. (And similarly
for vld2q/vst2q.) This is asymmetric, but it was the easiest way to do
each one.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68700
This adds some initial example IR intrinsics for MVE instructions that
deliver multiple output values, and hence, have to be instruction-
selected by custom C++ code instead of Tablegen patterns.
I've added the writeback gather load instructions (taking a vector of
base addresses and a single common offset, returning a vector of
loaded values and an updated vector of base addresses); one example
from the long shift family (taking and returning a 64-bit value in two
GPRs); and the VADC instruction (which propagates a carry bit from
each vector-lane addition to the next, taking an input carry flag in
FPSCR and outputting the final one in FPSCR as well).
To support the VPT-predicated forms of these instructions, I've
written some helper functions to add the cluster of MVE predicate
operands to the end of a MachineInstr. `AddMVEPredicateToOps` is used
when the instruction actually is predicated (so it takes a predicate
mask argument), and `AddEmptyMVEPredicateToOps` is for when the
instruction is unpredicated (so it fills in $noreg for the mask). Each
one comes in a form suitable for `vpred_n`, and one for `vpred_r`
which takes the extra 'inactive' parameter.
For VADC, the representation of the carry flag in the IR intrinsic is
a word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e.
with the carry flag in bit 29 of the word. (The user-facing ACLE
intrinsic will want it to be in bit 0, but I'll do that on the clang
side.)
Reviewers: dmgreen, miyuki, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68699
This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.
This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).
When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.
(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new `MVEVectorVTInfo` records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)
The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.
Reviewers: dmgreen, miyuki, ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67158
selectImpl is able to select G_FABS when we set bank for vector
operands to fprb. Add detailed tests.
Note: G_FABS is generated from the llvm-ir intrinsic llvm.fabs.*,
and at the moment MIPS is not able to generate this intrinsic for
vector types (some targets generate vector llvm.fabs.* from calls
to a builtin function).
We can handle fabs using __builtin_msa_fmax_a_<format> by passing the
same vector as both arguments. __builtin_msa_fmax_a_<format> will
be directly selected into FMAX_A_<format> in legalizeIntrinsic.
Differential Revision: https://reviews.llvm.org/D69346
Select vector G_FADD, G_FSUB, G_FMUL and G_FDIV for MIPS32 with MSA. We
have to set bank for vector operands to fprb and selectImpl will do the
rest. __builtin_msa_fadd_<format>, __builtin_msa_fsub_<format>,
__builtin_msa_fmul_<format> and __builtin_msa_fdiv_<format> will be
transformed into G_FADD, G_FSUB, G_FMUL and G_FDIV in legalizeIntrinsic
respectively and selected in the same way.
Differential Revision: https://reviews.llvm.org/D69340
Select vector G_SDIV, G_SREM, G_UDIV and G_UREM for MIPS32 with MSA. We
have to set bank for vector operands to fprb and selectImpl will do the
rest. __builtin_msa_div_s_<format>, __builtin_msa_mod_s_<format>,
__builtin_msa_div_u_<format> and __builtin_msa_mod_u_<format> will be
transformed into G_SDIV, G_SREM, G_UDIV and G_UREM in legalizeIntrinsic
respectively and selected in the same way.
Differential Revision: https://reviews.llvm.org/D69333
Potentially an SGPR-to-SGPR copy should also be possible.
That is, however, trickier because we may end up with a
wrong register class at the use because of xm0/xexec permutations.
Differential Revision: https://reviews.llvm.org/D69280
This broke various Windows builds; see comments on the Phabricator
review.
This also reverts the follow-up 20bf0cf.
> Summary:
> This fold, helps recover from the rest of the D62266 ARM regressions.
> https://rise4fun.com/Alive/TvpC
>
> Note that while the fold is quite flexible, i've restricted it
> to the single interesting pattern at the moment.
>
> Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix
>
> Reviewed By: deadalnix
>
> Subscribers: javed.absar, kristof.beyls, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D62450
This roughly mimics `std::thread(...).detach()`, except it allows
customizing the stack size. Required for https://reviews.llvm.org/D50993.
I've decided against reusing the existing `llvm_execute_on_thread` because
it's not obvious what to do with the ownership of the passed
function/arguments:
1. If we pass possibly-owning function data to `llvm_execute_on_thread`,
we'll lose the ability to pass small non-owning, non-allocating functions
for the joining case (as it's used now). Is it important enough?
2. If we use the non-owning interface in the new use case, we'll force
clients to transfer ownership to the spawned thread manually, but
similar code would still have to exist inside
`llvm_execute_on_thread(_async)` anyway (as we can't just pass the same
non-owning pointer to pthreads and Windows implementations, and would be
forced to wrap it in some structure and deal with its ownership).
Patch by Dmitry Kozhevnikov!
Differential Revision: https://reviews.llvm.org/D51103
MipsMCAsmInfo was using the '$' prefix for Mips32 and '.L' for Mips64
regardless of the -target-abi option. By passing MCTargetOptions to MCAsmInfo
we can find out the Mips ABI and pick the appropriate prefix.
Tags: #llvm, #clang, #lldb
Differential Revision: https://reviews.llvm.org/D66795
Summary:
The default implementation of the describeLoadedValue() hook uses the
MoveImm property to determine if an instruction moves an immediate. If
an instruction has that property the function returns the second
operand, assuming that that is the immediate value the instruction
moves. As far as I can tell, the MoveImm property does not imply that
the second operand is the immediate value, nor that any other operand
necessarily holds the immediate value; it just means that the
instruction moves some immediate value.
One example where the second operand is not the immediate is SystemZ's
LZER instruction, which moves a zero immediate implicitly: $f0S = LZER.
That case triggered an out-of-bounds assertion when getting the operand.
I have added a test case for that instruction.
Another example is ARM's MVN instruction, which holds the logical
bitwise NOT'd value of the immediate that is moved. For the following
reproducer:
extern void foo(int);
int main() { foo(-11); }
an incorrect call site value would be emitted:
$ clang --target=arm foo.c -O1 -g -Xclang -femit-debug-entry-values \
-c -o - | ./build/bin/llvm-dwarfdump - | \
grep -A2 call_site_parameter
0x00000058: DW_TAG_GNU_call_site_parameter
DW_AT_location (DW_OP_reg0 R0)
DW_AT_GNU_call_site_value (DW_OP_lit10)
Another example is the A2_combineii instruction on Hexagon which moves
two immediates to a super-register: $d0 = A2_combineii 20, 10.
Perhaps these are rare exceptions, and most MoveImm instructions hold
the immediate in the second operand, but in my opinion the default
implementation of the hook should only describe values that it can, by
some contract, guarantee are safe to describe, rather than leaving it up
to the targets to override the exceptions, as that can silently result
in incorrect call site values.
This patch adds X86's relevant move immediate instructions to the
target's hook implementation, so this commit should be an NFC for that
target. We need to do the same for ARM and AArch64.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: vsk
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D69109
Select vector G_MUL for MIPS32 with MSA. We have to set bank
for vector operands to fprb and selectImpl will do the rest.
Manual selection of G_MUL is now done for gprb only.
__builtin_msa_mulv_<format> will be transformed into G_MUL
in legalizeIntrinsic and selected in the same way.
Differential Revision: https://reviews.llvm.org/D69310
Select vector G_SUB for MIPS32 with MSA. We have to set bank
for vector operands to fprb and selectImpl will do the rest.
__builtin_msa_subv_<format> will be transformed into G_SUB
in legalizeIntrinsic and selected in the same way.
__builtin_msa_subvi_<format> will be directly selected into
SUBVI_<format> in legalizeIntrinsic.
Differential Revision: https://reviews.llvm.org/D69306
We should do the fold only if both constants are plain,
non-opaque constants; at least, that is the DAG.FoldConstantArithmetic()
requirement.
And if the constant we are comparing with is zero - we shouldn't be
trying to do this fold in the first place.
Fixes https://bugs.llvm.org/show_bug.cgi?id=43769
This adds support for reserving GPRs such that the compiler will not
choose a register for register allocation. The implementation follows
the same design as for AArch64; each reserved register becomes a target
feature and is used for getting the reserved registers for a given
MachineFunction. The backend checks that it does not need to write to
any reserved register; if it does a relevant error is generated.
Differential Revision: https://reviews.llvm.org/D67185
Summary:
This fold, helps recover from the rest of the D62266 ARM regressions.
https://rise4fun.com/Alive/TvpC
Note that while the fold is quite flexible, I've restricted it
to the single interesting pattern at the moment.
Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix
Reviewed By: deadalnix
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62450
This adds an instcombine matcher for code that attempts to perform signed
saturating arithmetic by casting to a higher type. Unsigned cases are already
matched, this adds extra matches for the more complex signed cases, which
involves matching the min(max(add a b)) nodes with proper extends to ensure
legality.
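For example, one source-level shape of the signed pattern this now matches (a sketch; the matcher works on the corresponding sext/add/smin/smax IR):

  #include <algorithm>
  #include <cstdint>

  // Signed saturating add done by widening: sext to i32, add, clamp to the
  // i16 range, then truncate. This can now be folded to llvm.sadd.sat.i16.
  int16_t sadd_sat16(int16_t a, int16_t b) {
    int32_t wide = (int32_t)a + (int32_t)b;
    wide = std::min(wide, (int32_t)INT16_MAX);
    wide = std::max(wide, (int32_t)INT16_MIN);
    return (int16_t)wide;
  }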
Differential Revision: https://reviews.llvm.org/D68651
llvm-svn: 375505
MachineRegisterInfo::createGenericVirtualRegister sets
RegClassOrRegBank to static_cast<RegisterBank *>(nullptr).
MIParser, on the other hand, doesn't. When we attempt to constrain
the register class on such a VReg, an additional COPY is generated.
This way we avoid COPY instructions showing up in tests that have MIR
input while they are not present with llvm-ir input that was used
to create the given MIR for a -run-pass test.
Differential Revision: https://reviews.llvm.org/D68946
llvm-svn: 375502
Select vector G_ADD for MIPS32 with MSA. We have to set bank
for vector operands to fprb and selectImpl will do the rest.
__builtin_msa_addv_<format> will be transformed into G_ADD
in legalizeIntrinsic and selected in the same way.
__builtin_msa_addvi_<format> will be directly selected into
ADDVI_<format> in legalizeIntrinsic. MIR tests for it have
unnecessary additional copies. Capture the current state of the tests
with run-pass=legalizer with a test in test/CodeGen/MIR/Mips.
Differential Revision: https://reviews.llvm.org/D68984
llvm-svn: 375501
This re-commits r375152 which was pulled in r375233 because it broke
the EXPENSIVE_CHECKS bot on Windows.
The reason for the failure was a bug in the pass that the commit turned
on by default. This patch fixes that bug and turns the pass back on.
This patch has been verified on the buildbot that originally failed
thanks to Simon Pilgrim.
Differential revision: https://reviews.llvm.org/D52431
llvm-svn: 375497
Currently injected-sources-native.test fails with "Expected<T>
value was in success state.
(Note: Expected<T> values in success mode must still be checked
prior to being destroyed)"
when llvm is compiled with LLVM_ENABLE_ABI_BREAKING_CHECKS in Release.
The problem is that getStringForID returns Expected<StringRef>,
and an Expected value must always be checked, even if it is in a success state.
Checking with an assert only helps in Debug and is wrong.
Differential revision: https://reviews.llvm.org/D69251
llvm-svn: 375492
Teach the CombinerHelper how to turn shuffle_vector operations that
concatenate vectors into concat_vectors, and add this combine
to the AArch64 pre-legalizer combiner.
Differential Revision: https://reviews.llvm.org/D69149
llvm-svn: 375452
Only handle simple inter-block redefs of m0 to the same value. This
avoids interference from redefs of m0 in SILoadStoreOptimizer. I was
initially teaching that pass to ignore redefs of m0, but having them
not exist beforehand is much simpler.
This is in preparation for deleting the current special m0 handling in
SIFixSGPRCopies to allow the register coalescer to handle the
difficult cases.
llvm-svn: 375449
r375293 removed the SGPR spilling with scalar stores path, so this is
no longer necessary. This also always had the defect of adding the def
even when this path wasn't in use.
llvm-svn: 375448
If a PHI defines an AGPR, legalize its operands to AGPR.
At the moment we can get an AGPR PHI with VGPR operands.
I am not aware of any problems as it seems to be handled
gracefully in RA, but this is not right anyway.
It also slightly decreases VGPR pressure in some cases
because we do not have to copy via a VGPR.
Differential Revision: https://reviews.llvm.org/D69206
llvm-svn: 375446
Summary:
Reduce include dependencies by no longer including Pass.h from
DataLayout.h. That include seemed irrelevant to DataLayout, as
well as being irrelevant to several users of DataLayout.
Reviewers: rnk
Reviewed By: rnk
Subscribers: mehdi_amini, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D69261
llvm-svn: 375436
The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 375430
The static analyzer is warning about a potential null dereference, but we should be able to use cast<> directly and if not assert will fire for us.
llvm-svn: 375429