The machine scheduler currently biases register copies to/from
physical registers to be closer to their point of use / def to
minimize their live ranges. This change extends this to also cover physical
register assignments from immediate values.
This causes a reduction in overall register pressure and a minor
reduction in spills, and indirectly fixes an out-of-registers
assertion (PR39391).
Most test changes are from minor instruction reorderings and register
name selection changes, and direct consequences of those.
Reviewers: MatzeB, qcolombet, myatsina, pcc
Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya,
javed.absar, arphaman, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54218
llvm-svn: 346894
Narrower vectors will be widened to 128 bits without changing the element size. And generic type legalization can already handle widening mulhu/mulhs.
Differential Revision: https://reviews.llvm.org/D54513
llvm-svn: 346879
Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.
This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.
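For context, the generic funnel-shift semantics the expansion cost has to model can be sketched in plain C++ as follows (an illustrative reference, not the LLVM expansion code itself):
```
#include <cstdint>

// Funnel shift left: concatenate Hi:Lo, shift left by Amt (mod width), keep
// the high word. A rotate is the special case where Hi and Lo are the same.
uint32_t fshl32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Hi;                       // avoids the undefined shift by 32 below
  return (Hi << Amt) | (Lo >> (32 - Amt));
}
```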
llvm-svn: 346854
This patch removes the last use of the constant pool shuffle decode helper and consistently uses the 'getTargetShuffleMaskIndices' versions instead. The constant pool versions are now purely used for assembly comments.
The avx512vbmi intrinsic upgrades had to be altered as they were being decoded as broadcasts, similar to what I fixed in rL346032. I don't think the change is critical - although it's annoying that we lose the {k}{z} instruction test coverage as they are tricky to generate.
Differential Revision: https://reviews.llvm.org/D54083
llvm-svn: 346850
Summary:
This adds support for the 'event section' specified in the exception
handling proposal. (This was named 'exception section' first, but later
renamed to 'event section' to take possibilities of other kinds of
events into consideration. But currently we only store exception info in
this section.)
The event section is added between the global section and the export
section. This is for ease of validation per request of the V8 team.
This patch:
- Creates the event symbol type, which is a weak symbol
- Makes 'throw' instruction take the event symbol '__cpp_exception'
- Adds relocation support for events
- Adds WasmObjectWriter / WasmObjectFile (Reader) support
- Adds obj2yaml / yaml2obj support
- Adds '.eventtype' printing support
Reviewers: dschuff, sbc100, aardappel
Subscribers: jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54096
llvm-svn: 346825
Make ISD::VSELECT legal as long as Altivec instructions are available; otherwise its default behavior is Expand,
which is handled during type legalization. Use xxsel to match vselect when VSX is enabled, otherwise use vsel.
Differential Revision: https://reviews.llvm.org/D49531
llvm-svn: 346824
If we keep track of whether the ContainsCalls bit is set in the MBB flags for each
candidate, then we have a better chance of not checking the candidate for calls
at all.
This saves quite a few checks in some CTMark tests (~200 in Bullet, for
example.)
llvm-svn: 346816
We already determine a bunch of information about an MBB in
getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating
things that must be false/true.
The first thing we can easily check is if an outlined sequence could ever
contain calls. There's no reason to walk over the outlined range, checking for
calls, if we already know that there are no calls in the block containing the
sequence.
llvm-svn: 346809
Since we never outline anything with fewer than 2 occurrences, there's no
reason to compute cost model information if there are fewer than that.
llvm-svn: 346803
An extractelement with non-constant index will be lowered either to
scratch or movrel loop in most cases. This patch converts such
instruction into a set of selects if vector size is not too big.
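As a rough illustration of the shape of the expansion (plain C++, not the actual AMDGPU lowering), a dynamic extract from a small vector becomes one select per lane instead of a scratch store/load or a movrel loop:
```
// Hypothetical scalar analogue of the select-based extract.
float extractLane(const float V[4], unsigned Idx) {
  float R = V[0];
  R = (Idx == 1) ? V[1] : R;
  R = (Idx == 2) ? V[2] : R;
  R = (Idx == 3) ? V[3] : R;
  return R;
}
```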
Differential Revision: https://reviews.llvm.org/D54351
llvm-svn: 346800
Previously, the extend_vector_inreg opcodes required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.
This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.
X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.
I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements.
The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.
Differential Revision: https://reviews.llvm.org/D54346
llvm-svn: 346784
Summary:
Without this change I get the following error:
lib/Target/AVR/AVRGenAsmMatcher.inc:1135:1: error: redundant #include of module 'LLVM_Utils.Support.Format' appears within namespace 'llvm' [-Wmodules-import-nested-redundant]
Reviewers: dylanmckay
Reviewed By: dylanmckay
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53425
llvm-svn: 346750
If a loaded value is replicated it is best to combine these two operations
into a VLREP (load and replicate), but isel will not produce this if the load
has other users as well.
This patch handles this by making the other users of the load use element 0
of the REPLICATE instead of the load. This way the load has only the
REPLICATE node as its user, and we get a VLREP.
Review: Ulrich Weigand
https://reviews.llvm.org/D54264
llvm-svn: 346746
Turns out it's way simpler to do this check with one LRU. Instead of
maintaining two, just keep one. Check if each of the registers is available,
and then check if it's a live out from the block. If it's a live out, but
available in the block, we know we're in an unsafe case.
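A minimal sketch of that rule, assuming a helper along these lines (hypothetical code, not the actual outliner implementation):
```
#include "llvm/CodeGen/LiveRegUnits.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
using namespace llvm;

// A register that looks available inside the block but is live into a
// successor is the unsafe case described above.
static bool isUnsafeToOutline(MCPhysReg Reg, const LiveRegUnits &LRU,
                              const MachineBasicBlock &MBB) {
  if (!LRU.available(Reg))
    return false;                      // used in the block; not this case
  for (const MachineBasicBlock *Succ : MBB.successors())
    if (Succ->isLiveIn(Reg))
      return true;                     // live out yet "available": unsafe
  return false;
}
```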
llvm-svn: 346721
Instead of returning Flags, return true if the MBB is safe to outline from.
This lets us check for unsafe situations, like, say, in AArch64, when X17 is live
across an MBB without being defined in that MBB. In that case, there's no point
in performing an instruction mapping.
llvm-svn: 346718
This patch adds the ability to use a PALIGNR to rotate a pair of inputs to select a range containing all the referenced elements, followed by a single input permute to put them in the right location.
Differential Revision: https://reviews.llvm.org/D54267
llvm-svn: 346706
Summary:
This is to replace the ELFAsmParser that WebAssembly was using, which
so far was a stub that didn't do anything, and couldn't work correctly
with wasm.
This new class is there to implement generic directives related to
wasm as a binary format. Wasm target specific directives are still
parsed in WebAssemblyAsmParser as before. The two classes now
cooperate more correctly too.
Also implemented .result which was missing. Any unknown directives
will now result in errors.
Reviewers: dschuff, sbc100
Subscribers: mgorny, jgravelle-google, eraman, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54360
llvm-svn: 346700
Truncate and shuffle lowering are already capable of matching to PACKUS using known bits analysis.
This features one test change where we now prefer to extend v16i16->v16i32 then trunc v16i32->v16i8 over extract_subvector+packus when avx512f is available, but avx512bw is not.
llvm-svn: 346697
Sometimes after basic block placement we end up with code like:
sreg = s_mov_b64 -1
vcc = s_and_b64 exec, sreg
s_cbranch_vccz
This happens as a join of a block assigning -1 to a saved mask and
another block which consumes that saved mask with s_and_b64 and a
branch.
When moved into a single new basic block, this sequence is essentially
a single s_cbranch_execz instruction.
Differential Revision: https://reviews.llvm.org/D54164
llvm-svn: 346690
When we repeat the 2 shifting operands, this is a bit rotation - annoyingly this has to be handled in a different getIntrinsicInstrCost overload than most intrinsics, as we need to check that the operands are the same.
llvm-svn: 346688
Improve getCastInstrCost() by respecting the different types of Src and Dst
for vector integer <-> fp conversions.
This means that extracting from integer becomes more expensive (by the
extraction penalty), and the extraction from fp becomes cheaper (no longer
has a false extraction penalty).
Review: Ulrich Weigand
https://reviews.llvm.org/D54423
llvm-svn: 346663
This extends the .option support from D45864 to enable/disable the relax
feature flag from D44886
During parsing of the relax/norelax directives, the RISCV::FeatureRelax
feature bits of the SubtargetInfo stored in the AsmParser are updated
appropriately to reflect whether relaxation is currently enabled in the
parser. When an instruction is parsed, the parser checks if relaxation is
currently enabled and if so, gets a handle to the AsmBackend and sets the
ForceRelocs flag. The AsmBackend uses a combination of the original
RISCV::FeatureRelax feature bits set by e.g -mattr=+/-relax and the
ForceRelocs flag to determine whether to emit relocations for symbol and
branch diffs. To ensure correct diffs are emitted, diff relocations are therefore
omitted only if the relax flag was not set on the command line and no instruction
was ever parsed in a section with relaxation enabled.
Differential Revision: https://reviews.llvm.org/D46423
Patch by Lewis Revill.
llvm-svn: 346655
Iterate over all elements and count the number of uses among them for each
used load. Then make sure to REPLICATE the load which has the most uses in
order to minimize the number of needed element insertions.
Review: Ulrich Weigand
https://reviews.llvm.org/D54322
llvm-svn: 346637
getConstant will create a BUILD_VECTOR for us and use a legal type if necessary. So just create the simple node and let BUILD_VECTOR legalization do the canonicalization.
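A minimal sketch of the simplification this allows (assuming the usual SelectionDAG and SDLoc variables of a lowering routine; the helper name is made up):
```
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// getConstant with a vector VT builds the splat BUILD_VECTOR itself and
// picks a legal element type if needed, so the caller stays simple.
SDValue getZeroVector(SelectionDAG &DAG, const SDLoc &DL) {
  return DAG.getConstant(0, DL, MVT::v4i64); // splat of <4 x i64> zero
}
```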
llvm-svn: 346603
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.
I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.
For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.
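For reference, a rough sketch of the hook mentioned above, as a target might override it to veto narrowing; the class name and body are hypothetical, only the hook signature is taken from this era of the tree:
```
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

class MyTargetLowering : public TargetLowering {
public:
  using TargetLowering::TargetLowering;
  bool shouldReduceLoadWidth(SDNode *Load, ISD::LoadExtType ExtTy,
                             EVT NewVT) const override {
    // Decline when splitting a wide vector load would just add more loads.
    if (NewVT.isVector() && NewVT.getSizeInBits() < 128)
      return false;
    return TargetLowering::shouldReduceLoadWidth(Load, ExtTy, NewVT);
  }
};
```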
Differential Revision: https://reviews.llvm.org/D54073
llvm-svn: 346595
There are two AGU units, and per 1cy, there can be either two loads,
or a load and a store; but not two stores, or two loads and a store.
Additionally, loads shouldn't affect the store scheduler and vice versa.
(but *should* affect the PdEX scheduler.)
Required rL346545.
Fixes https://bugs.llvm.org/show_bug.cgi?id=39465
llvm-svn: 346587
The sdivrem will emit its own MOVSX to move %ah to the low byte of a register. By using a MOVSX for an any_extend this allows a post-isel peephole to merge them.
llvm-svn: 346581
This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled it seems.
While there, also use DAG.getConstant to create a 0 vector instead of using the helper that forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own.
llvm-svn: 346574
This patch adds support for funclets in frame lowering and ISel
lowering. Together with D50288 and D50166, it enables C++ exception
handling.
Patch by Sanjin Sijaric, with some fixes by me.
Differential Revision: https://reviews.llvm.org/D51524
llvm-svn: 346568
LDRcp should be deleted when the dest register is dead in register
coalescing. Without the MemOp, a dead LDRcp will leave behind a dead constant pool
value which references a non-existent label.
Patch by Yin Ma.
Differential Revision: https://reviews.llvm.org/D54173
llvm-svn: 346563
With avx512f but not avx512bw we need to extend to v16i32 then truncate that to v16i8. Previously we emitted both nodes during lowering, but I'm trying to switch to using target independent nodes and with that switched the extend+truncate wou
This patch changes the implementation to what will be necessary with that patch which helps minimize test diffs.
llvm-svn: 346552
This makes X86ISD::VSEXT more similar to ISD::SIGN_EXTEND and ISD::ZERO_EXTEND.
I'm hoping to replace X86ISD::VSEXT/VZEXT with target independent nodes. Making the target specific nodes similar to the target independent nodes helps minimize test diffs in that patch.
llvm-svn: 346539
Eliminate the stack frame in functions with the noreturn nounwind
attributes, and when the noreturn-stack-elim target feature is
enabled. This reduces the code and stack space needed for noreturn
functions.
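A hypothetical example of the kind of function this applies to, carrying both attributes the description requires:
```
#include <cstdlib>

// noreturn + nounwind: with the noreturn-stack-elim feature enabled, the
// frame setup for a function like this can be dropped.
[[noreturn]] void fatalError() noexcept {
  std::abort();
}
```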
Differential Revision: https://reviews.llvm.org/D54210
llvm-svn: 346532
Both -fPIC and -G0 disable placement of globals in small data section,
but if a global has an explicit section assignment placing it in small
data, it should go there anyway.
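A hypothetical source-level example of the case being fixed; the section name is illustrative:
```
// Explicitly placed in small data, so it should stay there even under
// -fPIC or -G0, which would otherwise disable small-data placement.
int Counter __attribute__((section(".sdata"))) = 0;
```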
llvm-svn: 346523
Currently in llvm, CalleeSavedInfo can only assign a callee saved register to
a stack frame index to be spilled in the prologue. We would like to enable
spilling gprs to vector registers. This patch adds the capability to spill to
other registers aside from just the stack. It also adds the changes for power9
to spill gprs to volatile vector registers when they are available.
This happens only for leaf functions when using the option
-ppc-enable-pe-vector-spills.
Differential Revision: https://reviews.llvm.org/D39386
llvm-svn: 346512
This reverts commit r345972. Need to update the description + possibly
to update the patch itself after discussion with Eric Christopher.
llvm-svn: 346508
A minor improvement of buildVector() that skips creating an
INSERT_VECTOR_ELT for a Value which has already been used for the
REPLICATE.
Review: Ulrich Weigand
https://reviews.llvm.org/D54315
llvm-svn: 346504
Now that we have mixed type sizes, i1 values need to be explicitly
handled as we want to avoid promoting these values.
Differential Revision: https://reviews.llvm.org/D54308
llvm-svn: 346499
I noticed that we weren't generating broadcasts as much as I thought we would with
D54271, and this is part of the problem.
Widening the shuffle elements means adding bitcasts and hiding the relationship
between a splatted scalar and the vector. If we can form a broadcast, do that
before going through the rest of the shuffle lowering because broadcasts should
be cheap and can often be load-folded.
Differential Revision: https://reviews.llvm.org/D54280
llvm-svn: 346498
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.
Reviewers: gchatelet, john.brawn, jsji
Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D54297
llvm-svn: 346489
Previously, during the search, all values had to have the same
'TypeSize', which is equal to number of bits of the integer type of
the icmp operand. All values in the tree had to match this size;
meaning that, if we searched from i16, we wouldn't accept i8s. A
change in type size requires zext and truncs to perform the casts so,
to allow mixed narrow types, the handling of these instructions is
now slightly different:
- we allow casts if their result or operand is <= TypeSize.
- zexts are sinks if their result > TypeSize.
- truncs are still sinks if their operand == TypeSize.
- truncs are still sources if their result == TypeSize.
The transformation bails on finding an icmp that operates on data
smaller than the current TypeSize.
Differential Revision: https://reviews.llvm.org/D54108
llvm-svn: 346480
A few code movement things:
- AreSymmetrical is now a method of BinOpChain.
- Created a lambda in CreateParallelMACPairs to reduce loop nesting.
- A Reduction object now gets passed in a couple of places instead,
including CreateParallelMACPairs, so it doesn't need to return a
value.
I've also added RecordSequentialLoads, which is run before the
transformation begins, and caches the interesting loads. This can then
be queried later instead of cross checking many load values.
Differential Revision: https://reviews.llvm.org/D54254
llvm-svn: 346479
Summary:
Reorders the sections in the SIMD tablegen file to roughly match the
new opcode ordering. Depends on D54126.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54134
llvm-svn: 346464
Summary:
The pass incorrectly assumed if there's a longjmp declaration in the
module, there is also a setjmp function declaration. Fixed it, and now
the pass only converts longjmp and does not do any other transformation
when there's no setjmp declaration in the module.
Fixes PR39562.
Reviewers: jgravelle-google, sbc100
Subscribers: dschuff, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54273
llvm-svn: 346445
As discussed in D54073, we have a potential regression from more aggressive vector narrowing here, so let's try to avoid that by changing build-vector lowering slightly.
Insert-vector-element lowering always does this since there's no "pinsr" for ymm/zmm:
// If the vector is wider than 128 bits, extract the 128-bit subvector, insert
// into that, and then insert the subvector back into the result.
...but we can sometimes do better for insert-into-constant-vector by using shuffle lowering.
Differential Revision: https://reviews.llvm.org/D54271
llvm-svn: 346433
It was discovered in randomized testing that the SystemZ implementation of
shouldCoalesce() could be caused to crash when subreg liveness was
enabled. This was because an undef use of the virtual register was copied
outside the current MBB at the point of shouldCoalesce() being called. For more
details, see https://bugs.llvm.org/show_bug.cgi?id=39276.
This patch changes the check for MBB locality from livein/liveout checks to
do checks for all instructions of both intervals being inside MBB. This
avoids the cases with dead defs / undef uses outside MBB, which are not
affecting liveness in/out of MBB.
The original test case is included as a reduced .mir test case.
Review: Ulrich Weigand
https://reviews.llvm.org/D54197
llvm-svn: 346406
Generalize code in Thumb2InstrInfo::storeRegToStackSlot() and
loadRegFromStackSlot() to allow the GPR class or any of its sub-classes
(including hGPR) to be stored/loaded by ARM::t2STRi12/ARM::t2LDRi12.
Differential Revision: https://reviews.llvm.org/D51927
llvm-svn: 346401
Promote alloca can vectorize a small array by bitcasting it to a
vector type. Extend vectorization for the case when alloca is
already a vector type. We still want to replace GEPs with
insert/extract element instructions in this case.
Differential Revision: https://reviews.llvm.org/D54219
llvm-svn: 346376
Summary:
This change implements assembler parser, code emitter, ELF object writer
and disassembler for the MSP430 ISA. Also, more instruction forms are added
to the target description.
Reviewers: asl
Reviewed By: asl
Subscribers: pftbest, krisb, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D53661
llvm-svn: 346374
In this context, usesWindowsCFI() is basically the same thing as
isOSWindows(), but it makes the relevant property of the target
more explicit.
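An illustrative sketch of the distinction (assuming a MachineFunction is in scope; not the actual call site being changed):
```
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/Target/TargetMachine.h"

// Query the property that actually matters instead of the OS.
bool usesWindowsFrameLayout(const llvm::MachineFunction &MF) {
  return MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
}
```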
llvm-svn: 346366
Summary:
This is not needed, because we don't actually insert relevant branches
for KILLs that late in the compilation flow.
Besides, this was always checking for the wrong kill opcode anyway...
Reviewers: msearles, rampitec, scott.linder, kanarayan
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54085
llvm-svn: 346362
Like the comment says, this isn't the most efficient fix in terms of
codesize, but it works.
Differential Revision: https://reviews.llvm.org/D54129
llvm-svn: 346358
The lowering was missing live-ins in certain cases, like a sequence of
multiple tMOVCCr_pseudo instructions. This would lead to a verifier
failure, and on pre-v6 Thumb CPSR would be incorrectly clobbered.
For reasons I don't completely understand, it's hard to get a sequence
of multiple tMOVCCr_pseudo instructions; the issue only seems to show up
with 64-bit comparisons where the result is zero-extended. I added some
extra testcases in case that changes in the future. Probably some
optimization opportunities here if anyone is interested. (@test_slt_not
is the case that was getting miscompiled.)
The code to check the liveness of CPSR was stolen from
X86ISelLowering.cpp; maybe it could be refactored into a common helper,
but I have no idea where to put it.
Differential Revision: https://reviews.llvm.org/D54192
llvm-svn: 346355
This allows testing AMDGPU alias analysis like any
other alias analysis pass. This fixes the existing
test, which was pointlessly running opt -O3 when it really
just wants to run the one analysis.
Before there was no way to test this using -aa-eval
with opt, since the default constructed pass
is run. The wrapper subclass allows the
default constructor to pass the necessary callback.
llvm-svn: 346353
Summary:
The conditional branch created to support -fsplit-stack for X86 is
left unbiased/unhinted, resulting in less than ideal block placement:
the __morestack call block is kept on the main hot path. Bias the
branch to ensure that the stack allocation block is treated as a
"cold" block during machine basic block placement.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54123
llvm-svn: 346336
Set operands order for G_MERGE_VALUES and G_UNMERGE_VALUES so
that least significant bits always go first, regardless of endianness.
Differential Revision: https://reviews.llvm.org/D54098
llvm-svn: 346305
Add this option for debugging and to provide a workaround.
By default it is off, so there is no behavior change in the backend.
Differential Revision: https://reviews.llvm.org/D54158
llvm-svn: 346267
Change the type in a couple of lists and sets that only store physical
registers from unsigned to MCPhysReg. The latter is only 16 bits and
saves us a bit of memory.
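For reference, the type in question is just a 16-bit integer; the definition below is paraphrased from the MC headers of this era and should be treated as illustrative:
```
#include <cstdint>

typedef uint16_t MCPhysReg; // half the size of unsigned on common hosts
```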
llvm-svn: 346254
The `sigrie` instruction signals a Reserved Instruction Exception.
This patch adds support for assembling / disassembling the instruction.
Differential Revision: http://reviews.llvm.org/D53861
llvm-svn: 346230
Cleanup CCMP pattern matching code in preparation for review/bugfix:
- Rename `isConjunctionDisjunctionTree()` to `canEmitConjunction()`
(it won't accept arbitrary disjunctions and is really about whether we
can transform the subtree into a conjunction that we can emit).
- Rename `emitConjunctionDisjunctionTree()` to `emitConjunction()`
llvm-svn: 346203
This reverts rL345880. It caused some test failures on the
webassembly waterfall. e.g. binaryen2.test_mainenv fails due to
the fact that `envp` ends up being undef rather than 0.
Differential Revision: https://reviews.llvm.org/D54117
llvm-svn: 346187
MachineFunction can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
Do the same for references in ScheduleDAG and RegUsageInfoCollector.
llvm-svn: 346183
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.
llvm-svn: 346180
SimplifyDemandedBits can turn a sign_extend back into an any_extend and trigger an infinite loop. So instead legalize it the same way as a sign_extend, but preserve the opcode. Then just pattern match it the same as sign_extend during isel.
I don't have a reduced test case for such an infinite loop yet.
llvm-svn: 346170
On Power9, we don't have patterns to select the following intrinsics:
llvm.ppc.vsx.stxvw4x.be
llvm.ppc.vsx.stxvd2x.be
This patch adds support for these.
Differential Revision: https://reviews.llvm.org/D53581
llvm-svn: 346148
Expand on LONG_BRANCH_LUi and LONG_BRANCH_(D)ADDiu pseudo
instructions by creating variants which support
fewer operands / accept GPR64Opnds as their operand in order
to appease the machine verifier pass.
Differential Revision: https://reviews.llvm.org/D53977
llvm-svn: 346133
The new atomic optimizer I previously added in D51969 did not work
correctly when a pixel shader was using derivatives, and had helper
lanes active.
To fix this we add an llvm.amdgcn.ps.live call that guards a branch
around the entire atomic operation - ensuring that all helper lanes are
inactive within the wavefront when we compute our atomic results.
I've added a test case that can cause derivatives, and exposes the
problem.
Differential Revision: https://reviews.llvm.org/D53930
llvm-svn: 346128
Turn the assert in PrepareConstants into a condition so that we can
handle mul instructions with negative immediates.
Differential Revision: https://reviews.llvm.org/D54094
llvm-svn: 346126
r345840 slightly changed the way promotion happens which could
result in zext and truncs having the same source and destination
types. This fixes that issue.
We can now also remove the zext and trunc in the following case:
(zext (trunc (promoted op)), i32)
This means that we can no longer treat a value, that is only used by
a sink, to be safe to promote.
I've also added in some extra asserts and replaced a cast with a
dyn_cast.
Differential Revision: https://reviews.llvm.org/D54032
llvm-svn: 346125
This patch fixes a bug in the AVR FRMIDX expansion logic.
The expansion would leave a leftover operand from the original FRMIDX,
but now attached to a MOVWRdRr instruction. The MOVWRdRr instruction
did not expect this operand and so LLVM rejected the machine
instruction.
This would trigger an assertion:
Assertion failed: ((isImpReg || Op.isRegMask() || MCID->isVariadic() ||
OpNo < MCID->getNumOperands() || isMetaDataOp) &&
"Trying to add an operand to a machine instr that is already done!"),
function addOperand, file llvm/lib/CodeGen/MachineInstr.cpp
Tim fixed this so that now the FRMIDX is expanded correctly into
a well-formed MOVWRdRr.
Patch by Tim Neumann
llvm-svn: 346117
v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't care about the upper bits of the type legalized multiply we can use the pmuludq to produce the multiply result for the bits we do care about.
llvm-svn: 346115
This is an AVR-specific workaround for a limitation of the register
allocator that only exposes itself on targets with high register
contention like AVR, which only has three pointer registers.
The three pointer registers are X, Y, and Z.
In most nontrivial functions, Y is reserved for the frame pointer,
as per the calling convention. This leaves X and Z. Some instructions,
such as LPM ("load program memory"), are only defined for the Z
register. Sometimes this just leaves X.
When the backend generates a LDDWRdPtrQ instruction with Z as the
destination pointer, it usually trips up the register allocator
with this error message:
LLVM ERROR: ran out of registers during register allocation
This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction
from ever using the Z register as an operand. This gives the
register allocator a bit more space to allocate, fixing the
regalloc exhaustion error.
Here is a description from the patch author Peter Nimmervoll
As far as I understand, the problem occurs when LDDWRdPtrQ uses
the ptrdispregs register class as the target register. This should work, but
the allocator can't deal with this for some reason. So from my testing,
it seems like (and I might be totally wrong on this) the allocator reserves
the Z register for the ICALL instruction, and then the register class
ptrdispregs only has 1 register left and we can't use Y for both source and
destination. Removing the Z register from DREGS fixes the problem but
removing the Y register does not.
More information about the bug can be found on the avr-rust issue
tracker at https://github.com/avr-rust/rust/issues/37.
A bug has been raised to track the removal of this workaround and a proper
fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553.
Patch by Peter Nimmervoll
llvm-svn: 346114
Summary: This also enables some constant folding from KnownBits propagation. This helps some vXi64 cases in 32-bit mode where constant vectors appear as vXi32 plus a bitcast. This can prevent getNode from constant folding sra/shl/srl.
Reviewers: RKSimon, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54069
llvm-svn: 346102
These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers.
The rest of the patch is just changing all callers to use getNode directly.
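A sketch of the mechanical change described, assuming the usual DAG/DL/VT/Src variables are in scope (the wrapper name is recalled from memory and may not match exactly):
```
// before: SDValue V = DAG.getZeroExtendVectorInReg(Src, DL, VT);
// after: build the node directly; getNode now carries the shared asserts.
SDValue V = DAG.getNode(ISD::ZERO_EXTEND_VECTOR_INREG, DL, VT, Src);
```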
llvm-svn: 346087
Use MachineFrameInfo's OffsetAdjustment field to pass this information
from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for
any other purpose.
This fixes PR38857 in the case where there is a non-aligned quantity of
CSRs and a non-aligned quantity of locals.
llvm-svn: 346062
The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies.
llvm-svn: 346050
Summary:
The assembler was able to assemble and then dump back to .s, but
was failing to parse certain directives necessary for valid .o
output:
- .type directives are now recognized to distinguish function symbols
and others.
- .size is now parsed to provide function size.
- .globaltype (introduced in https://reviews.llvm.org/D54012) is now
recognized to ensure symbols like __stack_pointer have a proper type
set for both .s and .o output.
Also added tests for the above.
Reviewers: sbc100, dschuff
Subscribers: jgravelle-google, aheejin, dexonsmith, kristina, llvm-commits, sunfish
Differential Revision: https://reviews.llvm.org/D53842
llvm-svn: 346047
We already have custom lowering for the AVX case in LegalizeVectorOps. So it's better to keep the regular extend op around as long as possible.
I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests, which is why it's included here.
I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector.
For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size.
Differential Revision: https://reviews.llvm.org/D54024
llvm-svn: 346043
A number of intrinsics, such as llvm.sin.f32, would result in a failure to
select. This patch adds expansions for the relevant selection DAG nodes, as
well as exhaustive testing for all f32 and f64 intrinsics.
The codegen for FMA remains a TODO item, pending support for the various
RISC-V FMA instruction variants.
The llvm.minimum.f32.* and llvm.maximum.* tests are commented-out, pending
upstream support for target-independent expansion, as discussed in
http://lists.llvm.org/pipermail/llvm-dev/2018-November/127408.html.
Differential Revision: https://reviews.llvm.org/D54034
Patch by Luís Marques.
llvm-svn: 346034
Summary:
EH stack depth is incremented at `try` and decremented at `catch`. When
there is more than one catch instruction for a try instruction, we
shouldn't count non-first catches when calculating EH stack depths.
This patch fixes two bugs:
- CFGStackify: Exclude `catch_all` in the terminate catch pad when
calculating EH pad stack, because when we have multiple catches for a
try we should count only the first catch instruction when calculating
EH pad stack.
- InstPrinter: The initial intention was also to exclude non-first
catches, but it didn't account for nested try-catches, so it failed on
this case:
```
try
try
catch
end
catch <-- (1)
end
```
In the example, when we are at the catch (1), the last seen EH
instruction is not `try` but `end_try`, which violates that assumption.
We won't need these fixes after we switch to the second proposal, because
there will be only one `catch` instruction per `try`. But until then, these
bugfixes are necessary to keep trunk in a working state.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53819
llvm-svn: 346029
Let i8/i16 uint/sint to fp conversions cost 1 if the operand is a load.
Since the load already does the extension, there is no extra cost (previously
returned 2).
Review: Ulrich Weigand
https://reviews.llvm.org/D54028
llvm-svn: 346009
Model this function more closely after the BasicTTIImpl version, with
separate handling of loads and stores. For loads, the set of actually loaded
vectors is checked.
This makes it more readable and just slightly more accurate generally.
Review: Ulrich Weigand
https://reviews.llvm.org/D53071
llvm-svn: 345998
Small-data (i.e. GP-relative) loads and stores allow 16-bit scaled
offset. For a load of a value of type T, the small-data area is
equivalent to an array "T sdata[65536]". This implies that objects
of smaller sizes need to be closer to the beginning of sdata,
while larger objects may be farther away, or otherwise the offset
may be insufficient to reach it. Similarly, an object of a larger
size should not be accessed via a load of a smaller size.
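For example, under that model a 1-byte object must lie within the first 65536 bytes of sdata, a 4-byte object can be placed up to 65536 * 4 = 262144 bytes in, and an 8-byte object up to 524288 bytes in, which is why smaller objects need to sit closer to the start.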
llvm-svn: 345975
Summary:
If only the output of debug directives is requested, we should drop
emission of the ',debug' option from the target directive. This is required
to support the nvprof profiler.
Reviewers: probinson, echristo, dblaikie
Subscribers: Hahnfeld, jholewinski, llvm-commits, JDevlieghere, aprantl
Differential Revision: https://reviews.llvm.org/D46061
llvm-svn: 345972
UBSan detected an error in our ISelLowering that is exposed only when
you have a dmask == 0x1. Fix this by adding an explicit check to
ensure we don't perform the shift-by-32 that UBSan detected.
llvm-svn: 345962
Summary:
Assembly output can use globals like __stack_pointer implicitly,
but has no way of indicating the type of such a global, which makes
it hard for tools processing it (such as the MC Assembler) to
reconstruct this information.
The improved assembler directives parsing (in progress in
https://reviews.llvm.org/D53842) will make use of this information.
Also deleted code for the .import_global directive which was unused.
New test case in userstack.ll
Reviewers: dschuff, sbc100
Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54012
llvm-svn: 345917
Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge
increase in compile time. Support the pattern that the clang FE generates after reordering the
additions in the integer-dot8 source language pattern.
Author: FarhanaAleen
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D53937
llvm-svn: 345902
Summary:
Like `block` or `loop`, `try` can take an optional signature which can
be omitted. This patch allows `try`'s signature to be omitted. Also
added some tests for EH instructions.
Reviewers: aardappel
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53873
llvm-svn: 345888
I added these annotations in r345878 because I wasn't sure if the
fallthrough was intended. Krzysztof Parzyszek confirmed that they should
be breaks, so that's what this patch does.
Reviewers: kparzysz
Differential Revision: https://reviews.llvm.org/D53991
llvm-svn: 345883
This patch should not introduce any behavior changes. It consists
mostly of one of two changes:
1. Replacing fall through comments with the LLVM_FALLTHROUGH macro
2. Inserting 'break' before falling through into a case block consisting
of only 'break'.
We were already using this warning with GCC, but its warning behaves
slightly differently. In this patch, the following differences are
relevant:
1. GCC recognizes comments that say "fall through" as annotations, clang
doesn't
2. GCC doesn't warn on "case N: foo(); default: break;", clang does
3. GCC doesn't warn when the case contains a switch, but falls through
the outer case.
I will enable the warning separately in a follow-up patch so that it can
be cleanly reverted if necessary.
Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu
Differential Revision: https://reviews.llvm.org/D53950
llvm-svn: 345882
Clang's -Wimplicit-fallthrough check fires on these switch cases. GCC
does not warn when a case body that ends in a switch falls through to a
case label of an outer switch.
It's not clear if these fall throughs are truly intended. The Hexagon
tests pass regardless of whether these case blocks fall through or
break.
For now, I have applied the intended fallthrough annotation macro with a
FIXME comment to unblock enabling the warning. I will send a follow-up
patch that converts them to breaks to the Hexagon maintainers.
llvm-svn: 345878
Summary:
This function was causing a crash when `MaxElements == 1` because
it was trying to create a single element vector type.
Reviewers: dsanders, aemerson, aditya_nandakumar
Reviewed By: dsanders
Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53734
llvm-svn: 345875
This patch adds support for expanding vector CTPOP instructions and removes the x86 'bitmath' lowering which replicates the same expansion.
Differential Revision: https://reviews.llvm.org/D53258
llvm-svn: 345869
Previously this case fell through to unreachable, so it is clearly not
covered by any test case in LLVM. It may be dynamically unreachable, in
fact. However, if it were to run, this is what it would logically do.
The assert suggests that the intended behavior was not to allow folding
offsets from jump table indices, which makes sense.
llvm-svn: 345868
This was added in r330630. GCC's -Wimplicit-fallthrough seems to not
fire when the previous case contains a switch itself.
This fallthrough was benign because the helper function implementing the
case used dyn_cast to re-check the type of the node in question. After
fixing the fallthrough, we can strengthen the cast.
llvm-svn: 345864
While mutating instructions, we sign extended negative constant
operands for binary operators that can safely overflow. This was to
allow instructions, such as add nuw i8 %a, -2, to still be able to
perform a subtraction. However, the code to handle constants doesn't
take into consideration that instructions, such as sub nuw i8 -2, %a,
require the i8 -2 to be converted into i32 254.
This is a relatively simple fix, but I've taken the time to
reorganise the code a bit - mainly caching the instructions that can be
promoted and splitting up the Mutate function.
Differential Revision: https://reviews.llvm.org/D53972
llvm-svn: 345840
When matching MipsISD::JmpLink t9, TargetExternalSymbol:i32'...',
the wrong JALR16_MM is selected. This patch adds a missing pattern for
JmpLink, so that the JAL instruction is selected.
Differential Revision: https://reviews.llvm.org/D53366
llvm-svn: 345830
Reapplying an updated version of rL345395 (reverted in rL345451), now that the issues noticed in PR39483 have been fixed.
This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs.
llvm-svn: 345824