On Power9, we don't have patterns to select the following intrinsics:
llvm.ppc.vsx.stxvw4x.be
llvm.ppc.vsx.stxvd2x.be
This patch adds support for these.
Differential Revision: https://reviews.llvm.org/D53581
llvm-svn: 346148
The constrained intrinsic tests have grown in number. Split off
the FMA tests into their own file to reduce double coverage.
Differential Revision: https://reviews.llvm.org/D53932
llvm-svn: 346137
Expand on LONG_BRANCH_LUi and LONG_BRANCH_(D)ADDiu pseudo
instructions by creating variants which support
less operands/accept GPR64Opnds as their operand in order
to appease the machine verifier pass.
Differential Revision: https://reviews.llvm.org/D53977
llvm-svn: 346133
The new atomic optimizer I previously added in D51969 did not work
correctly when a pixel shader was using derivatives, and had helper
lanes active.
To fix this we add an llvm.amdgcn.ps.live call that guards a branch
around the entire atomic operation - ensuring that all helper lanes are
inactive within the wavefront when we compute our atomic results.
I've added a test case that can cause derivatives, and exposes the
problem.
Differential Revision: https://reviews.llvm.org/D53930
llvm-svn: 346128
Turn the assert in PrepareConstants into a conditon so that we can
handle mul instructions with negative immediates.
Differential Revision: https://reviews.llvm.org/D54094
llvm-svn: 346126
r345840 slightly changed the way promotion happens which could
result in zext and truncs having the same source and destination
types. This fixes that issue.
We can now also remove the zext and trunc in the following case:
(zext (trunc (promoted op)), i32)
This means that we can no longer treat a value, that is only used by
a sink, to be safe to promote.
I've also added in some extra asserts and replaced a cast for a
dyn_cast.
Differential Revision: https://reviews.llvm.org/D54032
llvm-svn: 346125
This patch fixes a bug in the AVR FRMIDX expansion logic.
The expansion would leave a leftover operand from the original FRMIDX,
but now attached to a MOVWRdRr instruction. The MOVWRdRr instruction
did not expect this operand and so LLVM rejected the machine
instruction.
This would trigger an assertion:
Assertion failed: ((isImpReg || Op.isRegMask() || MCID->isVariadic() ||
OpNo < MCID->getNumOperands() || isMetaDataOp) &&
"Trying to add an operand to a machine instr that is already done!"),
function addOperand, file llvm/lib/CodeGen/MachineInstr.cpp
Tim fixed this so that now the FRMIDX is expanded correctly into
a well-formed MOVWRdRr.
Patch by Tim Neumann
llvm-svn: 346117
v2i8/v2i16/v2i32 are promoted to v2i64. pmuludq takes a v2i64 input and produces a v2i64 output. Since we don't about the upper bits of the type legalized multiply we can use the pmuludq to produce the multiply result for the bits we do care about.
llvm-svn: 346115
This is an AVR-specific workaround for a limitation of the register
allocator that only exposes itself on targets with high register
contention like AVR, which only has three pointer registers.
The three pointer registers are X, Y, and Z.
In most nontrivial functions, Y is reserved for the frame pointer,
as per the calling convention. This leaves X and Z. Some instructions,
such as LPM ("load program memory"), are only defined for the Z
register. Sometimes this just leaves X.
When the backend generates a LDDWRdPtrQ instruction with Z as the
destination pointer, it usually trips up the register allocator
with this error message:
LLVM ERROR: ran out of registers during register allocation
This patch is a hacky workaround. We ban the LDDWRdPtrQ instruction
from ever using the Z register as an operand. This gives the
register allocator a bit more space to allocate, fixing the
regalloc exhaustion error.
Here is a description from the patch author Peter Nimmervoll
As far as I understand the problem occurs when LDDWRdPtrQ uses
the ptrdispregs register class as target register. This should work, but
the allocator can't deal with this for some reason. So from my testing,
it seams like (and I might be totally wrong on this) the allocator reserves
the Z register for the ICALL instruction and then the register class
ptrdispregs only has 1 register left and we can't use Y for source and
destination. Removing the Z register from DREGS fixes the problem but
removing Y register does not.
More information about the bug can be found on the avr-rust issue
tracker at https://github.com/avr-rust/rust/issues/37.
A bug has raised to track the removal of this workaround and a proper
fix; PR39553 at https://bugs.llvm.org/show_bug.cgi?id=39553.
Patch by Peter Nimmervoll
llvm-svn: 346114
Summary: This also enables some constant folding from KnownBits propagation. This helps on some cases vXi64 case in 32-bit mode where constant vectors appear as vXi32 and a bitcast. This can prevent getNode from constant folding sra/shl/srl.
Reviewers: RKSimon, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54069
llvm-svn: 346102
Use MachineFrameInfo's OffsetAdjustment field to pass this information
from the target to CodeViewDebug.cpp. The X86 backend doesn't use it for
any other purpose.
This fixes PR38857 in the case where there is a non-aligned quantity of
CSRs and a non-aligned quantity of locals.
llvm-svn: 346062
The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies.
llvm-svn: 346050
We already have custom lowering for the AVX case in LegalizeVectorOps. So its better to keep the regular extend op around as long as possible.
I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why its included here.
I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector.
For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size.
Differential Revision: https://reviews.llvm.org/D54024
llvm-svn: 346043
A number of intrinsics, such as llvm.sin.f32, would result in a failure to
select. This patch adds expansions for the relevant selection DAG nodes, as
well as exhaustive testing for all f32 and f64 intrinsics.
The codegen for FMA remains a TODO item, pending support for the various
RISC-V FMA instruction variants.
The llvm.minimum.f32.* and llvm.maximum.* tests are commented-out, pending
upstream support for target-independent expansion, as discussed in
http://lists.llvm.org/pipermail/llvm-dev/2018-November/127408.html.
Differential Revision: https://reviews.llvm.org/D54034
Patch by Luís Marques.
llvm-svn: 346034
This is necessary as I'm wanting to remove the 'Constant Pool' shuffle decoding from getTargetShuffleMask - but using getTargetShuffleMaskIndices allows the shuffle combiner to realize that these calls are really broadcasts.....
As with a lot of the X86ISD::VPERMV3 code this causes some vperm2i/vperm2t shuffles to flip depending on optimal commutation.
llvm-svn: 346032
Summary:
EH stack depth is incremented at `try` and decremented at `catch`. When
there are more than two catch instructions for a try instruction, we
shouldn't count non-first catches when calculating EH stack depths.
This patch fixes two bugs:
- CFGStackify: Exclude `catch_all` in the terminate catch pad when
calculating EH pad stack, because when we have multiple catches for a
try we should count only the first catch instruction when calculating
EH pad stack.
- InstPrinter: The initial intention was also to exclude non-first
catches, but it didn't account nested try-catches, so it failed on
this case:
```
try
try
catch
end
catch <-- (1)
end
```
In the example, when we are at the catch (1), the last seen EH
instruction is not `try` but `end_try`, violating the wrong assumption.
We don't need these after we switch to the second proposal because there
is gonna be only one `catch` instruction. But anyway before then these
bugfixes are necessary for keep trunk in working state.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53819
llvm-svn: 346029
As reported in PR38952, postra-machine-sink relies on DBG_VALUE insns being
adjacent to the def of the register that they reference. This is not always
true, leading to register copies being sunk but not the associated DBG_VALUEs,
which gives the debugger a bad variable location.
This patch collects DBG_VALUEs as we walk through a BB looking for copies to
sink, then passes them down to performSink. Compile-time impact should be
negligable.
Differential Revision: https://reviews.llvm.org/D53992
llvm-svn: 345996
Small-data (i.e. GP-relative) loads and stores allow 16-bit scaled
offset. For a load of a value of type T, the small-data area is
equivalent to an array "T sdata[65536]". This implies that objects
of smaller sizes need to be closer to the beginning of sdata,
while larger objects may be farther away, or otherwise the offset
may be insufficient to reach it. Similarly, an object of a larger
size should not be accessed via a load of a smaller size.
llvm-svn: 345975
reduceBuildVecConvertToConvertBuildVec vectorizes int2float in the DAGCombiner, which means that even if the LV/SLP has decided to keep scalar code using the cost models, this will override this.
While there are cases where vectorization is necessary in the DAG (mainly due to legalization artefacts), I don't think this is the case here, we should assume that the vectorizers know what they are doing.
Differential Revision: https://reviews.llvm.org/D53712
llvm-svn: 345964
Summary:
Assembly output can use globals like __stack_pointer implicitly,
but has no way of indicating the type of such a global, which makes
it hard for tools processing it (such as the MC Assembler) to
reconstruct this information.
The improved assembler directives parsing (in progress in
https://reviews.llvm.org/D53842) will make use of this information.
Also deleted code for the .import_global directive which was unused.
New test case in userstack.ll
Reviewers: dschuff, sbc100
Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54012
llvm-svn: 345917
I'm having trouble creating a test case for the ISD::TRUNCATE part of this that shows any codegen differences. But I was able to test the setcc path which is what the test changes here cover.
llvm-svn: 345908
Instruction mapping in the outliner uses "illegal numbers" to signify that
something can't ever be part of an outlining candidate. This means that the
number is unique and can't be part of any repeated substring.
Because each of these is unique, we can use a single unique number to represent
a range of things we can't outline.
The outliner tries to leverage this using a flag which is set in an MBB when
the previous instruction we tried to map was "illegal". This patch improves
that logic to work across MBBs. As a bonus, this also simplifies the mapping
logic somewhat.
This also updates the machine-outliner-remarks test, which was impacted by the
order of Candidates on an OutlinedFunction changing. This order isn't
guaranteed, so I added a FIXME to fix that in a follow-up. The order of
Candidates on an OutlinedFunction isn't important, so this still is NFC.
llvm-svn: 345906
Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge
increase in the compile time. Support the pattern that clang FE generates after reordering the
additions in integer-dot8 source language pattern.
Author: FarhanaAleen
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D53937
llvm-svn: 345902
Summary:
This function was causing a crash when `MaxElements == 1` because
it was trying to create a single element vector type.
Reviewers: dsanders, aemerson, aditya_nandakumar
Reviewed By: dsanders
Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53734
llvm-svn: 345875
The test causes a crash because we were trying to extract v4f32 to v3f32, and the
narrowing factor was then 4/3 = 1 producing a bogus narrow type.
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=39511
llvm-svn: 345842
Summary:
This adds dummy implementation of `EmitRawText` in `MCNullStreamer`.
This fixes the behavior of `AsmPrinter` with `MCNullStreamer` on targets
on which no integrated assembler is used. An attempt to emit inline asm
on such a target would previously lead to a crash, since `AsmPrinter` does not
check for `hasRawTextSupport` in `EmitInlineAsm` and calls `EmitRawText`
anyway if integrated assembler is disabled (the behavior has changed
in D2686).
Error message printed by MCStreamer:
> EmitRawText called on an MCStreamer that doesn't support it, something
> must not be fully mc'ized
Patch by Eugene Sharygin
Reviewers: dsanders, echristo
Reviewed By: dsanders
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53938
llvm-svn: 345841
While mutating instructions, we sign extended negative constant
operands for binary operators that can safely overflow. This was to
allow instructions, such as add nuw i8 %a, -2, to still be able to
perform a subtraction. However, the code to handle constants doesn't
take into consideration that instructions, such as sub nuw i8 -2, %a,
require the i8 -2 to be converted into i32 254.
This is a relatively simple fix, but I've taken the time to
reorganise the code a bit - mainly that instructions that can be
promoted are cached and splitting up the Mutate function.
Differential Revision: https://reviews.llvm.org/D53972
llvm-svn: 345840
When matching MipsISD::JmpLink t9, TargetExternalSymbol:i32'...',
wrong JALR16_MM is selected. This patch adds missing pattern for
JmpLink, so that JAL instruction is selected.
Differential Revision: https://reviews.llvm.org/D53366
llvm-svn: 345830
Reapplying an updated version of rL345395 (reverted in rL345451), now the issues noticed in PR39483 have been fixed.
This patch allows resolveTargetShuffleInputs to remove UNDEF inputs from cases where we have more than 2 inputs.
llvm-svn: 345824
In MipsBranchExpansion::splitMBB, upon splitting
a block with two direct branches, remove the successor
of the newly created block (which inherits successors from
the original block) which is pointed to by the last
branch in the original block only if the targets of two
branches differ.
This is to fix the failing test when ran with
-verify-machineinstrs enabled.
Differential Revision: https://reviews.llvm.org/D53756
llvm-svn: 345821
Summary:
Also reduce the test case for implicit defs and test it with all
register classes.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53855
llvm-svn: 345794