These were selecting directly to the VOP2 form instead
of VOP3 like the i32 instructions. Fixes regressions in
future commits where an immediate isn't folded because it was
initially used for the second operand.
Because uniform 16-bit operations are promoted to i32, it's
difficult to get a simple testcase where this matters. Fold
failures in SIFoldOperands here tend to be hidden by commute
and fold in SIShrinkInstructions.
llvm-svn: 289189
Supporting them properly is a reasonably complex chunk of work, so to allow bot
testing before then we should at least be able to fall back to DAG ISel.
llvm-svn: 289150
We were falsely claiming that we had an LSDA for the relevant EH
personality before this change, which could lead to the EH machinery
interpreting random adjacent data as an LSDA.
Fixes PR31317
This change is safe because cleanups can't contain exception handlers
today. We do these things to maintain that invariant:
- C++ destructors are naturally out-of-line
- __finally blocks are outlined in clang
- LLVM's inliner will not inline EH constructs into cleanups
llvm-svn: 289101
Not having this legal led to combine failures, resulting
in dumb things like bitcasts of constants not being folded
away.
The only reason I'm leaving the v_mov_b32 hack that f32
already uses is to avoid madak formation test regressions.
PeepholeOptimizer has an ordering issue where the immediate
fold attempt is into the sgpr->vgpr copy instead of the actual
use. Running it twice avoids that problem.
llvm-svn: 289096
The correct commutable opcode was set to itself, so this
was simply swapping the operands to commute instead of also
changing the opcode to v_subrev_u16.
llvm-svn: 289093
Multiple metadata values for records such as opencl.ocl.version, llvm.ident
and similar are created after linking several modules. For some of them, notably
opencl.ocl.version, this creates semantic problem because we cannot tell which
version of OpenCL the composite module conforms.
Moreover, such repetitions of identical values often create a huge list of
unneeded metadata, which grows bitcode size both in memory and stored on disk.
It can go up to several Mb when linked against our OpenCL library. Lastly, such
long lists obscure reading of dumped IR.
The pass unifies metadata after linking.
Differential Revision: https://reviews.llvm.org/D25381
llvm-svn: 289092
Summary:
Attaching !absolute_symbol to a global variable does two things:
1) Marks it as an absolute symbol reference.
2) Specifies the value range of that symbol's address.
Teach the X86 backend to allow absolute symbols to appear in place of
immediates by extending the relocImm and mov64imm32 matchers. Start using
relocImm in more places where it is legal.
As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html
Differential Revision: https://reviews.llvm.org/D25878
llvm-svn: 289087
Summary:
LC can currently select scalar load for uniform memory access
basing on readonly memory address space only. This restriction
originated from the fact that in HW prior to VI vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along the all paths from the function entry.
Reviewers: rampitec, tstellarAMD, arsenm
Subscribers: wdng, arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D26917
llvm-svn: 289076
This re-adds checks for the patterns that were disabled with r288506.
Reviewers: spatel, delena, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27346
llvm-svn: 289049
Summary:
Without the fix to isFrameOffsetLegal to consider the instruction's
immediate offset, the new test case hits the corresponding assertion in
resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a
different base register.
With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of
places because frame base registers are added where they're not needed.
This is addressed by properly implementing needsFrameBaseReg, which also
helps to avoid unnecessary zero frame indices in a bunch of other places.
Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test
Reviewers: arsenm, tstellarAMD
Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D27344
llvm-svn: 289048
ConstantExpr instances were emitting code into the current block rather than
the entry block. This meant they didn't necessarily dominate all uses, which is
clearly wrong.
llvm-svn: 288985
MachineIRBuilder had weird before/after and beginning/end flags for the insert
point. Unfortunately the non-default means that instructions will be inserted
in reverse order which is almost never what anyone wants.
Really, I think we just want (like IRBuilder has) the ability to insert at any
C++ iterator-style point (i.e. before any instruction or before MBB.end()). So
this fixes MIRBuilders to behave like IRBuilders in this respect.
llvm-svn: 288980
If we don't skip over DEBUG_VALUEs, we get differences between -g and non-g
code.
This fixes PR31242.
Differential Revision: https://reviews.llvm.org/D27485
llvm-svn: 288965
The second operand of an "ri" instruction may be an immediate, but it may
also be a globalvariable, so we should make any assumptions.
This fixes PR31271.
Differential Revision: https://reviews.llvm.org/D27481
llvm-svn: 288964
We are being inconsistent with these instructions (and all their variants.....) with a random mix of them using the default float domain.
Differential Revision: https://reviews.llvm.org/D27419
llvm-svn: 288902
The non-constant pool version of DecodeVPERMIL2PMask was not offsetting correctly for the second input. I've updated the code to match the implementation in the constant-pool version.
Annoyingly this bug was hidden for so long as it's tricky to combine to useful variable shuffle masks that don't become constant-pool entries.
llvm-svn: 288898
On some platforms (like MSP430) the second element of the result
structure for SMULO/UMULO may have a shorter type than the one
returned by SetCC. We need to truncate it to the right type, or
else some incorrect code may be generated later on.
This fixes issue https://github.com/rust-lang/rust/issues/37829
Patch by Vadzim Dambrouski!
Differential Revision: https://reviews.llvm.org/D27154
llvm-svn: 288857
Summary:
If we write an immediate to a VGPR and then copy the VGPR to an
SGPR, we can replace the copy with a S_MOV_B32 sgpr, imm, rather than
moving the copy to the SALU.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D27272
llvm-svn: 288849
We were rounding size in bits down rather than up, leading to 0-sized slots for
i1 (assert!) and bugs for other types not byte-aligned.
llvm-svn: 288848
Summary:
Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld.
On Silvermont [source: Optimization Reference Manual]:
PMULLD has a throughput of 1/11 [instruction/cycles].
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles].
Fixes pr31202.
Analysis of this issue was done by Fahana Aleen.
Reviewers: wmi, delena, mkuper
Subscribers: RKSimon, llvm-commits
Differential Revision: https://reviews.llvm.org/D27203
llvm-svn: 288844
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.
Differential Revision: https://reviews.llvm.org/D27461
llvm-svn: 288842
There were two problems:
+ AArch64 was reusing random data from its binary op tables, which is
complete nonsense for G_SEQUENCE.
+ Even when AArch64 gave up and said it couldn't handle G_SEQUENCE,
the generic code asserted.
llvm-svn: 288836