This patch refactors the code used in llc such that all the users of the
addPassesToEmitFile API have access to a homogeneous way of handling
start/stop-after/before options right out of the box.
Previously each user would have needed to duplicate this logic and set
up its own options.
NFC
llvm-svn: 299282
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.
Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.
llvm-svn: 299246
Summary:
This feature enables folding of logical shift operations of up to 3 places into addressing mode on Kryo and Falkor that have a fastpath LSL.
Reviewers: mcrosier, rengolin, t.p.northover
Subscribers: junbuml, gberry, llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D31113
llvm-svn: 299240
Our _MM_HINT_T0/T1 constant values are 3/2 which matches gcc, but not icc or Intel documentation. Interestingly gcc had this same bug on their implementation of the gather/scatter builtins at one point too.
Fixes PR32411.
llvm-svn: 299234
Implementation of TargetInstrInfo::findCommutedOpIndices for MIPS target,
restricting commutativity to second and third operand only for
dpaadd_[su].df instructions therein.
Prior to this change, there were cases where the vector that is to be added
to the dot product of the other two could take a position other than the
first one in the instruction, generating false output in the destination
vector.
Such behavior has been noticed in the two functions generating v2i64 output
values so far. Other ones may exhibit such behavior as well, just not for
the vector operands which are present in the test at the moment.
Tests altered so that the function's first operand is a constant splat so
that it can be loaded with a ldi instruction, since that is the case in
which the erroneous instruction operand placement has occurred. We check
that the register which is present in the ldi instruction is placed as the
first operand in the corresponding dpadd instruction.
Patch by Stefan Maksimovic.
Differential Revision: https://reviews.llvm.org/D30827
llvm-svn: 299223
Since LOCR only accepts GR32 virtual registers, its operands must be copied
into this regclass in insertSelect(), when an LOCR is built. Otherwise, the
case where the source operand was GRX32 will produce invalid IR.
Review: Ulrich Weigand
llvm-svn: 299220
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.
I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.
Followup to D25691.
Differential Revision: https://reviews.llvm.org/D31311
llvm-svn: 299219
Even on older subtargets that lack vector support, there may be vector values
with just one element in the input program. These are converted during DAG
legalization to scalar values.
The pre-legalize SystemZ DAGCombiner methods should in this circumstance not
touch these nodes. This patch adds a check for this in
SystemZTargetLowering::combineEXTRACT_VECTOR_ELT().
Review: Ulrich Weigand
llvm-svn: 299213
Previously compiler often extracted common immediates into specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change peephole check if operand is either immediate or register that is copy of immediate.
llvm-svn: 299202
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.
Differential Revision: https://reviews.llvm.org/D31249
llvm-svn: 299201
Add support for the new relocations and linking metadata section support in
https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md. In
particular, this allows LLVM to indicate which variable is the stack pointer,
so that it can be linked with other objects.
This also adds support for emitting type relocations for call_indirect
instructions.
Right now, this is mainly tested by using wabt and hexdump to examine the
output on selected testcases. We'll add more tests as the design stablizes
and more of the pieces are in place.
llvm-svn: 299141
What we really want to do is distinguish functions that may
be called by other functions, and graphics shaders are not
called kernels.
llvm-svn: 299140
If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.
Differential Revision: https://reviews.llvm.org/D31489
llvm-svn: 299108
This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together.
llvm-svn: 298984
We should be masking the value and emitting a register copy like we do in non-fast isel. Instead we were just updating the value map and emitting nothing.
After r298928 we started seeing cases where we would create a copy from GR8 to GR32 because the source register in a VK1 to GR32 copy was replaced by the GR8 going into a truncate.
This fixes PR32451.
llvm-svn: 298957
In PPCBoolRetToInt bool value is changed to i32 type. On ppc64 it may introduce an extra zero extension for the return value. This patch changes the integer type to i64 to avoid the zero extension on ppc64.
This patch fixed PR32442.
Differential Revision: https://reviews.llvm.org/D31407
llvm-svn: 298955
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.
Differential Revision: https://reviews.llvm.org/D31412
llvm-svn: 298948
This is incorrect to record region boundaries before scheduling,
it may change after scheduling. As a result second pass may see less
instructions to schedule than it should.
Differential Revision: https://reviews.llvm.org/D31434
llvm-svn: 298945
We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack).
This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc.
Differential Revision: https://reviews.llvm.org/D30868
llvm-svn: 298943
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.
Differential Revision: https://reviews.llvm.org/D31429
llvm-svn: 298935
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.
This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.
I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.
Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.
This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.
Differential Revision: https://reviews.llvm.org/D30968
llvm-svn: 298928
Summary:
Similar to the ARM target in https://reviews.llvm.org/rL298380, this
patch adds identical infrastructure for disabling negative immediate
conversions, and converts the existing aliases to the new infrastucture.
Reviewers: rengolin, javed.absar, olista01, SjoerdMeijer, samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D31243
llvm-svn: 298908
Summary:
G_LOAD/G_STORE, add alternative RegisterBank mapping.
For G_LOAD, Fast and Greedy mode choose the same RegisterBank mapping (GprRegBank ) for the G_GLOAD + G_FADD , can't get rid of cross register bank copy GprRegBank->VecRegBank.
Reviewers: zvi, rovka, qcolombet, ab
Reviewed By: zvi
Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank
Differential Revision: https://reviews.llvm.org/D30979
llvm-svn: 298907
This patch enables schedulers to specify instructions that
cannot be issued with any other instructions.
It also fixes BeginGroup/EndGroup.
Reviewed by: Andrew Trick
Differential Revision: https://reviews.llvm.org/D30744
llvm-svn: 298885
We're not to the point of supporting the load/store patterns yet
(because they extensively use PatFrags).
But in the meantime, we can implement some of the simplest addressing
modes.
llvm-svn: 298863
CBZ/CBNZ represent a substantial portion of all conditional branches.
Look through G_ICMP to select them.
We can't use tablegen yet because the existing patterns match an
AArch64ISD node.
llvm-svn: 298856
Enabled clamp and omod for v_cvt_* opcodes which have src0 of an integer type
Reviewers: vpykhtin, arsenm
Differential Revision: https://reviews.llvm.org/D31327
llvm-svn: 298852
Among other things, this allows Machine LICM to hoist a costly 'mrs'
instruction from within a loop.
Differential Revision: http://reviews.llvm.org/D31151
llvm-svn: 298851
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.
The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.
Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.
Differential Revision: https://reviews.llvm.org/D31284
llvm-svn: 298846
This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1.
The patch includes a fix for the AVX2 case where vector unpack instructions use less operations than the vector blend operations available in AVX2.
In this case using vector unpack instructions is more efficient.
Reviewers:
zvi
delena
igorb
craig.topper
guyblank
eladcohen
m_zuckerman
aymanmus
RKSimon
llvm-svn: 298840
Fixed -verify-machineinstrs errors in fast-isel-select-sse.ll (one of many in PR27481)
The VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk instructions were assuming both source registers were V128X when the second is actually supposed to be FR32X/FR64X
Differential Revision: https://reviews.llvm.org/D31200
llvm-svn: 298805
Patch to generalize combinePCMPAnd1 (for handling SETCC + ZEXT cases) to work for any input that has zero/all bits set masked with an 'all low bits' mask.
Replaced the implicit assumption of shift availability with a call to SupportedVectorShiftWithImm.
Part 1 of 3.
Differential Revision: https://reviews.llvm.org/D31347
llvm-svn: 298779
This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality,
we can replace memcmp calls with inline code that is both smaller and faster.
Differential Revision: https://reviews.llvm.org/D31290
llvm-svn: 298775
Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0.
amdgiz is for non-OpenCL environment where generic address space is 0.
amdgizcl is for OpenCL environment where generic address space is 0.
Differential Revision: https://reviews.llvm.org/D31211
llvm-svn: 298758
When I tested r298734, I thought that red zones were enabled by default like in
X86. Since red zones are behind a flag on AArch64 the testing wasn't true.
llvm-svn: 298747
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.
llvm-svn: 298737
AArch64 doesn't require -mno-red-zone; stack fixups are sufficient here. This was
unnecessarily copied over from the X86 target.
(You can now outline with red zones! Yay!)
Removing the requirement passes all Single/MultiSource tests.
llvm-svn: 298734
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.
llvm-svn: 298729
r298178 capitalized the fields in `ArgListEntry`. All the official
targets were updated accordingly, but as an experimental target AVR
was missed.
llvm-svn: 298677
- Avoid explosive growth of the simplification queue by not queuing
expressions that are alredy in it.
- Add an iteration counter and abort after a sufficiently large number
of iterations (assuming that it's a symptom of an infinite loop).
llvm-svn: 298655
Summary:
The true and false operands for the CMOV are operands 0 and 1.
ARMISelLowering.cpp::computeKnownBits was looking at operands 1 and 2
instead. This can cause CMOV instructions to be incorrectly folded into
BFI if value set by the CMOV is another CMOV, whose known bits are
computed incorrectly.
This patch fixes the issue and adds a test case.
Reviewers: kristof.beyls, jmolloy
Subscribers: llvm-commits, aemerson, srhines, rengolin
Differential Revision: https://reviews.llvm.org/D31265
llvm-svn: 298624
Summary:
1. Support pointer type as function argumnet and return value
2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128
3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank.
4. Support instruction selection for G_LOAD/G_STORE
Reviewers: zvi, rovka, ab, qcolombet
Reviewed By: rovka
Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank
Differential Revision: https://reviews.llvm.org/D30973
llvm-svn: 298609
Summary: Cleanup some remnants of code from when the X86FixupBWInsts pass did both forward liveness analysis and backward liveness analysis.
Reviewers: MatzeB, myatsina, DavidKreitzer
Reviewed By: MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31264
llvm-svn: 298599
This patch fixes decoding of size and position for DINSM
and DINSU instructions.
Differential Revision: https://reviews.llvm.org/D31072
llvm-svn: 298593
Up until now, vpmovm2 instruction described its destination operand size
by the source operand size. This patch adds new pattern for the vpmovm2
instruction. The node describes new expansion of the destination (from
{128|256} to 512).
Differential Revision: https://reviews.llvm.org/D30654
llvm-svn: 298586
- Rename runtime metadata -> code object metadata
- Make metadata not flow
- Switch enums to use ScalarEnumerationTraits
- Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc
- Introduce in-memory representation for attributes
- Code object metadata streamer
- Create metadata for isa and printf during EmitStartOfAsmFile
- Create metadata for kernel during EmitFunctionBodyStart
- Finalize and emit metadata to .note during EmitEndOfAsmFile
- Other minor improvements/bug fixes
Differential Revision: https://reviews.llvm.org/D29948
llvm-svn: 298552
This is used for a specific type of return to a shader part's
epilog code. Rename to try avoiding confusion from a true
call's return.
llvm-svn: 298452
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.
This fixes PR23277.
Differential Revision: https://reviews.llvm.org/D28494
llvm-svn: 298430
This patch introduces X86AsmParser with the ability to handle the aforementioned ops within compound "MS" arithmetical expressions.
Currently - only supported as a stand alone Operand, e.g.:
"TYPE X"
now allowed :
"4 + TYPE X * 128"
Clang side: https://reviews.llvm.org/D31174
Differential Revision: https://reviews.llvm.org/D31173
llvm-svn: 298425
including the amended (no UB anymore) fix for adding/subtracting -2147483648.
This reverts r298328 "[ARM] Revert r297443 and r297820."
and partially reverts r297842 "Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648""
llvm-svn: 298417
[Hexagon] Recognize polynomial-modulo loop idiom again
Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.
llvm-svn: 298400
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
Summary:
To support negative immediates for certain arithmetic instructions, the
instruction is converted to the inverse instruction with a negated (or inverted)
immediate. For example, "ADD r0, r1, #FFFFFFFF" cannot be encoded as an ADD
instruction. However, "SUB r0, r1, #1" is equivalent.
These conversions are different from instruction aliases. An alias maps
several assembler instructions onto one encoding. A conversion, however, maps
an *invalid* instruction--e.g. with an immediate that cannot be represented in
the encoding--to a different (but equivalent) instruction.
Several instructions with negative immediates were being converted already, but
this was not systematically tested, nor did it cover all instructions.
This patch implements all possible substitutions for ARM, Thumb1 and
Thumb2 assembler and adds tests. It also adds a feature flag
(-mattr=+no-neg-immediates) to turn these substitutions off. This is
helpful for users who want their code to assemble to exactly what they
wrote.
Reviewers: t.p.northover, rovka, samparker, javed.absar, peter.smith, rengolin
Reviewed By: javed.absar
Subscribers: aadg, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D30571
llvm-svn: 298380
We could do better by splitting any oversized type into whatever vector size the target supports,
but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte
structs, so I think we can wire that up with a TLI hook that feeds into this.
Differential Revision: https://reviews.llvm.org/D31156
llvm-svn: 298376