The DataLayout can calculate alignment of vectors based on the alignment
of the element type and the number of elements. In fact, it is the product
of these two values. The problem is that for vectors of N x i1, this will
return the alignment of N bytes, since the alignment of i1 is 8 bits. The
vector types of vNi1 should be aligned to N bits instead. Provide explicit
alignment for HVX vectors to avoid such complications.
llvm-svn: 260678
This was hardcoded to the static private size, but this
would be missing the offset and additional size for someday
when we have dynamic sizing.
Also stops always initializing flat_scratch even when unused.
In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.
llvm-svn: 260658
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.
llvm-svn: 260651
This is just a trivial implementation:
- Support only arguments passed in registers.
- Support only "plain" arguments, i.e., no sext/zext attribute.
At this point, it is possible to play with the IRTranslator on AArch64:
llc -mtriple arm64-<vendor>-<os> -print-machineinstrs <input.ll> -o - -global-isel
For now, we only support the translation of program with adds and returns.
Follow-up patches are on their way to add a test case (the MIRParser is
not ready as it is).
llvm-svn: 260600
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations. When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17102
llvm-svn: 260599
Summary:
When we split SMRD instructions into two MUBUFs we were adding the users
of the newly created MUBUFs to the VALU worklist. However, the only
users these instructions had was the REG_SEQUENCE that was inserted
by splitSMRD when the original SMRD instruction was split.
We need to make sure to add the users of the original SMRD to the VALU
worklist before it is split.
I have a test case, but it requires one other bug fix, so it will be
added in a later commt.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17101
llvm-svn: 260588
Summary:
Added support for "VOP3Only" attribute in VOP3bInst encoding.
Set VOP3Only=1 for V_DIV_SCALE_F64/32 insns.
Added support for multi-dest instructions in AMDGPUAs::cvt*().
Added lit test for "V_DIV_SCALE_F64|F32 vreg,vcc|sreg,vreg,vreg,vreg".
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, SamWot, nhaustov, vpykhtin
Differential Revision: http://reviews.llvm.org/D16995
Patch By: Artem Tamazov
llvm-svn: 260560
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
This is a reapplication of r259812, which had an incorrect assert. The
test_stur_str_no_assert() test is a reduced version of the issue hit in
the AArch64 self-host.
PR24465
llvm-svn: 260523
If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.
This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.
llvm-svn: 260495
Separate methods to convert parsed instructions to MCInst:
- VOP3 only instructions (always create modifiers as operands in MCInst)
- VOP2 instrunctions with modifiers (create modifiers as operands
in MCInst when e64 encoding is forced or modifiers are parsed)
- VOP2 instructions without modifiers (do not create modifiers
as operands in MCInst)
- Add VOP3Only flag. Pass HasMods flag to VOP3Common.
- Simplify code that deals with modifiers (-1 is now same as
0). This is no longer needed.
- Add few tests (more will be added separately).
Update error message now correct.
Patch By: Nikolay Haustov
Differential Revision: http://reviews.llvm.org/D16778
llvm-svn: 260483
Summary:
Refactor common value, scope, and label tracking logic out of DwarfDebug
into a common base class called DebugHandlerBase.
Update an old LLVM IR test case to avoid an assertion in LexicalScopes.
Reviewers: dblaikie, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16931
llvm-svn: 260432
Summary:
This fixes a crash where subsequent spills would be unable to scavenge
a register. In particular, it fixes a crash in piglit's
spec@glsl-1.50@execution@geometry@max-input-components (the test still
has a shader that fails to compile because of too many SGPR spills, but
at least it doesn't crash any more).
This is a candidate for the release branch.
Reviewers: arsenm, tstellarAMD
Subscribers: qcolombet, arsenm
Differential Revision: http://reviews.llvm.org/D16558
llvm-svn: 260427
Instead of passing varargs directly on the user stack, allocate a buffer in
the caller's stack frame and pass a pointer to it. This simplifies the C
ABI (e.g. non-C callers of C functions do not need to use C's user stack if
they have their own mechanism) and allows further optimizations in the future
(e.g. fewer functions may need to use the stack).
Differential Revision: http://reviews.llvm.org/D17048
llvm-svn: 260421
The encodings for floating point conditions A(lways) and N(ever) were
incorrectly specified for the assembly parser, per Sparc manual v8 page
121. This change corrects that mistake.
Also, strangely, all of the branch instructions already had MC test
cases, except for the broken ones. Added the tests.
Patch by Chris Dewhurst
Differential Revision: http://reviews.llvm.org/D17074
llvm-svn: 260390
Now the parser supports `%got(sym)` expressions only but `%got(sym + const)`
variant is also valid and accepted by GAS.
Differential Revision: http://reviews.llvm.org/D16885
llvm-svn: 260305
I reinvented this functionality in http://reviews.llvm.org/D16828 because it was
hidden away as a static function. The changes in x86 are not based on a complete
audit. I suspect there are other possible uses there, and there are almost certainly
more potential users in other targets.
llvm-svn: 260295
Summary:
Fix case where a pre-inc/dec load/store would not be formed if the
add/sub that forms the inc/dec part of the operation was the first
instruction in the block being examined.
Reviewers: mcrosier, jmolloy, t.p.northover, junbuml
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16785
llvm-svn: 260275
The logic to pair instructions and merge narrow instructions has become cloogy
and error prone. This patch beings to unravel these two similar, but distinct
optimizations.
llvm-svn: 260242
As mentioned in http://reviews.llvm.org/D16828 , the related masked load transform
will need this logic, so I'm moving it out to make that patch smaller.
llvm-svn: 260240
On AVX2 target we are poorly legalizing SIGN_EXTEND ops for which the input's legalized type doesn't have the same number of elements as the destination, resulting in an ANY_EXTEND followed by a SIGN_EXTEND_INREG.
This patch uses the existing SIGN_EXTEND -> SIGN_EXTEND_VECTOR_INREG combine to extend the input to the size of the result and using SIGN_EXTEND_VECTOR_INREG instead.
Differential Revision: http://reviews.llvm.org/D16994
llvm-svn: 260210
As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents.
This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines.
Differential Revision: http://reviews.llvm.org/D16956
llvm-svn: 260168
Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set'
transform in InstCombine, but this should be a win for any AVX machine.
Code comments note that this transform could be extended for other targets / cases.
Differential Revision: http://reviews.llvm.org/D16828
llvm-svn: 260145
Summary:
We will hit this once we have enabled uniform branches. The
smrd-vccz-bug.ll test will be added with the uniform branch commit.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16725
llvm-svn: 260137
This matches GCC and MSVC's behaviour, and saves on code size.
We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.
The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].
Differential Revision: http://reviews.llvm.org/D16907
[1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
[2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ
llvm-svn: 260133
The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input.
This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle.
Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if its not a target shuffle.
Differential Revision: http://reviews.llvm.org/D16683
llvm-svn: 260063
Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions.
The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively.
llvm-svn: 260032
First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement.
This should also make it easier to decode X86ISD::VZEXT target shuffles in the future.
llvm-svn: 259995
The current situation isn't great, because the amount of padding
requires is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.
Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.
llvm-svn: 259912
Using the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.
llvm-svn: 259840
This only impacts the creation of pre-/post-index instructions. The bound was
set high enough such that it did not change code generation for SPEC200X.
llvm-svn: 259828
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.
This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.....
llvm-svn: 259816
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769 and r259790. The tramp3d failure was caused
by an incorrect refactoring in the patch. Specifically, we weren't always
properly clearing the SExtIdx flag.
llvm-svn: 259812
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:
(mul (sext i32), (sext i32))
However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:
(mul (sext i32), i64)
...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.
Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.
Patch by Chris Diamand!
llvm-svn: 259800
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.
Differential Revision: http://reviews.llvm.org/D16729
llvm-svn: 259796
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.
This is a reapplication of r246769, which was reverted in r246782 due to a
test-suite failure. I'm unable to reproduce the issue at this time.
llvm-svn: 259790
Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time.
Differential Revision: http://reviews.llvm.org/D16404
llvm-svn: 259770
Add support for TLS access for Windows on ARM. This generates a similar access
to MSVC for ARM.
The changes to the tablegen data is needed to support loading an external symbol
global that is not for a call. The adjustments to the DAG to DAG transforms are
needed to preserve the 32-bit move.
llvm-svn: 259676
The GNU toolchain emits __aeabi_divmod for soft-divide on ARM cores
which happens to be a lot faster than __divsi3/__modsi3 when the core
has hardware divide instructions. Do the same here.
Fixes PR26450.
llvm-svn: 259657
MIPS ABI states that .sbss and .sdata sections must have SHF_MIPS_GPREL
flag. See Figure 4–7 on page 69 in the following document:
ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf.
Differential Revision: http://reviews.llvm.org/D15740
llvm-svn: 259641
Follow up to D16217 and D16729
This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD
Differential Revision: http://reviews.llvm.org/D16768
llvm-svn: 259635
The purpose of PPCVSXFMAMutate is to elide copies by changing FMA forms
on PPC.
%vreg6<def> = COPY %vreg96
%vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg7
;v6 = v6 + v5 * v7
is replaced by
%vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg7, %vreg96
;v5 = v5 * v7 + v96
This was broken in the case where the target register was also used as a
multiplicand. Fix this case by checking for it and replacing both uses
with the copied register.
%vreg6<def> = COPY %vreg96
%vreg6<def,tied1> = XSMADDASP %vreg6<tied0>, %vreg5<kill>, %vreg6
;v6 = v6 + v5 * v6
is replaced by
%vreg5<def,tied1> = XSMADDMSP %vreg5<tied0>, %vreg96, %vreg96
;v5 = v5 * v96 + v96
llvm-svn: 259617
If we can't assume the pointer value isn't within the bounds
of the object, it seems risky to try to replace the pointer
calculations.
llvm-svn: 259573
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.
Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.
llvm-svn: 259545
Summary:
Enables eip-based addressing, e.g.,
lea constant(%eip), %rax
lea constant(%eip), %eax
in MC, (used for the x32 ABI). EIP-base addressing is also valid in x86_64,
it is left enabled for that architecture as well.
Patch by João Porto
Differential Revision: http://reviews.llvm.org/D16581
llvm-svn: 259528
Re-commit of r258951 after fixing layering violation.
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.
There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.
llvm-svn: 259498
description and changed the regression test accordingly.
The default configuration of a Cortex-R7 is to implement the
VFPv3-D16 architecture and the feature line as it was is too
restrictive.
llvm-svn: 259480
Fix a crash in `getMemOpBaseRegImmOfs` that happens if the base of
`MemOp` is a frame index memory operand. The fix is to have
`getMemOpBaseRegImmOfs` bail out in such cases. We can possibly be more
clever here, if needed.
llvm-svn: 259456
Officially, we don't acknowledge non-default configurations of MXCSR,
as getting there would require usage of the FENV_ACCESS pragma (at
least insofar as rounding mode is concerned).
We don't support the pragma, so we can assume that the default
rounding mode - round to nearest, ties to even - is always used.
However, it's inconsistent with the rest of the instruction set,
where MXCSR is always effective (unless otherwise specified).
Also, it's an unnecessary obstacle to the few brave souls that use
fenv.h with LLVM.
Avoid the hard-coded rounding mode for fp_to_f16; use MXCSR instead.
llvm-svn: 259448
These sets do linear searching in small mode; It is not a good idea to
use huge numbers as the small value here, save people from themselves by
adding a static_assert.
Differential Revision: http://reviews.llvm.org/D16706
llvm-svn: 259419
Summary:
This is an extension to the existing implementation of r242436 which
restricts to only select inputs. This version fixes missed opportunities
in pr26084 by attempting to lower conditional compare sequences of
and/or trees with setcc leafs. This will additionaly handle the case
when a tree with select input is not a conjunction-disjunction tree
but some of the sub trees are conjunction-disjunction trees.
Reviewers: jmolloy, t.p.northover, mcrosier, MatzeB
Subscribers: mcrosier, llvm-commits, junbuml, haicheng, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16291
llvm-svn: 259387
Summary:
Factor out common code for callee-save register pair calculation. This
is intended to simplify follow-on changes that reduce the number of
registers saved/restored.
Depends on D16732
Reviewers: mcrosier, jmolloy, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16734
llvm-svn: 259384
We've found another bug in the code generation logic conditions for a
certain class of always-false conditions, those of the form
if ((a & 1) < 0)
These only reach the back end when compiling without optimization.
The bug was introduced by the choice of using TEST UNDER MASK
to implement a check for
if ((a & MASK) < VAL)
as
if ((a & MASK) == 0)
where VAL is less than the the lowest bit of MASK. This is correct
in all cases except for VAL == 0, in which case the original
condition is always false, but the replacement isn't.
Fixed by excluding that particular case.
llvm-svn: 259381
Summary:
Simplify callee-save register save/restore code generation by
remembering the size of the callee-save area when it is computed so we
don't have to scan the prologue/epilogue instructions again later to
reconstruct it.
This is intended to simplify follow-on changes that reduce the number of
registers saved/restored.
Reviewers: mcrosier, jmolloy, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16732
llvm-svn: 259365
Summary:
The bugs were:
* teq and similar take 4-bit unsigned immediates on microMIPS.
* teqi and similar have side-effects like teq do.
* shll_s.w and shra_r.w take 5-bit unsigned immediates.
* The various DSP ext* instructions take a 5-bit immediate.
* repl.qh takes an 8-bit unsigned immediate.
* repl.ph takes a 10-bit unsigned immediate.
* rddsp/wrdsp take a 10-bit unsigned immediate.
* teqi and similar take signed 16-bit immediates (10-bit for microMIPS).
* Out-of-range immediate macros for or/xor take a simm32/simm64 depending
on architecture. I'll fix the simm64 case properly when I reach simm32.
lui is a bit more lenient than GAS and accepts signed immediates in addition
to unsigned. This is because MipsMCExpr can produce signed values when
constant folding and it currently lacks a way of knowing it should fold to
an unsigned value.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15446
llvm-svn: 259360
Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle.
Differential Revision: http://reviews.llvm.org/D16652
llvm-svn: 259343
Previously the code assumed all uses of FI on loads and stores were as
addresses. This checks whether the use is the address or a value and
handles the latter case as it does for non-memory instructions.
llvm-svn: 259306
The previous code was incorrect (can't getReg a frameindex). We could instead optimize it to reduce tree height, but I'm not sure that's worthwhile yet because we then try to eliminate the frameindex.
This patch also fixes frame index elimination for operations which may load or store: it used to assume the base was operand 2 and immediate offset operand 1. That's not true for stores, where they're 4 and 3.
llvm-svn: 259305
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA was incorrectly selected to reading from
the offset mesa uses off of the kernarg pointer.
Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.
Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.
Start emitting errors for the intrinsics not handled with HSA.
llvm-svn: 259297
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.
llvm-svn: 259296
Refine the test for whether an instruction is in an expression tree so that
it detects when one tree ends and another begins, so we can place a block
at that point, rather than continuing to find the first instruction not in
a tree at all.
llvm-svn: 259294
The basic optimisation was to convert (mul $LHS, $complex_constant) into
roughly "(shl (mul $LHS, $simple_constant), $simple_amt)" when it was expected
to be cheaper. The original logic checks that the mul only has one use (since
we're mangling $complex_constant), but when used in even more complex
addressing modes there may be an outer addition that can pick up the wrong
value too.
I *think* the ARM addressing-mode problem is actually unreachable at the
moment, but that depends on complex assessments of the profitability of
pre-increment addressing modes so I've put a real check in there instead of an
assertion.
llvm-svn: 259228
Add support for frame pointer use in prolog/epilog.
Supports dynamic allocas but not yet over-aligned locals.
Target-independend CG generates SP updates, but we still need to write
back the SP value to memory when necessary.
llvm-svn: 259220
The trap instruction is emitted as a data-in-text rather
than an instruction. This patch uses the .inst directive
for emitting trap.
Differential Revision: http://reviews.llvm.org/D16684
llvm-svn: 259182
check that the sign extended constant fits into 16-bits if we want a
zero extended value, otherwise go ahead and put it together piecemeal.
Fixes PR26356.
llvm-svn: 259177
Since we only have pair - not single - nontemporal store instructions,
we have to extract the high part into a separate register to be able
to use them.
When the initial nontemporal codegen support was added, I wrote the
extract using the nonsensical UBFX [0,32[.
Use the correct LSR form instead.
llvm-svn: 259134
Summary:
Also delete all the stub functions that are identical to the
implementations in TargetInstrInfo.cpp.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16609
llvm-svn: 259054
Enable truncate 128/256bit packed byte/word with AVX512BW but without AVX512VL, use 512bit instructions.
Differential Revision: http://reviews.llvm.org/D16531
llvm-svn: 259044
Author: milena.vujosevic.janicic
Reviewers: dsanders
FastIsel is not supported for microMIPS, thus it needs to be disabled.
Test micromips-zero-mat-uses.ll is deleted since the tested sequence of instructions is not generated for microMIPS without FastISel.
Differential Revision: http://reviews.llvm.org/D15892
llvm-svn: 259039
Re-commit of r258951 after fixing layering violation.
The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.
In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 259035
move ptestm{q|d} intrinsics from patterns form (in td file) to the intrinsics table
Differential Revision: http://reviews.llvm.org/D16633
llvm-svn: 259029
This patch revamps the RegStackifier pass with a new tree traversal mechanism,
enabling three major new features:
- Stackification of values with multiple uses, using the result value of set_local
- More aggressive stackification of instructions with side effects
- Reordering operands in commutative instructions to enable more stackification.
llvm-svn: 259009
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).
As it was discussed in the above thread, getPrefetchDistance is
currently using instruction count which may change in the future.
llvm-svn: 258995
Summary:
Just does the simple allocation of a stack object and passes
a pointer to the callee.
Differential Revision: http://reviews.llvm.org/D16610
llvm-svn: 258989
Various bits we want to use the new ABI actually compile with "-arch armv7k
-miphoneos-version-min=9.0". Not ideal, but also not ridiculous given how
slices work.
llvm-svn: 258975
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.
There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.
The implementation of DiagnosticInfoUnsupported::print must be in
lib/Codegen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 258951
Summary:
We didn't have entries in the commuting table for the 32-bit
instructions. I don't think we hit this problem now, but we
will once uniform branching is enabled. Tests will come in
a later commit.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16600
llvm-svn: 258936
Summary:
This is a candidate for stable, along with all patches that add the "stoney"
processor.
Reviewers: tstellarAMD
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16485
llvm-svn: 258922
When no device name is specified, default to kaveri
for HSA since SI is not supported and it woud fail.
Default to "tahiti" instead of "SI" since these are
effectively the same, and tahiti is an actual device.
Move default device handling to the TargetMachine
rather than the AMDGPUSubtarget. The module ISA version
is computed from the device name provided with the target
machine, so the attributes printed by the AsmPrinter were
inconsistent with those computed in the subtarget.
Also remove DevName field from subtarget since it's redundant
with getCPU() in the superclass.
llvm-svn: 258901
This brings the compile time of Function.cpp from ~40s down to ~4s for
me locally. It also shaves off about 400KB of object file size in a
release+asserts build.
I also realized that the AMDGPU backend does not have any GCC builtin
names to match, so the extra lookup was a no-op. I removed it to silence
a zero-length string table array warning. There should be no functional
change here.
This change really ends the story of PR11951.
llvm-svn: 258897
The AMDGPU backend was the last user of the old StringMatcher
recognition code. Move it over to the new lookupLLVMIntrinsicName
funciton, which is now improved to handle all of the interesting edge
cases exposed by AMDGPU intrinsic names.
llvm-svn: 258875
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
llvm-svn: 258861
r258781 optimized memcpy/memmove/memcpy so the intrinsic call can return its first argument, but missed the frame index case. Teach it to ignore that case so C code doesn't assert out in these cases.
llvm-svn: 258851
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:
jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...
AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:
jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...
However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.
Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.
In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.
Differential Revision: http://reviews.llvm.org/D11393
llvm-svn: 258847
This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load).
It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads.
After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion.
Differential Revision: http://reviews.llvm.org/D16217
llvm-svn: 258798
Their opcodes are used as part of the VEX prefix in 64-bit mode. Clearly the disassembler implicitly decoded them as AVX instructions in 64-bit mode, but I think the AsmParser would have encoded them.
llvm-svn: 258793
Make comments and indentation more consistent.
Rearrange a few things to be in a more consistent order,
such as organizing subtarget features from those describing
an actual device property, and those used as options.
llvm-svn: 258789
I did my best to try to update all the uses in tests that
just happened to use the old ones to the newer intrinsics.
I'm not sure I got all of the immediate operand conversions
correct, since the value seems to have been ignored by the
old pattern but I don't think it really matters.
llvm-svn: 258787
Some of the special intrinsics now that now correspond to a instruction
also have special setting of some registers, e.g. llvm.SI.sendmsg sets
m0 as well as use s_sendmsg. Using these explicit register intrinsics
may be a better option.
Reading the exec mask and others may be useful for debugging. For this
I'm not sure this is entirely correct because we would want this to
be convergent, although it's possible this is already treated
sufficently conservatively.
llvm-svn: 258785
These calls return their first argument, but because LLVM uses an intrinsic
with a void return type, they can't use the returned attribute. Generalize
the store results pass to optimize these calls too.
llvm-svn: 258781
Step one towards using a simple binary search to lookup intrinsic IDs
instead of our crazy table generated switch+memcmp+startswith code that
makes Function.cpp take about a minute to compile. See PR24785 and
PR11951 for why we should do this.
The X86 backend contains tables that need to be sorted on intrinsic ID,
so reorder those.
llvm-svn: 258757
There's a special case in EmitLoweredSelect() that produces an improved
lowering for cmov(cmov) patterns. However this special lowering is
currently broken if the inner cmov has multiple users so this patch
stops using it in this case.
If you wonder why this wasn't fixed by continuing to use the special
lowering and inserting a 2nd PHI for the inner cmov: I believe this
would incur additional copies/register pressure so the special lowering
does not improve upon the normal one anymore in this case.
This fixes http://llvm.org/PR26256 (= rdar://24329747)
llvm-svn: 258729