On the z13, it turns out to be more efficient to access a full
floating-point register than just the upper half (as done e.g.
by the LE and LER instructions).
Current code already takes this into account when loading from
memory by using the LDE instruction in place of LE. However,
we still generate LER, which shows the same performance issues
as LE in certain circumstances.
This patch changes the back-end to emit LDR instead of LER to
implement FP32 register-to-register copies on z13.
llvm-svn: 263431
For cases where we are truncating an integer vector arithmetic result, it may be better to pre-truncate the input operands. There is no code to support this yet (scalar is handled with SimplifyDemandedBits, but adding vector support could be a lot of work), so these tests represent the current codegen status.
Example bugs: PR14666, PR22703
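A minimal IR sketch (hypothetical, not taken from those PRs) of the pattern in question, where the add could instead be done on pre-truncated <8 x i16> operands:
  define <8 x i16> @trunc_add(<8 x i32> %a, <8 x i32> %b) {
    %s = add <8 x i32> %a, %b              ; arithmetic at the wide type
    %t = trunc <8 x i32> %s to <8 x i16>   ; result is immediately truncated
    ret <8 x i16> %t
  }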
llvm-svn: 263384
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if the shift amount is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
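For illustration, a hypothetical IR test with a constant per-element shift amount that no longer needs the (v)pblendvb-based lowering:
  define <8 x i16> @shl_const(<8 x i16> %a) {
    ; every shift amount is a known constant
    %r = shl <8 x i16> %a, <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7>
    ret <8 x i16> %r
  }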
llvm-svn: 263383
This patch corresponds to review:
http://reviews.llvm.org/D17712
We were not clearing the TOC vector in PPCAsmPrinter when initializing it. This
caused duplicate definition asserts when the pass is reused on the module
(i.e. with -compile-twice or in JIT contexts).
llvm-svn: 263338
cmpxchg[8|16]b uses RBX as one of its arguments.
In other words, using this instruction clobbers RBX, as it is defined to hold one
of the inputs. When the backend uses a dynamically allocated stack, RBX is used as
a reserved register for the base pointer.
Reserved registers have special semantics that only the target understands and
enforces. Because of that, the register allocator doesn't use them, but it also
doesn't try to make sure they are used properly (remember, it does not know how
they are supposed to be used).
Therefore, when RBX is used as a reserved register but defined by something that
is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. It is the responsibility of the target to do the right thing with
its reserved register.
To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that saves RBX.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
comparison.
- The other is the virtual register holding the saved value of RBX when it is
used as the base pointer. This save is done as part of isel (i.e., we emit a copy
from rbx).
cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp
This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp
Note: The actual modeling of the pseudo is a bit more complicated, to make sure
the interferences that appear after the pseudo gets expanded are properly modeled
before that expansion.
This fixes PR26883.
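As a rough illustration (a hypothetical reproducer, not the actual test from the patch), IR along these lines needs both a 16-byte cmpxchg (i.e. cmpxchg16b, assuming +cx16) and a base pointer because of the dynamically sized alloca:
  define i128 @cas16(i128* %p, i128 %old, i128 %new, i64 %n) {
    %buf = alloca i8, i64 %n     ; variable-sized alloca forces RBX to be used as the base pointer
    store i8 0, i8* %buf
    %pair = cmpxchg i128* %p, i128 %old, i128 %new seq_cst seq_cst
    %val = extractvalue { i128, i1 } %pair, 0
    ret i128 %val
  }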
llvm-svn: 263325
Improve the extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG, but we can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine, as that only really helps when lowering to VSEXT/VZEXT.
Removes a lot of unnecessary any_extend + mask patterns (fix for PR25718).
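An illustrative (hypothetical) IR case that now goes through the *_EXTEND_VECTOR_INREG path:
  define <8 x i32> @sext_v8i8(<8 x i8> %a) {
    %r = sext <8 x i8> %a to <8 x i32>
    ret <8 x i32> %r
  }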
Differential Revision: http://reviews.llvm.org/D17932
llvm-svn: 263303
It's not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for the 256/512-bit vector cases.
llvm-svn: 263239
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.
Differential Revision: http://reviews.llvm.org/D17984
llvm-svn: 263212
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
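A minimal (hypothetical) example of the zero-extension case now covered by the generalised combine:
  define <4 x i64> @zext_v4i32(<4 x i32> %a) {
    %r = zext <4 x i32> %a to <4 x i64>
    ret <4 x i64> %r
  }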
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 263159
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
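Roughly, vectorized code can now call the intrinsic through a non-default addrspace pointer; the mangled suffix below (p1v4i32 for an addrspace(1) pointer to <4 x i32>) is meant as an illustration of the new overloading rather than a normative spelling:
  define <4 x i32> @load_as1(<4 x i32> addrspace(1)* %p, <4 x i1> %m) {
    %v = call <4 x i32> @llvm.masked.load.v4i32.p1v4i32(<4 x i32> addrspace(1)* %p, i32 4, <4 x i1> %m, <4 x i32> undef)
    ret <4 x i32> %v
  }
  declare <4 x i32> @llvm.masked.load.v4i32.p1v4i32(<4 x i32> addrspace(1)*, i32, <4 x i1>, <4 x i32>)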
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 263158
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.
The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.
For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.
BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.
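A rough sketch of what the declarations might look like; the operand lists here (rsrc, vindex, offset, glc, slc) and the exact names are assumptions for illustration only, see the review for the real definitions:
  declare <4 x float> @llvm.amdgcn.buffer.load.format(<4 x i32>, i32, i32, i1, i1)
  declare void @llvm.amdgcn.buffer.store.format(<4 x float>, <4 x i32>, i32, i32, i1, i1)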
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17277
llvm-svn: 263140
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. However, it's not enough to check that the register
itself isn't defined; we also need to make sure no overlapping registers
are defined either.
This fixes PR26711.
Differential Revision: http://reviews.llvm.org/D18029
llvm-svn: 263139
Summary:
Peephole optimization that generates a single TBZ/TBNZ instruction
for test-and-branch sequences like the example below. This handles
the cases where the folding of AND into TBZ/TBNZ was missed during ISelLowering of BR_CC.
Examples:
and w8, w8, #0x400
cbnz w8, L1
to
tbnz w8, #10, L1
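A hypothetical IR test that produces the and/cbnz sequence shown above:
  define void @tbnz_test(i32 %x) {
  entry:
    %a = and i32 %x, 1024          ; 0x400, i.e. bit 10
    %c = icmp ne i32 %a, 0
    br i1 %c, label %l1, label %done
  l1:
    call void @callee()
    br label %done
  done:
    ret void
  }
  declare void @callee()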
Reviewers: MatzeB, jmolloy, mcrosier, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17942
llvm-svn: 263136
This patch adds Cortex-R8 to Target Parser and TableGen.
It also adds CodeGen tests for the build attributes.
Patch by Pablo Barrio.
Differential Revision: http://reviews.llvm.org/D17925
llvm-svn: 263132
The initial change was insufficiently complete for always getting the semantics
of __builtin_longjmp correct. The builtin is translated into a
`tInt_eh_sjlj_longjmp` DAG node. This node sets R7 as clobbered. However, the
code would then follow up with a clobber of R11. I had failed to notice the
imp-def,kill on R7 in the isel. Unfortunately, it seems that it is not possible
to conditionalise the Defs list via an !if. Instead, construct a new parallel
WIN node and prefer that when targeting Windows. This ensures that we now both
correctly model __builtin_longjmp and construct the frame in a more
ABI-conformant manner.
llvm-svn: 263123
WoA uses r11 as the FP even though it is a pure thumb-2 environment in contrast
to AAPCS which states r7. This adjusts __builtin_longjmp to not clobber r7 and
to properly restore the frame pointer on execution.
llvm-svn: 263118
This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs, it can correctly combine into a unary shuffle mask.
This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.
There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for.
Differential Revision: http://reviews.llvm.org/D17858
llvm-svn: 263102
The SCALAR_TO_VECTOR operation for v64i8 and v32i16 should be lowered when the BW (AVX512BW) feature is enabled.
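For reference, a hypothetical IR case that selects to SCALAR_TO_VECTOR at the v64i8 type:
  define <64 x i8> @scalar_to_vector_v64i8(i8 %x) {
    %r = insertelement <64 x i8> undef, i8 %x, i32 0
    ret <64 x i8> %r
  }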
Differential Revision: http://reviews.llvm.org/D17994
llvm-svn: 263097
This change adds support for the preserve_most calling convention to the AArch64 backend, similar to how it was done for X86-64.
There is also a subsequent patch on top of this one to add tail-call support for this calling convention.
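For illustration, a small hypothetical IR example using the calling convention (function names made up):
  declare preserve_mostcc void @callee(i64)
  define void @caller(i64 %x) {
    call preserve_mostcc void @callee(i64 %x)
    ret void
  }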
Differential Revision: http://reviews.llvm.org/D18016
llvm-svn: 263092
Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.
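A hypothetical example of the kind of constant-condition select this applies to, which can now be lowered to a blend with an immediate mask rather than a variable blend:
  define <4 x float> @blend_imm(<4 x float> %a, <4 x float> %b) {
    %r = select <4 x i1> <i1 true, i1 false, i1 true, i1 false>, <4 x float> %a, <4 x float> %b
    ret <4 x float> %r
  }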
Differential Revision: http://reviews.llvm.org/D17899
llvm-svn: 263067
Summary:
The code in SelectionDAG did not handle the case where the
register type and the output type were different but had the same size.
Reviewers: arsenm, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17940
llvm-svn: 263022
Support DPP syntax as used in SP3 (except for the syntax of several operands).
Added DPP-specific operands in the td files.
Added a DPP flag to TSFlags so the InstPrinter can determine whether an instruction is DPP.
Added support for VOP2 DPP instructions in the td files.
Added some tests for DPP instructions.
ToDo:
- VOP2bInst:
  - vcc is considered as operand
  - AsmMatcher doesn't apply mnemonic aliases when parsing operands
- v_mac_f32
- v_nop
- disable instructions with 64-bit operands
- change dpp_ctrl assembler representation to conform to sp3
Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
This implements a very simple conservative transformation that doesn't
require more than linear code size growth. There's room for much more
optimization in this space.
llvm-svn: 262982
The fix, which consists in using the library call for atomic compare and swap when
the instruction is not safe to use, may be incorrect. Indeed, the library call may
not exist on all platforms. In other words, we need a better fix!
llvm-svn: 262943