The VDUP instruction source register doesn't allow a non-constant lane
index, so make sure we don't construct an ARM::VDUPLANE node that asks it to
do so.
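As a rough illustration (this example is not from the patch and assumes a NEON-enabled compiler), the lane argument of the corresponding C intrinsic must likewise be a compile-time constant; a variable index has to take a different lowering path:
#include <arm_neon.h>

/* Constant lane index: this can map onto a single lane-duplicating VDUP. */
int32x4_t dup_lane1(int32x2_t v) {
  return vdupq_lane_s32(v, 1);
}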
rdar://13328063
http://llvm.org/bugs/show_bug.cgi?id=13963
llvm-svn: 176413
This matters, for example, in the following matrix multiply:
int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}
Taken from the test-suite benchmark Shootout.
We estimate the cost of the multiply to be 2, while we actually generate 9
instructions for it and end up quite a bit slower than the scalar version
(48% on my machine).
Also, properly differentiate between AVX1 and AVX2. On AVX1 we still split the
vector into two 128-bit halves and handle the subvector muls as above with 9
instructions.
Only on AVX2 do we have a cost of 9 for v4i64.
I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would indeed be more expensive when vectorized, using three
kernels:
  for (i ...)
    r += a[i] * 3;
  for (i ...)
    m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.
In each case the vectorized version was considerably slower.
radar://13304919
llvm-svn: 176403
This patch eliminates the need to emit a constant move instruction when this
pattern is matched:
(select (setgt a, Constant), T, F)
The pattern above effectively turns into this:
(conditional-move (setlt a, Constant + 1), F, T)
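For illustration only (this C analogue is not part of the patch), source of this shape produces the pattern; the rewrite relies on (a > C) ? T : F being equivalent to (a < C + 1) ? F : T:
int select_gt(int a, int t, int f) {
  /* (a > 41) ? t : f behaves exactly like (a < 42) ? f : t, so the compare
     can be matched as a set-less-than against Constant + 1 with the select
     arms swapped, and no extra constant move is needed. */
  return (a > 41) ? t : f;
}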
llvm-svn: 176384
- ISD::SHL/SRL/SRA must have either both scalar or both vector operands,
but TLI.getShiftAmountTy() so far only returned a scalar type. As a
result, backend logic assuming that invariant breaks.
- Rename the original TLI.getShiftAmountTy() to
TLI.getScalarShiftAmountTy() and re-define TLI.getShiftAmountTy() to
return the target-specified scalar type or the same vector type as the
1st operand.
- Fix most TICG logic that assumed TLI.getShiftAmountTy() returns a simple
scalar type.
llvm-svn: 176364
dispatch code. As far as I can tell the Thumb2 code is behaving as expected.
I was able to compile and run the associated test case for both ARM and Thumb1.
rdar://13066352
llvm-svn: 176363
v2: based on Michel's patch, but now allows copying of all register sizes.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
llvm-svn: 176346
This function will be used later when the capability to search for delay slot
filling instructions in successor blocks is added. No functionality change
intended.
llvm-svn: 176325
The work done by the post-encoder (setting architecturally unused bits to 0 as
required) can be done by the existing operand that covers the "#0.0". This
removes at least one use of the discouraged PostEncoderMethod.
llvm-svn: 176261
If an otherwise weak variable is actually defined in this unit, it can't be
undefined at runtime, so we can use the normal global variable sequence
(ADRP/ADD) to access it.
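For illustration (a hypothetical example, not taken from the patch), a definition like the following is weak yet known to exist in the current translation unit, so the direct ADRP/ADD sequence is safe:
/* Weak, but defined right here: the symbol may be overridden at link time,
   but it can never be left undefined at run time. */
__attribute__((weak)) int counter = 0;

int read_counter(void) {
  return counter;
}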
llvm-svn: 176259
This fixes an issue where trying to assemble valid ADR instructions would cause
LLVM to hit a failed assertion.
Patch by Keith Walker.
llvm-svn: 176189
There's no need to generate a stack frame for PPC32 SVR4 when there are
no local variables assigned to the stack, i.e., when no red zone is needed.
(PPC64 supports a red zone, but PPC32 does not.)
llvm-svn: 176124
Make it possible to map between e32 and e64 encoding opcodes.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176104
Include immediate folding and SGPR limit handling for VOP3 instructions.
v2: remove leftover hasExtraSrcRegAllocReq
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176101
v2: document why we hardcode VCC for now.
This is a candidate for the mesa-stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176099
Prevent producing really strange TableGen code by using
proper register sizes, alignments and hierarchy.
Also clean up the unused definitions and add some comments.
v2: add SGPR 512 bit registers, stop registers from wrapping around,
fix SGPR alignment
This is a candidate for the mesa-stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176098
This is a candidate for the mesa-stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176097
fewer scalar integer (i32 or i64) arguments. It completely eliminates the need
for SDISel for trivial functions.
Also, add the new llc -fast-isel-abort-args option, which is similar to the
-fast-isel-abort option, but for formal argument lowering.
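As an illustration (hypothetical example, not part of the patch), a trivial function like this, taking only a few scalar i32/i64 arguments, can now have its formal arguments lowered entirely by fast-isel:
long add3(int a, int b, long c) {
  return (long)a + b + c;  /* simple scalar arguments and arithmetic only */
}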
llvm-svn: 176052
This removes a const_cast hack from PPCRegisterInfo::hasReservedSpillSlot().
The proper place to save the frame index for the CR spill slot is in the
PPCFunctionInfo object, not the PPCRegisterInfo object.
No new test cases, as this just reimplements existing functionality. Existing
tests such as test/CodeGen/PowerPC/crsave.ll are sufficient.
llvm-svn: 175998
16 more little piglits with radeonsi.
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175887
24 more little piglits with radeonsi.
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175886
9 more little piglits with radeonsi.
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175885
There's no apparent reason this code was copied from generated source
into a .cpp. It sets a bad example for those working on other targets
and trying to understand the register info API.
llvm-svn: 175849
to TargetFrameLowering, where it belongs. Incidentally, this allows us
to delete some duplicated (and slightly different!) code in TRI.
There are potentially other layering problems that can be cleaned up
as a result, or in a similar manner.
The refactoring was OK'd by Anton Korobeynikov on llvmdev.
Note: this touches the target interfaces, so out-of-tree targets may
be affected.
llvm-svn: 175788
The large code model is identical to the medium code model except that the
addis/addi sequence for "local" accesses is never used. All accesses
use the addis/ld sequence.
The coding changes are straightforward; most of the patch is taken up
with creating variants of the medium model tests for the large model.
llvm-svn: 175767
exists solely to enable it to call itself for i8 with some registers.
The proposed patch simplifies the function somewhat to make the High
bit only meaningful for the i8 mode, which makes sense. No functional
difference (getX86SubSuperRegister is not getting called from anywhere
outside with i64 and High=true).
llvm-svn: 175762
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175758
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175757
It actually fixes quite a bunch of piglit tests.
This is a candidate for the mesa-stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175756
Instead of using custom inserters, it's simpler and
should make DAG folding easier.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175755
v2: put implicit parameters in []
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175754
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175753
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175752
Order the classes and add asm operands.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175751
Fixing asm operation names.
v2: fix name of the e64 encoding, also add asm operands
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175750
Fixing asm operation names.
v2: use ZERO constant, also add asm operands
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175749
Fixing asm operation names.
v2: use ZERO constant, also add asm operands
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175748
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175747
Those two files got mixed up.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175746
Fixes for-loop.cl piglit test
Patch By: Vincent Lejeune
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 175742
The constructs %hi() and %lo() represent the high and low 16
bits of the address.
Because the 16-bit offset field of an LW instruction is
interpreted as signed, if bit 15 of the low part is 1 then the
low part acts as a negative value and 1 needs to be added to the
high part.
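A minimal C sketch of that bias (illustrative only; the helper names are invented here):
#include <stdint.h>

/* High half, biased so that a "negative" low half bumps it up by one. */
uint16_t hi16(uint32_t addr) {
  return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Low half, seen as the sign-extended 16-bit offset of the LW instruction. */
int16_t lo16(uint32_t addr) {
  return (int16_t)(addr & 0xFFFFu);
}

/* For any addr, ((uint32_t)hi16(addr) << 16) + lo16(addr) == addr. */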
Contributor: Vladimir Medic
llvm-svn: 175707
This patch implements the PPCDAGToDAGISel::PostprocessISelDAG virtual
method to perform post-selection peephole optimizations on the DAG
representation.
One optimization is implemented here: folds to clean up complex
addressing expressions for thread-local storage and medium code
model. It will also be useful for large code model sequences when
those are added later. I originally thought about doing this on the
MI representation prior to register assignment, but it's difficult to
do effective global dead code elimination at that point. DCE is
trivial on the DAG representation.
A typical example of a candidate code sequence in assembly:
    addis 3, 2, globalvar@toc@ha
    addi 3, 3, globalvar@toc@l
    lwz 5, 0(3)
When the final instruction is a load or store with an immediate offset
of zero, the offset from the add-immediate can replace the zero,
provided the relocation information is carried along:
    addis 3, 2, globalvar@toc@ha
    lwz 5, globalvar@toc@l(3)
Since the addi can in general have multiple uses, we must only
delete the instruction when its last use is removed.
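For illustration (not from the patch itself), source like the following produces the candidate sequence above when a global is loaded through a TOC-relative address:
int globalvar;

int load_globalvar(void) {
  /* Address formed with addis/addi under the medium code model, then lwz;
     the peephole folds the low part into the load's displacement. */
  return globalvar;
}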
llvm-svn: 175697
excluding visibility bits.
Mips specific standalone assembler directive "set at".
This directive changes the general purpose register
that the assembler will use when given the symbolic
register name $at.
This does not include negative testing. That will come
in a future patch.
A side effect of this patch is that it recognizes the different
GPR register names for temporaries between the old ABI
and the new ABI, so a test case for that is included.
Contributor: Vladimir Medic
llvm-svn: 175686
This handles the cases where the 6-bit splat element is odd, converting
to a three-instruction sequence to add or subtract two splats. With this
fix, the XFAIL in test/CodeGen/PowerPC/vec_constants.ll is removed.
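A quick numeric check of the odd case (values chosen here purely for illustration):
#include <assert.h>

int main(void) {
  int v = 27;                /* odd 6-bit splat element, outside -16..15 */
  int a = v / 2, b = v - a;  /* 13 and 14, both within 5 signed bits */
  /* splat a, splat b, then one vector add: the three-instruction sequence. */
  assert(a >= -16 && a <= 15 && b >= -16 && b <= 15 && a + b == v);
  return 0;
}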
llvm-svn: 175663
The PPC backend doesn't handle these correctly. This patch uses logic
similar to that in the X86 and ARM backends to track these arguments
properly.
llvm-svn: 175635
Add the HexagonMCInst class, which adds various Hexagon VLIW annotations.
In addition, this class includes some APIs related to the
constant extenders.
llvm-svn: 175634
During lowering of a BUILD_VECTOR, we look for opportunities to use a
vector splat. When the splatted value fits in 5 signed bits, a single
splat does the job. When it doesn't fit in 5 bits but does fit in 6,
and is an even value, we can splat half the value and add the result
to itself.
This last optimization hasn't been working recently because of improved
constant folding. To circumvent this, create a pseudo VADD_SPLAT that
can be expanded during instruction selection.
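A quick numeric sketch of the even case (the value is invented for illustration):
#include <assert.h>

int main(void) {
  int v = 30;          /* even, fits in 6 bits but not in 5 signed bits */
  int half = v / 2;    /* 15 fits in 5 signed bits */
  /* splat(half), then add the splatted vector to itself. */
  assert(half >= -16 && half <= 15 && half + half == v);
  return 0;
}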
llvm-svn: 175632
sext <4 x i1> to <4 x i64>
sext <4 x i8> to <4 x i64>
sext <4 x i16> to <4 x i64>
I'm running a combine on SIGN_EXTEND_IN_REG to revert the SEXT patterns:
(sext_in_reg (v4i64 anyext (v4i32 x)), ExtraVT) -> (v4i64 sext (v4i32 sext_in_reg (v4i32 x, ExtraVT)))
The sext_in_reg (v4i32 x) may be lowered to shl+sar operations.
There is no "sar" for 64-bit elements, so lowering sext_in_reg (v4i64 x) has no vector solution.
I also added the cost of these operations to the AVX cost table.
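For illustration (a hypothetical loop, not taken from the patch), a loop like this yields one of the sign-extension patterns listed above once it is vectorized:
void widen(const signed char *src, long long *dst, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = src[i];   /* sext i8 -> i64 per element when vectorized */
}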
llvm-svn: 175619
It is possible that the frame pointer is not found in the
callee-saved info, thus FramePtrSpillFI may be incorrect
if we don't check the result of hasFP(MF).
Besides, if we enable the stack coloring algorithm, there
will be an assertion to ensure the slot is live. But in
the test case, %var1 is not live in the prologue of the
function, and we will get the assertion failure.
Note: There is similar code in ARMFrameLowering.cpp.
llvm-svn: 175616
SltCCRxRy16, SltiCCRxImmX16, SltiuCCRxImmX16, SltuCCRxRy16
$T8 shows up as register $24 when emitted from C++ code, so we had
to change some tests that were already there for this functionality.
llvm-svn: 175593
MS-style inline assembly.
This is a follow-on to r175334. Forcing an FP to be emitted doesn't ensure it
will be used. Therefore, force the base pointer as well. We now treat MS
inline assembly in the same way we treat functions with dynamic stack
realignment and VLAs. This guarantees the BP will be used to reference
parameters and locals.
rdar://13218191
llvm-svn: 175576
excluding visibility bits.
Mips (o32 ABI) specific e_header setting.
EF_MIPS_ABI_O32 needs to be set in the
ELF header flags for o32 ABI output.
Contributor: Reed Kotler
llvm-svn: 175569
excluding visibility bits.
Mips (Mips16) specific e_header setting.
EF_MIPS_ARCH_ASE_M16 needs to be set in the
ELF header flags for Mips16.
Contributor: Reed Kotler
llvm-svn: 175566
excluding visibility bits.
Mips (MicroMips) specific STO handling.
The st_other field setting for STO_MIPS_MICROMIPS.
Contributor: Zoran Jovanovic
llvm-svn: 175564
In my previous commit:
"Merge a f32 bitcast of a v2i32 extractelt
A vectorized sitofp on doubles will get scalarized to a sequence of an
extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
Due to the extract_element and the bitcast we will unnecessarily generate
moves between scalar and vector registers."
I added a pattern containing a copy_to_regclass. The copy_to_regclass is
actually not needed.
radar://13191881
llvm-svn: 175555
When creating an allocation hint for a register pair, make sure the hint
for the physical register reference is still in the allocation order.
rdar://13240556
llvm-svn: 175541
A vectorized sitofp on doubles will get scalarized to a sequence of an
extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
Due to the extract_element and the bitcast we will unnecessarily generate
moves between scalar and vector registers.
The patch fixes this by using a COPY_TO_REGCLASS and an EXTRACT_SUBREG to extract
the element from the vector instead.
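As an illustration (hypothetical loop, not from the patch), a conversion loop of this shape is where the extract_element/bitcast/sitofp sequence shows up after vectorization:
void to_double(const int *src, double *dst, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = (double)src[i];   /* sitofp i32 -> f64 per element */
}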
radar://13191881
llvm-svn: 175520
This stops the Machine Verifier from complaining about uses of undefined
physical registers.
NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 175518
Kernel function arguments are lowered to loads from the PARAM_I address
space. When creating these load instructions, we were initializing
their MachinePointerInfo with an Argument object that was not attached
to any function. This was causing the MachineScheduler to crash when
it tried to access the parent of the Argument.
This has been fixed by initializing the MachinePointerInfo with an
UndefValue instead.
NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 175517
In some cases, we were losing track of live implicit registers, which
created dead defs and caused the scheduler to produce invalid
code.
NOTE: This is a candidate for the Mesa stable branch.
llvm-svn: 175516
fields were only ever set in the constructor. The create method retains
its consistent interface so that these bits can be re-threaded through
the emitter if they're ever needed.
This was found by the -Wunused-private-field Clang warning.
llvm-svn: 175482