llvm-project/llvm/lib/Target/PowerPC/README_ALTIVEC.txt

//===- README_ALTIVEC.txt - Notes for improving Altivec code gen ----------===//
Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector
registers, to generate better spill code.
//===----------------------------------------------------------------------===//
The first function below should compile to a single lvx from the constant pool;
the second should compile to a xor/stvx:
void foo(void) {
int x[8] __attribute__((aligned(128))) = { 1, 1, 1, 17, 1, 1, 1, 1 };
bar (x);
}
#include <string.h>
void foo(void) {
int x[8] __attribute__((aligned(128)));
memset (x, 0, sizeof (x));
bar (x);
}
//===----------------------------------------------------------------------===//
Altivec: Codegen'ing MUL with vector FMADD should add -0.0, not 0.0:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8763
When -ffast-math is on, we can use 0.0.
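The sign problem can be reproduced in scalar C, without AltiVec hardware. The
sketch below assumes IEEE 754 arithmetic in the default round-to-nearest mode
and models one lane of a multiply-add as a separate multiply and add. The
helper names (mul_via_madd, zero_sign_preserved) are hypothetical and are not
part of LLVM.

```c
#include <math.h>

/* Hypothetical model of one lane of "a * b + addend". The volatile
 * temporary keeps the compiler from folding the expression away. */
static float mul_via_madd(float a, float b, float addend) {
    volatile float prod = a * b;
    return prod + addend;
}

/* Returns 1 when the result has the same sign bit as the plain product. */
static int zero_sign_preserved(float a, float b, float addend) {
    int want = signbit(a * b) != 0;
    int got = signbit(mul_via_madd(a, b, addend)) != 0;
    return want == got;
}
```

With a = -1.0f and b = 0.0f the product is -0.0. Adding 0.0 turns it into
+0.0, while adding -0.0 leaves it unchanged, so -0.0 is the addend that keeps
the multiply exact.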
//===----------------------------------------------------------------------===//
Consider this:
v4f32 Vector;
v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X };
Since we know that "Vector" is 16-byte aligned and we know the element offset
of ".X", we should change the load into a lve*x instruction, instead of doing
a load/store/lve*x sequence.
//===----------------------------------------------------------------------===//
For functions that use AltiVec and contain calls, we currently VRSAVE all
call-clobbered registers.
//===----------------------------------------------------------------------===//
Implement passing vectors by value into calls and receiving them as arguments.
//===----------------------------------------------------------------------===//
GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load
of C1/C2/C3, then a load and vperm of Variable.
//===----------------------------------------------------------------------===//
We need a way to teach tblgen that some operands of an intrinsic are required to
be constants. The verifier should enforce this constraint.
//===----------------------------------------------------------------------===//
We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte
aligned stack slot, followed by a load/vperm. We should probably just store it
to a scalar stack slot, then use lvsl/vperm to load it. If the value is already
in memory this is a big win.
//===----------------------------------------------------------------------===//
extract_vector_elt with an arbitrary constant element index can be done with
the following instructions:
vTemp = vec_splat(v0,2); // 2 is the element the src is in.
vec_ste(&destloc,0,vTemp);
We can handle an arbitrary non-constant index by using lvsr/vperm/ste.
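Why the splat makes the store safe can be sketched in portable C. This is a
scalar model, not real AltiVec code: v4u32, splat_word, store_element and
extract_elt are hypothetical names, and store_element assumes that the element
store picks its source lane from bits 2-3 of the destination address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register holding four words. */
typedef struct { uint32_t w[4]; } v4u32;

/* Model of vec_splat: replicate lane idx into all four lanes. */
static v4u32 splat_word(v4u32 v, unsigned idx) {
    assert(idx < 4);
    v4u32 r;
    for (int i = 0; i < 4; ++i)
        r.w[i] = v.w[idx];
    return r;
}

/* Model of vec_ste: the stored lane is chosen by the destination
 * address, not by the caller. */
static void store_element(v4u32 v, uint32_t *dest) {
    unsigned lane = (unsigned)(((uintptr_t)dest >> 2) & 3u);
    *dest = v.w[lane];
}

/* Splat first, so that whichever lane the store picks holds the value. */
static uint32_t extract_elt(v4u32 v, unsigned idx) {
    uint32_t dest;
    store_element(splat_word(v, idx), &dest);
    return dest;
}
```

Because every lane holds the wanted element after the splat, the result does
not depend on where the destination happens to sit within its 16-byte block.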
//===----------------------------------------------------------------------===//
If we want to tie instruction selection into the scheduler, we can do some
constant formation with different instructions. For example, we can generate
"vsplti -1" with "vcmpequw R,R" and 1,1,1,1 with "vsubcuw R,R", and 0,0,0,0 with
"vsplti 0" or "vxor". Each of these uses a different execution unit and could
therefore help scheduling.
This is probably only reasonable for a post-pass scheduler.
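The constant idioms above work for any register R, which a scalar model can
check. The sketch below is plain C with hypothetical names (v4u32,
model_vcmpequw, model_vsubcuw, model_vxor). It assumes that vsubcuw writes the
carry-out of the unsigned subtract to each lane, i.e. 1 when a >= b.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register holding four words. */
typedef struct { uint32_t w[4]; } v4u32;

/* Lane is all ones when the inputs are equal, otherwise zero. */
static v4u32 model_vcmpequw(v4u32 a, v4u32 b) {
    v4u32 r;
    for (int i = 0; i < 4; ++i)
        r.w[i] = (a.w[i] == b.w[i]) ? 0xFFFFFFFFu : 0u;
    return r;
}

/* Lane is the carry-out of a - b: 1 when no borrow occurs (a >= b). */
static v4u32 model_vsubcuw(v4u32 a, v4u32 b) {
    v4u32 r;
    for (int i = 0; i < 4; ++i)
        r.w[i] = (a.w[i] >= b.w[i]) ? 1u : 0u;
    return r;
}

/* Lane is the bitwise xor of the inputs. */
static v4u32 model_vxor(v4u32 a, v4u32 b) {
    v4u32 r;
    for (int i = 0; i < 4; ++i)
        r.w[i] = a.w[i] ^ b.w[i];
    return r;
}
```

Passing the same register for both operands yields -1, 1 and 0 in every lane
respectively, whatever the register contained beforehand.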
//===----------------------------------------------------------------------===//
For this function:
void test(vector float *A, vector float *B) {
vector float C = (vector float)vec_cmpeq(*A, *B);
if (!vec_any_eq(*A, *B))
*B = (vector float){0,0,0,0};
*A = C;
}
we get the following basic block:
...
lvx v2, 0, r4
lvx v3, 0, r3
vcmpeqfp v4, v3, v2
vcmpeqfp. v2, v3, v2
bne cr6, LBB1_2 ; cond_next
The vcmpeqfp/vcmpeqfp. instructions currently cannot be merged when the
vcmpeqfp. result is used by a branch. This can be improved.
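The merge is possible because the record form produces everything both users
need. The scalar model below is a sketch under that assumption: one compare
returns the lane mask together with the CR6 "all true" and "all false" bits,
and the function body is rewritten to use that single result. The names
(v4f32, cmp_result, model_vcmpeqfp_rec, test_merged, lane_bits) are
hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct { float f[4]; } v4f32;

/* Everything one "vcmpeqfp." is assumed to produce. */
typedef struct {
    uint32_t mask[4];  /* value written to the destination register */
    int all_true;      /* CR6: every lane compared equal */
    int all_false;     /* CR6: no lane compared equal */
} cmp_result;

static cmp_result model_vcmpeqfp_rec(v4f32 a, v4f32 b) {
    cmp_result r;
    r.all_true = 1;
    r.all_false = 1;
    for (int i = 0; i < 4; ++i) {
        int eq = (a.f[i] == b.f[i]);
        r.mask[i] = eq ? 0xFFFFFFFFu : 0u;
        if (eq) r.all_false = 0; else r.all_true = 0;
    }
    return r;
}

/* Same logic as test() above, using a single compare. */
static void test_merged(v4f32 *A, v4f32 *B) {
    cmp_result r = model_vcmpeqfp_rec(*A, *B);
    if (r.all_false)                      /* !vec_any_eq(*A, *B) */
        memset(B->f, 0, sizeof B->f);     /* *B = {0,0,0,0} */
    memcpy(A->f, r.mask, sizeof r.mask);  /* *A = C */
}

/* Read back the raw bits of one lane; the mask is not a valid number. */
static uint32_t lane_bits(const v4f32 *v, int i) {
    uint32_t bits;
    memcpy(&bits, &v->f[i], sizeof bits);
    return bits;
}
```

In this model the branch only consumes the all_false bit and the store only
consumes the mask, so a second compare adds nothing.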
//===----------------------------------------------------------------------===//
The code generated for this is truly awful:
vector float test(float a, float b) {
return (vector float){ 0.0, a, 0.0, 0.0};
}
LCPI1_0: ; float
.space 4
.text
.globl _test
.align 4
_test:
mfspr r2, 256
oris r3, r2, 4096
mtspr 256, r3
lis r3, ha16(LCPI1_0)
addi r4, r1, -32
stfs f1, -16(r1)
addi r5, r1, -16
lfs f0, lo16(LCPI1_0)(r3)
stfs f0, -32(r1)
lvx v2, 0, r4
lvx v3, 0, r5
vmrghw v3, v3, v2
vspltw v2, v2, 0
vmrghw v2, v2, v3
mtspr 256, r2
blr
//===----------------------------------------------------------------------===//
int foo(vector float *x, vector float *y) {
if (vec_all_eq(*x,*y)) return 3245;
else return 12;
}
A predicate compare being used in a select_cc should have the same peephole
applied to it as a predicate compare used by a br_cc. There should be no
mfcr here:
_foo:
mfspr r2, 256
oris r5, r2, 12288
mtspr 256, r5
li r5, 12
li r6, 3245
lvx v2, 0, r4
lvx v3, 0, r3
vcmpeqfp. v2, v3, v2
mfcr r3, 2
rlwinm r3, r3, 25, 31, 31
cmpwi cr0, r3, 0
bne cr0, LBB1_2 ; entry
LBB1_1: ; entry
mr r6, r5
LBB1_2: ; entry
mr r3, r6
mtspr 256, r2
blr
//===----------------------------------------------------------------------===//
CodeGen/PowerPC/vec_constants.ll has an 'and' operation that should be
codegen'd to andc. The issue is that the 'all ones' build vector is
SelectNodeTo'd to a VSPLTISB instruction node before the and/xor is selected,
which prevents the vnot pattern from matching.
//===----------------------------------------------------------------------===//