v2: use ffbh/l if available
v3: Rebase on top of Matt's SI patches
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 213072
This helps avoid redundant instructions to unpack, and repack
the vectors. Ideally we could recognize that pattern and eliminate
it. Currently v4i8 and other small element type vectors are scalarized,
so this has the added bonus of avoiding that.
llvm-svn: 213031
We need the intrinsics with offsets, so why not just add them all.
The R128 parameter will also be useful for reducing SGPR usage.
GL_ARB_image_load_store also adds some image GLSL modifiers like "coherent",
so Mesa will probably translate those to slc, glc, etc.
When LLVM 3.5 is released, I'll switch Mesa to these new intrinsics.
llvm-svn: 212830
Use alg. from LegalizeDAG.cpp
Move Expand setting to SIISellowering
v2: Extend existing tests instead of creating new ones
v3: use separate LowerFPTOSINT function
v4: use TargetLowering::expandFP_TO_SINT
add comment about using FP_TO_SINT for uints
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 212773
vector type legalization strategies in a more fine grained manner, and
change the legalization of several v1iN types and v1f32 to be widening
rather than scalarization on AArch64.
This fixes an assertion failure caused by scalarizing nodes like "v1i32
trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32.
This also provides a foundation for other targets to have more granular
control over how vector types are legalized.
Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow
some work to start taking place on top of this patch as it adds some
really important hooks to the backend that I'd like to immediately start
using. =]
http://reviews.llvm.org/D4322
llvm-svn: 212242
SGPRs are written by instructions that sometimes will ignore control flow,
which means if you have code like:
if (VGPR0) {
SGPR0 = S_MOV_B32 0
} else {
SGPR0 = S_MOV_B32 1
}
The value of SGPR0 will 1 no matter what the condition is.
In order to deal with this situation correctly, we need to view the
program as if it were a single basic block when we calculate the
live ranges for the SGPRs. They way we actually update the live
range is by iterating over all of the segments in each LiveRange
object and setting the end of each segment equal to the start of
the next segment. So a live range like:
[3888r,9312r:0)[10032B,10384B:0) 0@3888r
will become:
[3888r,10032B:0)[10032B,10384B:0) 0@3888r
This change will allow us to use SALU instructions within branches.
llvm-svn: 212215
The default rounding mode to initialize the mode register needs
to be reported to the runtime. Fill in other bits a kernel
may be interested in setting for future use.
llvm-svn: 211791
Now that non-leaf ComplexPatterns are allowed we can fold all the MUBUF
store patterns into the instruction definition. We will also be able to
reuse this new ComplexPattern for MUBUF loads and atomic operations.
llvm-svn: 211644
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.
It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, which I'm not sure how it fits in here.
llvm-svn: 211637