type (in addition to the memory type).
The *LoadExt* legalization handling used to only have one type, the
memory type. This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.
However, this isn't always the case. For instance, on X86, with AVX,
this is legal:
v4i32 load, zext from v4i8
but this isn't:
v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.
Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.
Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.
Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior. The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)
No functional change intended.
Differential Revision: http://reviews.llvm.org/D6532
llvm-svn: 225421
The returned operand needs to be permuted for the unordered
compares. Also fix incorrectly producing fmin_legacy / fmax_legacy
for f64, which don't exist.
llvm-svn: 224094
This sort of doesn't matter since the setcc type is i1, but
this previously was using the default UndefinedBooleanContent. This
makes it more consistent with R600. This enables more optimizations
which typically give up on UndefinedBooleanContent. For example,
there is already a special case target DAG combine for
setcc + sext which can be eliminated in favor of what the generic
DAG combiner can do if it assumes boolean values are sign extended.
Since -1 is an inline immediate, using it is basically free and the
backend already uses it when a boolean value is needed in a wider type.
llvm-svn: 222850
This is so it could potentially be used by SI. However, the current
implementation does not always produce correct results, so the
IntegerDivisionPass is being used instead.
llvm-svn: 222072
This effectively reverts revert 219707. After fixing the test to work with
new function name format and renamed intrinsic.
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219710
v2: Add SI lowering
Add test
v3: Place work dimensions after the kernel arguments.
v4: Calculate offset while lowering arguments
v5: rebase
v6: change prefix to AMDGPU
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219705
Refactored the R600_LDS_1A2D class a bit to get it to actually work.
It seemed to be previously unused and broken.
We also have to disable the conversion to the noret variant for now in
R600ISelLowering because the getLDSNoRetOp method only handles 1A1D LDS ops.
Someone can feel free to modify the AMDGPU::getLDSNoRetOp method to
work for more than 1A1D variants of LDS operations. It's being left as a
future TODO for now.
Signed-off-by: Aaron Watry <awatry at gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217596
This for some reason fixes v1i64 kernel arguments on pre-SI. This
currently breaks some other cases in the kernel-args.ll test for R600,
but I'm not particularly confident in the new output. VTX_READ_* are not
used for some of the scalarized cases, and the code reading from the
constant buffer doesn't make much sense to me.
llvm-svn: 215564
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.
llvm-svn: 214988
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
llvm-svn: 214838
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.
This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.
llvm-svn: 214449
This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.
This saves use the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.
llvm-svn: 213530
Use alg. from LegalizeDAG.cpp
Move Expand setting to SIISellowering
v2: Extend existing tests instead of creating new ones
v3: use separate LowerFPTOSINT function
v4: use TargetLowering::expandFP_TO_SINT
add comment about using FP_TO_SINT for uints
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 212773
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.
It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, which I'm not sure how it fits in here.
llvm-svn: 211637
It's already set in AMDGPUISelLowering for all GPUs
Patch By: Jan Vesely
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207592
When legalizing ops, with UDIV/UREM set to expand, they automatically
expand to UDIVREM (if legal or custom).
We need to do this manually for legalize types.
v2:
SI should be set to Expand because the type is legal, and it is
automatically lowered to UDIVREM if UDIVREM is Legal/Custom
R600 should set to UDIV/UREM to Custom because it needs to lower them
during type legalization
Patch by: Jan Vesely
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207587
Setting vector types to expand will result in scalarization on pre SI hw,
as those gpus don't have vector shifts either.
Expand also i32 vectors, this helps llvm make the correct decision
about scalarizing the vector ops.
v2: move setOperation() calls to R600ISelLowering.cpp.
cleanup the SI code to make it obvious that this patch does is nop for SI
Patch by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 206348
Through some oddity where truncate (sextload x) isn't folded into
an anyextload for vectors, the sextload remains if the
vector isn't immediately scalarized. This keeps the expected
zextload instructions in the kernel-args test when small type
vectors aren't scalarized.
llvm-svn: 206070
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.
This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched. This occasionally
resulted in some instructions being incorrectly deleted from the
program.
v2:
- Fix bug with 64-bit mul
llvm-svn: 205731