Commit Graph

149 Commits

Author SHA1 Message Date
Ahmed Bougacha 2b6917b020 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Matt Arsenault 1e3a4ebc6e R600: Fix min/max matching problems with unordered compares
The returned operand needs to be permuted for the unordered
compares. Also fix incorrectly producing fmin_legacy / fmax_legacy
for f64, which don't exist.

llvm-svn: 224094
2014-12-12 02:30:37 +00:00
Matt Arsenault fcdddf9602 R600/SI: Use ZeroOrNegativeOneBooleanContent
This sort of doesn't matter since the setcc type is i1, but
this previously was using the default UndefinedBooleanContent. This
makes it more consistent with R600. This enables more optimizations
which typically give up on UndefinedBooleanContent. For example,
there is already a special case target DAG combine for
setcc + sext which can be eliminated in favor of what the generic
DAG combiner can do if it assumes boolean values are sign extended.
Since -1 is an inline immediate, using it is basically free and the
backend already uses it when a boolean value is needed in a wider type.

llvm-svn: 222850
2014-11-26 21:23:15 +00:00
Matt Arsenault 2a495975ed R600: Fix extloads of i1 on R600/Evergreen
llvm-svn: 222631
2014-11-23 02:57:54 +00:00
Tom Stellard bf69d76106 R600: Factor i64 UDIVREM lowering into its own fuction
This is so it could potentially be used by SI.  However, the current
implementation does not always produce correct results, so the
IntegerDivisionPass is being used instead.

llvm-svn: 222072
2014-11-15 01:07:53 +00:00
Jan Vesely e5121f3c10 Reapply "R600: Add new intrinsic to read work dimensions"
This effectively reverts revert 219707. After fixing the test to work with
new function name format and renamed intrinsic.

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219710
2014-10-14 20:05:26 +00:00
Rafael Espindola db3f0a24ec Revert "R600: Add new intrinsic to read work dimensions"
This reverts commit r219705.

CodeGen/R600/work-item-intrinsics.ll was failing on linux.

llvm-svn: 219707
2014-10-14 18:58:04 +00:00
Jan Vesely 86187d231a R600: Add new intrinsic to read work dimensions
v2: Add SI lowering
    Add test

v3: Place work dimensions after the kernel arguments.
v4: Calculate offset while lowering arguments
v5: rebase
v6: change prefix to AMDGPU

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219705
2014-10-14 18:52:07 +00:00
Aaron Watry 1885e53a75 R600: Add cmpxchg instruction for evergreen
Refactored the R600_LDS_1A2D class a bit to get it to actually work.

It seemed to be previously unused and broken.

We also have to disable the conversion to the noret variant for now in
R600ISelLowering because the getLDSNoRetOp method only handles 1A1D LDS ops.

Someone can feel free to modify the AMDGPU::getLDSNoRetOp method to
work for more than 1A1D variants of LDS operations. It's being left as a
future TODO for now.

Signed-off-by: Aaron Watry <awatry at gmail.com>
Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com>
llvm-svn: 217596
2014-09-11 15:02:54 +00:00
Matt Arsenault 74ef277774 R600: Correctly set the src value offset for scalarized kernel args
This for some reason fixes v1i64 kernel arguments on pre-SI. This
currently breaks some other cases in the kernel-args.ll test for R600,
but I'm not particularly confident in the new output. VTX_READ_* are not
used for some of the scalarized cases, and the code reading from the
constant buffer doesn't make much sense to me.

llvm-svn: 215564
2014-08-13 18:14:11 +00:00
Eric Christopher b5217507c7 Remove the target machine from CCState. Previously it was only used
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where we getting
a subtarget will require passing in a MachineFunction/Function as
well.

llvm-svn: 214988
2014-08-06 18:45:26 +00:00
Eric Christopher fc6de428c8 Have MachineFunction cache a pointer to the subtarget to make lookups
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.

Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.

llvm-svn: 214838
2014-08-05 02:39:49 +00:00
Eric Christopher d913448b38 Remove the TargetMachine forwards for TargetSubtargetInfo based
information and update all callers. No functional change.

llvm-svn: 214781
2014-08-04 21:25:23 +00:00
Tom Stellard 4973a13680 Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp"
This reverts commit r214566.

I did not mean to commit this yet.

llvm-svn: 214572
2014-08-01 21:55:50 +00:00
Tom Stellard c16f73d7c5 R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp
SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.

llvm-svn: 214566
2014-08-01 21:50:47 +00:00
Louis Gerbarg 67474e3755 Make sure no loads resulting from load->switch DAGCombine are marked invariant
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reoarder normal loads.

This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.

llvm-svn: 214449
2014-07-31 21:45:05 +00:00
Matt Arsenault 83e60581c3 R600: Add new functions for splitting vector loads and stores.
These will be used in future patches and shouldn't change anything yet.

llvm-svn: 213877
2014-07-24 17:10:35 +00:00
Tom Stellard 067c81567b R600/SI: Store constant initializer data in constant memory
This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.

This saves use the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.

llvm-svn: 213530
2014-07-21 14:01:14 +00:00
Matt Arsenault 762af96f46 R600: Make ShaderType private
llvm-svn: 212896
2014-07-13 03:06:39 +00:00
Jan Vesely 2cb62ce2a0 R600: Implement float to long/ulong
Use alg. from LegalizeDAG.cpp
Move Expand setting to SIISellowering

v2: Extend existing tests instead of creating new ones
v3: use separate LowerFPTOSINT function
v4: use TargetLowering::expandFP_TO_SINT
    add comment about using FP_TO_SINT for uints

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 212773
2014-07-10 22:40:21 +00:00
Matt Arsenault d2c9e08b63 R600: Fix mishandling of load / store chains.
Fixes various bugs with reordering loads and stores.
Scalarized vector loads weren't collecting the chains
at all.

llvm-svn: 212473
2014-07-07 18:34:45 +00:00
Craig Topper 66e588be09 Add ops() method to SDNode that returns an ArrayRef<SDUse>. Use it to simplify some code.
llvm-svn: 211993
2014-06-29 00:40:57 +00:00
Matt Arsenault 961ca43180 R600: Move load/store ReplaceNodeResults to common code.
Future patches will want to custom lower loads on SI.

llvm-svn: 211848
2014-06-27 02:33:47 +00:00
Matt Arsenault 257d48d22c R600: Fix inconsistency in rsq instructions.
R600 was using a clamped version of rsq, but SI was not. Add a
new rsq_clamped intrinsic and use them consistently.

It's unclear to me from the documentation what behavior
the R600 instructions have, so I assume they have the legacy behavior
described by the SI documents. For R600, use RECIPSQRT_IEEE
for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also
has RECIPSQRT_FF, which I'm not sure how it fits in here.

llvm-svn: 211637
2014-06-24 22:13:39 +00:00
Matt Arsenault 1d555c4e91 R600: Remove AMDILISelLowering
llvm-svn: 211519
2014-06-23 18:00:55 +00:00
Matt Arsenault c4d3d3a16e R600: Move add/sub with overflow out of AMDILISelLowering
Add more tests for these.

llvm-svn: 211517
2014-06-23 18:00:49 +00:00
Matt Arsenault b8b5153935 R600/SI: Handle i64 sub.
We can handle it the same way as add

llvm-svn: 211514
2014-06-23 18:00:38 +00:00
Matt Arsenault c791f39912 R600: Rename AMDIL file
llvm-svn: 211512
2014-06-23 18:00:31 +00:00
Jan Vesely 343cd6f056 R600: Use LowerSDIVREM for i64 node replace
v2: move div/rem node replacement to R600ISelLowering
    make lowerSDIVREM protected

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211478
2014-06-22 21:43:01 +00:00
Jan Vesely ecf5133a2b R600: Implement 64bit SRA
v2: Use capitalized variable name

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211159
2014-06-18 12:27:17 +00:00
Jan Vesely 900ff2e74b R600: Implement 64bit SRL
v2: use C++ style comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211158
2014-06-18 12:27:15 +00:00
Jan Vesely 25f362766e R600: Implement 64bit SHL
v2: Use c++ style comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 211157
2014-06-18 12:27:13 +00:00
Tom Stellard 880a80ad07 R600: Use LDS and vectors for private memory
llvm-svn: 211110
2014-06-17 16:53:14 +00:00
Tom Stellard 2e59a45f80 R600: Move AMDGPUInstrInfo from AMDGPUTargetMachine into AMDGPUSubtarget
llvm-svn: 210869
2014-06-13 01:32:00 +00:00
Matt Arsenault 5565f65e13 R600: Add dag combine for BFE
llvm-svn: 209461
2014-05-22 18:09:07 +00:00
Matt Arsenault 37c12d7343 Use cast<> for unchecked use
llvm-svn: 208627
2014-05-12 20:42:57 +00:00
Matt Arsenault b3ee388594 Use cast<> for unchecked use
llvm-svn: 208618
2014-05-12 19:26:38 +00:00
Matt Arsenault 4d64f96530 Use range for
llvm-svn: 208617
2014-05-12 19:23:21 +00:00
Tom Stellard afa8b532b1 R600: Move MIN/MAX matching from LowerOperation() to PerformDAGCombine()
llvm-svn: 208429
2014-05-09 16:42:16 +00:00
Craig Topper 2d2aa0ca1f Use makeArrayRef insted of calling ArrayRef<T> constructor directly. I introduced most of these recently.
llvm-svn: 207616
2014-04-30 07:17:30 +00:00
Tom Stellard 93f9f4950c R600: Remove duplicate setting of SELECT expansion.
It's already set in AMDGPUISelLowering for all GPUs

Patch By: Jan Vesely

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207592
2014-04-29 23:12:55 +00:00
Tom Stellard 5f3378879f R600: Change UDIV/UREM to UDIVREM when legalizing types
When legalizing ops, with UDIV/UREM set to expand, they automatically
expand to UDIVREM (if legal or custom).
We need to do this manually for legalize types.

v2:
  SI should be set to Expand because the type is legal, and it is
    automatically lowered to UDIVREM if UDIVREM is Legal/Custom
  R600 should set to UDIV/UREM to Custom because it needs to lower them
    during type legalization

Patch by: Jan Vesely

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207587
2014-04-29 23:12:43 +00:00
Craig Topper 64941d9786 Convert SelectionDAG::getMergeValues to use ArrayRef.
llvm-svn: 207374
2014-04-27 19:20:57 +00:00
Craig Topper 206fcd450a Convert getMemIntrinsicNode to take ArrayRef of SDValue instead of pointer and size.
llvm-svn: 207329
2014-04-26 19:29:41 +00:00
Craig Topper 48d114bed1 Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.
llvm-svn: 207327
2014-04-26 18:35:24 +00:00
Matt Arsenault 209a7b92b5 R600: Minor cleanups.
Fix indentation, better line wrapping, unused includes.

llvm-svn: 206562
2014-04-18 07:40:20 +00:00
Matt Arsenault 4e46665a80 R600: Expand sign extension of vectors.
Setting vector types to expand will result in scalarization on pre SI hw,
as those gpus don't have vector shifts either.
Expand also i32 vectors, this helps llvm make the correct decision
about scalarizing the vector ops.

v2: move setOperation() calls to R600ISelLowering.cpp.
    cleanup the SI code to make it obvious that this patch does is nop for SI

Patch by: Jan Vesely <jan.vesely@rutgers.edu>

llvm-svn: 206348
2014-04-16 01:41:30 +00:00
Nick Lewycky aad475b324 Break PseudoSourceValue out of the Value hierarchy. It is now the root of its own tree containing FixedStackPseudoSourceValue (which you can use isa/dyn_cast on) and MipsCallEntry (which you can't). Anything that needs to use either a PseudoSourceValue* and Value* is strongly encouraged to use a MachinePointerInfo instead.
llvm-svn: 206255
2014-04-15 07:22:52 +00:00
Matt Arsenault e1f030ca66 R600: Check if a sextload should be used for parameter loads.
Through some oddity where truncate (sextload x) isn't folded into
an anyextload for vectors, the sextload remains if the
vector isn't immediately scalarized. This keeps the expected
zextload instructions in the kernel-args test when small type
vectors aren't scalarized.

llvm-svn: 206070
2014-04-11 20:59:54 +00:00
Tom Stellard 50122a5890 R600: Match 24-bit arithmetic patterns in a Target DAGCombine
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.

This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched.  This occasionally
resulted in some instructions being incorrectly deleted from the
program.

v2:
  - Fix bug with 64-bit mul

llvm-svn: 205731
2014-04-07 19:45:41 +00:00