llvm-project/llvm/lib/Target/AMDGPU
Tom Stellard a76bcc2ea1 AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together.  The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.

This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.

shader-db stats:

2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18451

llvm-svn: 264589
2016-03-28 16:10:13 +00:00
..
AsmParser [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3 2016-03-18 15:35:51 +00:00
Disassembler [AMDGPU] Fix SMEM instructions encoding/operand namings 2016-03-10 13:06:08 +00:00
InstPrinter [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3 2016-03-18 15:35:51 +00:00
MCTargetDesc AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
TargetInfo Remove autoconf support 2016-01-26 21:29:08 +00:00
Utils fix sanitizer-ppc64be-linux failure for r262804 2016-03-06 15:13:54 +00:00
AMDGPU.h AMDGPU: Add SIWholeQuadMode pass 2016-03-21 20:28:33 +00:00
AMDGPU.td AMDGPU: More bits of frame index are known to be zero 2016-02-27 20:26:57 +00:00
AMDGPUAlwaysInlinePass.cpp AMDGPU: Minor cleanups to always inline pass 2015-07-13 19:08:36 +00:00
AMDGPUAnnotateKernelFeatures.cpp AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr 2016-01-30 05:10:59 +00:00
AMDGPUAnnotateUniformValues.cpp AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions 2016-02-12 23:45:29 +00:00
AMDGPUAsmPrinter.cpp AMDGPU: Don't use estimated stack size when we know the real stack size 2016-03-01 04:58:20 +00:00
AMDGPUAsmPrinter.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
AMDGPUCallingConv.td AMDGPU/SI: Add support for non-void functions 2016-01-13 17:23:04 +00:00
AMDGPUFrameLowering.cpp AMDGPU: Fix old comments that mention AMDIL 2016-01-20 21:22:21 +00:00
AMDGPUFrameLowering.h AMDGPU: Create emergency stack slots during frame lowering 2015-11-06 18:17:45 +00:00
AMDGPUISelDAGToDAG.cpp Fix sequence point warning. NFC. 2016-03-24 10:53:28 +00:00
AMDGPUISelLowering.cpp AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
AMDGPUISelLowering.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
AMDGPUInstrInfo.cpp AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp 2016-01-28 16:04:37 +00:00
AMDGPUInstrInfo.h AMDGPU: Add SIWholeQuadMode pass 2016-03-21 20:28:33 +00:00
AMDGPUInstrInfo.td AMDGPU: Rename intrinsic to better match instruction name 2016-02-13 01:03:00 +00:00
AMDGPUInstructions.td AMDGPU: Match more med3 integer patterns 2016-03-07 21:54:48 +00:00
AMDGPUIntrinsicInfo.cpp [llvm-tblgen] Avoid StringMatcher for GCC and MS builtin names 2016-01-27 01:43:12 +00:00
AMDGPUIntrinsicInfo.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
AMDGPUIntrinsics.td AMDGPU: Remove bfi and bfm intrinsics 2016-02-08 19:06:01 +00:00
AMDGPUMCInstLower.cpp AMDGPU: Verify instructions in non-debug builds as well 2016-03-16 09:10:42 +00:00
AMDGPUMCInstLower.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
AMDGPUMachineFunction.cpp AMDGPU/SI: Add getShaderType() function to Utils/ 2015-12-15 16:26:16 +00:00
AMDGPUMachineFunction.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
AMDGPUOpenCLImageTypeLoweringPass.cpp AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass 2015-10-01 21:16:05 +00:00
AMDGPUPromoteAlloca.cpp AMDGPU: Promote alloca should skip volatiles 2016-03-23 23:17:29 +00:00
AMDGPURegisterInfo.cpp
AMDGPURegisterInfo.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
AMDGPURegisterInfo.td AMDGPU: Set SubRegIndex size and offset 2015-07-30 17:03:11 +00:00
AMDGPUSubtarget.cpp AMDGPU: More bits of frame index are known to be zero 2016-02-27 20:26:57 +00:00
AMDGPUSubtarget.h AMDGPU: More bits of frame index are known to be zero 2016-02-27 20:26:57 +00:00
AMDGPUTargetMachine.cpp AMDGPU: Add SIWholeQuadMode pass 2016-03-21 20:28:33 +00:00
AMDGPUTargetMachine.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
AMDGPUTargetObjectFile.cpp AMDGPU/SI: Emit constant variables in the .hsatext section when targeting HSA 2015-12-15 22:39:36 +00:00
AMDGPUTargetObjectFile.h AMDGPU/SI: Emit constant arrays in the .text section 2015-12-10 02:13:01 +00:00
AMDGPUTargetTransformInfo.cpp AMDGPU: Cost model for basic integer operations 2016-03-25 01:16:40 +00:00
AMDGPUTargetTransformInfo.h AMDGPU: Partially implement getArithmeticInstrCost for FP ops 2016-03-25 01:00:32 +00:00
AMDILCFGStructurizer.cpp Bug 20810: Use report_fatal_error instead of unreachable 2016-03-02 03:33:55 +00:00
AMDKernelCodeT.h [AMDGPU] fix amd_kernel_code_t bit field position as per spec (added missing reserved fields) 2016-02-24 10:54:25 +00:00
CIInstructions.td [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target 2016-02-18 03:42:32 +00:00
CMakeLists.txt AMDGPU: Add SIWholeQuadMode pass 2016-03-21 20:28:33 +00:00
CaymanInstructions.td AMDGPU: Remove 24-bit intrinsics 2016-01-29 10:05:16 +00:00
EvergreenInstructions.td AMDGPU: Remove 24-bit intrinsics 2016-01-29 10:05:16 +00:00
LLVMBuild.txt [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target 2016-02-18 03:42:32 +00:00
Processors.td AMDGPU/SI: Add Polaris support 2016-03-24 15:31:05 +00:00
R600ClauseMergePass.cpp
R600ControlFlowFinalizer.cpp AMDGPU: Fix a use-after free and a missing break 2016-03-25 18:33:16 +00:00
R600Defines.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
R600EmitClauseMarkers.cpp
R600ExpandSpecialInstrs.cpp
R600ISelLowering.cpp [DAG] use isUndef() ; NFCI 2016-03-14 17:28:46 +00:00
R600ISelLowering.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
R600InstrFormats.td
R600InstrInfo.cpp AMDGPU: Simplify boolean conditional return statements 2016-03-02 23:00:21 +00:00
R600InstrInfo.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
R600Instructions.td AMDGPU: Rename intrinsic to better match instruction name 2016-02-13 01:03:00 +00:00
R600Intrinsics.td AMDGPU: Move AMDGPU intrinsics only used by R600 2016-01-26 04:49:24 +00:00
R600MachineFunctionInfo.cpp
R600MachineFunctionInfo.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
R600MachineScheduler.cpp
R600MachineScheduler.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
R600OptimizeVectorRegisters.cpp AMDGPU: Remove implicit ilist iterator conversions, NFC 2015-10-13 20:07:10 +00:00
R600Packetizer.cpp AMDGPU: Simplify boolean conditional return statements 2016-03-02 23:00:21 +00:00
R600RegisterInfo.cpp
R600RegisterInfo.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
R600RegisterInfo.td
R600Schedule.td
R600TextureIntrinsicsReplacer.cpp AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix 2016-01-22 19:00:09 +00:00
R700Instructions.td
SIAnnotateControlFlow.cpp AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions 2016-02-12 23:45:29 +00:00
SIDefines.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
SIFixControlFlowLiveIntervals.cpp AMDGPU: Remove unused includes 2015-09-25 00:28:43 +00:00
SIFixSGPRCopies.cpp AMDGPU/SI: Fold operands with sub-registers 2016-01-07 17:10:29 +00:00
SIFixSGPRLiveRanges.cpp AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions 2016-02-12 23:45:29 +00:00
SIFoldOperands.cpp AMDGPU: Fix passes depending on dominator tree for no reason 2016-02-11 06:15:34 +00:00
SIFrameLowering.cpp AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRs 2016-03-03 03:45:09 +00:00
SIFrameLowering.h AMDGPU: Remove SIPrepareScratchRegs 2015-11-30 21:15:53 +00:00
SIISelLowering.cpp AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS 2016-03-15 17:28:44 +00:00
SIISelLowering.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
SIInsertNopsPass.cpp AMDGPU: Insert two S_NOP instructions for every high level source statement. 2016-03-03 03:53:29 +00:00
SIInsertWaits.cpp AMDGPU/SI: Handle wait states required for DPP instructions 2016-03-14 17:05:56 +00:00
SIInstrFormats.td [AMDGPU] Fix VOPC instruction operand namings 2016-03-11 14:53:28 +00:00
SIInstrInfo.cpp AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions 2016-03-28 16:10:13 +00:00
SIInstrInfo.h AMDGPU: Add SIWholeQuadMode pass 2016-03-21 20:28:33 +00:00
SIInstrInfo.td [AMDGPU] Fix missing assembler predicates. 2016-03-23 04:27:26 +00:00
SIInstructions.td AMDGPU: Remove atomic inc/dec patterns 2016-03-23 23:23:38 +00:00
SIIntrinsics.td AMDGPU: Remove old sample intrinsics 2016-01-26 04:38:08 +00:00
SILoadStoreOptimizer.cpp CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC 2016-02-27 06:40:41 +00:00
SILowerControlFlow.cpp AMDGPU: Add SIWholeQuadMode pass 2016-03-21 20:28:33 +00:00
SILowerI1Copies.cpp AMDGPU: Fix passes depending on dominator tree for no reason 2016-02-11 06:15:34 +00:00
SIMachineFunctionInfo.cpp AMDGPU/SI: Add support for spiling SGPRs to scratch buffer 2016-03-04 18:31:18 +00:00
SIMachineFunctionInfo.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
SIMachineScheduler.cpp [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC. 2016-03-09 16:00:35 +00:00
SIMachineScheduler.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
SIRegisterInfo.cpp AMDGPU: Cache information about register pressure sets 2016-03-23 01:53:22 +00:00
SIRegisterInfo.h AMDGPU: Cache information about register pressure sets 2016-03-23 01:53:22 +00:00
SIRegisterInfo.td AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions 2016-02-12 23:45:29 +00:00
SISchedule.td TableGen: Check scheduling models for completeness 2016-03-01 20:03:21 +00:00
SIShrinkInstructions.cpp AMDGPU: Materialize sign bits with bfrev 2016-03-11 07:42:49 +00:00
SITypeRewriter.cpp AMDGPU/SI: Fix crash when inline assembly is used in a graphics shader 2016-01-06 22:01:04 +00:00
SIWholeQuadMode.cpp AMDGPU: Fix dangling references introduced by r263982 2016-03-21 22:54:02 +00:00
VIInstrFormats.td [AMDGPU] Fix SMEM instructions encoding/operand namings 2016-03-10 13:06:08 +00:00
VIInstructions.td [AMDGPU] Assembler: change v_madmk operands to have same order as mad. 2016-03-11 09:27:25 +00:00