//===- PPCCallingConv.td - Calling Conventions for PowerPC -*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This describes the calling conventions for the PowerPC 32- and 64-bit
// architectures.
//
//===----------------------------------------------------------------------===//

/// CCIfSubtarget - Match if the current subtarget has a feature F.
class CCIfSubtarget<string F, CCAction A>
 : CCIf<!strconcat("static_cast<const PPCSubtarget&>"
                   "(State.getMachineFunction().getSubtarget()).",
                   F),
        A>;

class CCIfNotSubtarget<string F, CCAction A>
 : CCIf<!strconcat("!static_cast<const PPCSubtarget&>"
                   "(State.getMachineFunction().getSubtarget()).",
                   F),
        A>;
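
// As an illustration (derived from the string pasted together above, not a
// verbatim expansion): CCIfSubtarget<"isPPC64()", A> tests the generated C++
// predicate
//   static_cast<const PPCSubtarget&>(
//       State.getMachineFunction().getSubtarget()).isPPC64()
// and applies the action A only when the predicate is true.
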
//===----------------------------------------------------------------------===//
// Return Value Calling Convention
//===----------------------------------------------------------------------===//

// PPC64 AnyReg return-value convention. No explicit register is specified for
// the return value. The register allocator is allowed and expected to choose
// any free register.
//
// This calling convention is currently only supported by the stackmap and
// patchpoint intrinsics. All other uses will result in an assert on Debug
// builds. On Release builds we fall back to the PPC C calling convention.
def RetCC_PPC64_AnyReg : CallingConv<[
  CCCustom<"CC_PPC_AnyReg_Error">
]>;
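
// CCCustom names a C++ handler with LLVM's CCCustomFn signature. A rough
// sketch of the handler referenced above (illustrative only; the actual
// definition lives in the backend's C++ source and may differ in detail):
//
//   static bool CC_PPC_AnyReg_Error(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
//                                   CCValAssign::LocInfo &LocInfo,
//                                   ISD::ArgFlagsTy &ArgFlags, CCState &State) {
//     llvm_unreachable("The AnyReg calling convention is only supported by "
//                      "the stackmap and patchpoint intrinsics.");
//     return false; // Release builds fall through to the PPC C convention.
//   }
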
// Return-value convention for PowerPC
def RetCC_PPC : CallingConv<[
  CCIfCC<"CallingConv::AnyReg", CCDelegateTo<RetCC_PPC64_AnyReg>>,

  // On PPC64, integer return values are always promoted to i64
  CCIfType<[i32, i1], CCIfSubtarget<"isPPC64()", CCPromoteToType<i64>>>,
  CCIfType<[i1],  CCIfNotSubtarget<"isPPC64()", CCPromoteToType<i32>>>,
  CCIfType<[i32], CCAssignToReg<[R3, R4, R5, R6, R7, R8, R9, R10]>>,
  CCIfType<[i64], CCAssignToReg<[X3, X4, X5, X6]>>,
  CCIfType<[i128], CCAssignToReg<[X3, X4, X5, X6]>>,

  // Floating point types returned as "direct" go into F1 .. F8; note that
  // only the ELFv2 ABI fully utilizes all these registers.
  CCIfType<[f32], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>,
  CCIfType<[f64], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>,

  // QPX vectors are returned in QF1 and QF2.
  CCIfType<[v4f64, v4f32, v4i1],
           CCIfSubtarget<"hasQPX()", CCAssignToReg<[QF1, QF2]>>>,

  // Vector types returned as "direct" go into V2 .. V9; note that only the
  // ELFv2 ABI fully utilizes all these registers.
  CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32],
           CCIfSubtarget<"hasAltivec()",
                         CCAssignToReg<[V2, V3, V4, V5, V6, V7, V8, V9]>>>,

  CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()",
           CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9]>>>
]>;
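
// Worked example (derived from the rules above): on PPC64 an i1 return value
// is promoted to i64 and returned in X3, and an f64 is returned in F1. Under
// the ELFv2 ABI, which fully utilizes the register lists above, a homogeneous
// aggregate of up to eight doubles can be returned directly in F1 .. F8.
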
// No explicit register is specified for the AnyReg calling convention. The
// register allocator may assign the arguments to any free register.
//
// This calling convention is currently only supported by the stackmap and
// patchpoint intrinsics. All other uses will result in an assert on Debug
// builds. On Release builds we fall back to the PPC C calling convention.
def CC_PPC64_AnyReg : CallingConv<[
  CCCustom<"CC_PPC_AnyReg_Error">
]>;

// Note that we don't currently have calling conventions for 64-bit
// PowerPC, but handle all the complexities of the ABI in the lowering
// logic. FIXME: See if the logic can be simplified with use of CCs.
// This may require some extensions to current table generation.

// Simple calling convention for 64-bit ELF PowerPC fast isel.
// Only handle ints and floats. All ints are promoted to i64.
// Vector types and quadword ints are not handled.
def CC_PPC64_ELF_FIS : CallingConv<[
  CCIfCC<"CallingConv::AnyReg", CCDelegateTo<CC_PPC64_AnyReg>>,

  CCIfType<[i1],  CCPromoteToType<i64>>,
  CCIfType<[i8],  CCPromoteToType<i64>>,
  CCIfType<[i16], CCPromoteToType<i64>>,
  CCIfType<[i32], CCPromoteToType<i64>>,
  CCIfType<[i64], CCAssignToReg<[X3, X4, X5, X6, X7, X8, X9, X10]>>,
  CCIfType<[f32, f64], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>
]>;
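
// Worked example (how the table above applies; any additional ABI bookkeeping
// is handled in the C++ fast-isel code): for a call f(i8, i32), both
// arguments are promoted to i64 and assigned to X3 and X4, while an f64
// argument is assigned the next free FPR, starting at F1.
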
// Simple return-value convention for 64-bit ELF PowerPC fast isel.
// All small ints are promoted to i64. Vector types, quadword ints,
// and multiple register returns are "supported" to avoid compile
// errors, but none are handled by the fast selector.
def RetCC_PPC64_ELF_FIS : CallingConv<[
  CCIfCC<"CallingConv::AnyReg", CCDelegateTo<RetCC_PPC64_AnyReg>>,

  CCIfType<[i1],  CCPromoteToType<i64>>,
  CCIfType<[i8],  CCPromoteToType<i64>>,
  CCIfType<[i16], CCPromoteToType<i64>>,
  CCIfType<[i32], CCPromoteToType<i64>>,
  CCIfType<[i64], CCAssignToReg<[X3, X4]>>,
  CCIfType<[i128], CCAssignToReg<[X3, X4, X5, X6]>>,

  CCIfType<[f32], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>,
  CCIfType<[f64], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>,

  CCIfType<[v4f64, v4f32, v4i1],
           CCIfSubtarget<"hasQPX()", CCAssignToReg<[QF1, QF2]>>>,

  CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32],
           CCIfSubtarget<"hasAltivec()",
                         CCAssignToReg<[V2, V3, V4, V5, V6, V7, V8, V9]>>>,

  CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()",
           CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9]>>>
]>;
//===----------------------------------------------------------------------===//
// PowerPC System V Release 4 32-bit ABI
//===----------------------------------------------------------------------===//
def CC_PPC32_SVR4_Common : CallingConv<[
  CCIfType<[i1], CCPromoteToType<i32>>,

  // The ABI requires i64 to be passed in two adjacent registers with the first
  // register having an odd register number.
  CCIfType<[i32], CCIfSplit<CCCustom<"CC_PPC32_SVR4_Custom_AlignArgRegs">>>,

  // The first 8 integer arguments are passed in integer registers.
  CCIfType<[i32], CCAssignToReg<[R3, R4, R5, R6, R7, R8, R9, R10]>>,

  // Make sure the i64 words from a long double are either both passed in
  // registers or both passed on the stack.
  CCIfType<[f64], CCIfSplit<CCCustom<"CC_PPC32_SVR4_Custom_AlignFPArgRegs">>>,

  // FP values are passed in F1 - F8.
  CCIfType<[f32, f64], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>,

  // Split arguments have an alignment of 8 bytes on the stack.
  CCIfType<[i32], CCIfSplit<CCAssignToStack<4, 8>>>,

  CCIfType<[i32], CCAssignToStack<4, 4>>,

  // Floats are stored in double precision format, thus they have the same
  // alignment and size as doubles.
  CCIfType<[f32, f64], CCAssignToStack<8, 8>>,

  // QPX vectors that are stored in double precision need 32-byte alignment.
  CCIfType<[v4f64, v4i1], CCAssignToStack<32, 32>>,

  // Vectors get 16-byte stack slots that are 16-byte aligned.
  CCIfType<[v16i8, v8i16, v4i32, v4f32, v2f64, v2i64], CCAssignToStack<16, 16>>
]>;
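
// Worked example of the split rules above: for f(int a, long long b), a is
// assigned R3; b is split into two i32 words, and the custom aligner skips R4
// so that the pair lands in R5/R6, placing the first word in an odd-numbered
// register as the ABI requires.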

// This calling convention always puts vector arguments on the stack. It is
// used to assign the vector arguments that belong to the variable portion of
// the parameter list of a variadic function.
def CC_PPC32_SVR4_VarArg : CallingConv<[
  CCDelegateTo<CC_PPC32_SVR4_Common>
]>;

// In contrast to CC_PPC32_SVR4_VarArg, this calling convention first tries to
// put vector arguments in vector registers before putting them on the stack.
def CC_PPC32_SVR4 : CallingConv<[
  // QPX vectors mirror the scalar FP convention.
  CCIfType<[v4f64, v4f32, v4i1], CCIfSubtarget<"hasQPX()",
           CCAssignToReg<[QF1, QF2, QF3, QF4, QF5, QF6, QF7, QF8]>>>,

  // The first 12 vector arguments are passed in AltiVec registers.
  CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32],
           CCIfSubtarget<"hasAltivec()",
                         CCAssignToReg<[V2, V3, V4, V5, V6, V7,
                                        V8, V9, V10, V11, V12, V13]>>>,

  CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()",
           CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9,
                          VSH10, VSH11, VSH12, VSH13]>>>,

  CCDelegateTo<CC_PPC32_SVR4_Common>
]>;
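
// For example: with AltiVec enabled, a fixed v4f32 argument is assigned one
// of V2 .. V13 by CC_PPC32_SVR4, while the same value passed in the variadic
// portion of a call goes through CC_PPC32_SVR4_VarArg and ends up in a
// 16-byte aligned stack slot assigned by CC_PPC32_SVR4_Common.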

// Helper "calling convention" to handle aggregate-by-value arguments.
// Aggregate-by-value arguments are always placed in the local variable space
// of the caller. This calling convention is only used to assign those stack
// offsets in the caller's stack frame.
//
// Still, the address of the aggregate copy in the caller's stack frame is
// passed in a GPR (or in the parameter list area if all GPRs are allocated)
// from the caller to the callee. The location of the address argument is
// assigned by the CC_PPC32_SVR4 calling convention.
//
// The only purpose of CC_PPC32_SVR4_Custom_Dummy is to skip arguments which
// are not passed by value.
def CC_PPC32_SVR4_ByVal : CallingConv<[
  CCIfByVal<CCPassByVal<4, 4>>,

  CCCustom<"CC_PPC32_SVR4_Custom_Dummy">
]>;
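
// For example (assuming a hypothetical 12-byte struct passed by value): the
// copy occupies 12 bytes at 4-byte alignment in the caller's local variable
// space (CCPassByVal<4, 4> gives the minimum slot size and alignment), and
// its address is passed to the callee as an ordinary pointer argument
// assigned by CC_PPC32_SVR4.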

def CSR_Altivec : CalleeSavedRegs<(add V20, V21, V22, V23, V24, V25, V26, V27,
                                       V28, V29, V30, V31)>;

def CSR_Darwin32 : CalleeSavedRegs<(add R13, R14, R15, R16, R17, R18, R19, R20,
                                        R21, R22, R23, R24, R25, R26, R27, R28,
                                        R29, R30, R31, F14, F15, F16, F17, F18,
                                        F19, F20, F21, F22, F23, F24, F25, F26,
                                        F27, F28, F29, F30, F31, CR2, CR3, CR4
                                   )>;

def CSR_Darwin32_Altivec : CalleeSavedRegs<(add CSR_Darwin32, CSR_Altivec)>;

def CSR_SVR432 : CalleeSavedRegs<(add R14, R15, R16, R17, R18, R19, R20,
                                      R21, R22, R23, R24, R25, R26, R27, R28,
                                      R29, R30, R31, F14, F15, F16, F17, F18,
                                      F19, F20, F21, F22, F23, F24, F25, F26,
                                      F27, F28, F29, F30, F31, CR2, CR3, CR4
                                 )>;

def CSR_SVR432_Altivec : CalleeSavedRegs<(add CSR_SVR432, CSR_Altivec)>;

def CSR_Darwin64 : CalleeSavedRegs<(add X13, X14, X15, X16, X17, X18, X19, X20,
                                        X21, X22, X23, X24, X25, X26, X27, X28,
                                        X29, X30, X31, F14, F15, F16, F17, F18,
                                        F19, F20, F21, F22, F23, F24, F25, F26,
                                        F27, F28, F29, F30, F31, CR2, CR3, CR4
                                   )>;

def CSR_Darwin64_Altivec : CalleeSavedRegs<(add CSR_Darwin64, CSR_Altivec)>;

def CSR_SVR464 : CalleeSavedRegs<(add X14, X15, X16, X17, X18, X19, X20,
                                      X21, X22, X23, X24, X25, X26, X27, X28,
                                      X29, X30, X31, F14, F15, F16, F17, F18,
                                      F19, F20, F21, F22, F23, F24, F25, F26,
                                      F27, F28, F29, F30, F31, CR2, CR3, CR4
                                 )>;

def CSR_SVR464_Altivec : CalleeSavedRegs<(add CSR_SVR464, CSR_Altivec)>;

def CSR_SVR464_R2 : CalleeSavedRegs<(add CSR_SVR464, X2)>;

def CSR_SVR464_R2_Altivec : CalleeSavedRegs<(add CSR_SVR464_Altivec, X2)>;

def CSR_NoRegs : CalleeSavedRegs<(add)>;

def CSR_64_AllRegs: CalleeSavedRegs<(add X0, (sequence "X%u", 3, 10),
                                     (sequence "X%u", 14, 31),
                                     (sequence "F%u", 0, 31),
                                     (sequence "CR%u", 0, 7))>;

def CSR_64_AllRegs_Altivec : CalleeSavedRegs<(add CSR_64_AllRegs,
                                              (sequence "V%u", 0, 31))>;

def CSR_64_AllRegs_VSX : CalleeSavedRegs<(add CSR_64_AllRegs_Altivec,
                                          (sequence "VSL%u", 0, 31),
                                          (sequence "VSH%u", 0, 31))>;
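
// Note: for each CalleeSavedRegs def above, TableGen emits a <Name>_SaveList
// array and a <Name>_RegMask; the backend's register-info code selects among
// them based on the ABI (SVR4 vs. Darwin, 32- vs. 64-bit) and on enabled
// features (e.g. the _Altivec variants when AltiVec is available).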