Most of the test changes are trivial instruction reorderings and differing
register allocations, without any obvious performance impact.
Differential Revision: https://reviews.llvm.org/D66973
llvm-svn: 372106
This follows similar logic in the ARM and Mips backends, and allows the free
use of s0 in functions without a dedicated frame pointer. The changes in
callee-saved-gprs.ll most clearly show the effect of this patch.
llvm-svn: 356063
This patch:
* Adds necessary RV64D codegen patterns
* Modifies CC_RISCV so it will properly handle f64 types (with soft float ABI)
Note that in general there is no reason to try to select fcvt.w[u].d rather than fcvt.l[u].d for i32 conversions because fptosi/fptoui produce poison if the input won't fit into the target type.
Differential Revision: https://reviews.llvm.org/D53237
llvm-svn: 352833
Summary:
Set CostPerUse higher for registers that are not used in the compressed
instruction set. This will influence the greedy register allocator to reduce
the use of registers that can't be encoded in 16 bit instructions. This
affects register allocation even when compressed instruction isn't targeted,
we see no major negative codegen impact.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang
Differential Revision: https://reviews.llvm.org/D47039
llvm-svn: 333132
Summary:
When lowering global address, lower the base as a TargetGlobal first then
create an SDNode for the offset separately and chain it to the address calculation
This optimization will create a DAG where the base address of a global access will
be reused between different access. The offset can later be folded into the immediate
part of the memory access instruction.
With this optimization we generate:
lui a0, %hi(s)
addi a0, a0, %lo(s) ; shared base address.
addi a1, zero, 20 ; 2 instructions per access.
sw a1, 44(a0)
addi a1, zero, 10
sw a1, 8(a0)
addi a1, zero, 30
sw a1, 80(a0)
Instead of:
lui a0, %hi(s+44) ; 3 instructions per access.
addi a1, zero, 20
sw a1, %lo(s+44)(a0)
lui a0, %hi(s+8)
addi a1, zero, 10
sw a1, %lo(s+8)(a0)
lui a0, %hi(s+80)
addi a1, zero, 30
sw a1, %lo(s+80)(a0)
Which will save one instruction per access.
Reviewers: asb, apazos
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, apazos, asb, llvm-commits
Differential Revision: https://reviews.llvm.org/D46989
llvm-svn: 332641
To do this:
1. Change GlobalAddress SDNode to TargetGlobalAddress to avoid legalizer
split the symbol.
2. Change ExternalSymbol SDNode to TargetExternalSymbol to avoid legalizer
split the symbol.
3. Let PseudoCALL match direct call with target operand TargetGlobalAddress
and TargetExternalSymbol.
Differential Revision: https://reviews.llvm.org/D44885
llvm-svn: 330827