The isTriviallyRematerializable hook is only called for instructions that are
tagged as isAsCheapAsAMove. Since `ADDI 0` is used for `mv`, it should
certainly be marked isAsCheapAsAMove. This change avoids one stack spill in
most of the atomic-rmw.ll test functions. It also avoids stack spills in two
of our out-of-tree CHERI tests.
ORI/XORI with a zero immediate may or may not be the same as a move
micro-architecturally, but since we already mark the register == x0 case, we
might as well do the same when the immediate is zero.
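A minimal sketch of the resulting hook override, assuming simplified opcode
coverage (the real override handles further cases):

```cpp
// Sketch of an isAsCheapAsAMove override: ADDI/ORI/XORI count as a move
// when either the source register is x0 or the immediate is zero.
bool RISCVInstrInfo::isAsCheapAsAMove(const MachineInstr &MI) const {
  switch (MI.getOpcode()) {
  default:
    break;
  case RISCV::ADDI:
  case RISCV::ORI:
  case RISCV::XORI:
    return (MI.getOperand(1).isReg() &&
            MI.getOperand(1).getReg() == RISCV::X0) ||
           (MI.getOperand(2).isImm() && MI.getOperand(2).getImm() == 0);
  }
  return MI.isAsCheapAsAMove();
}
```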
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D86480
The current tail duplication integrated into block layout is designed to
increase fallthrough from a BB's predecessor to its successor, but we have
observed cases where duplication doesn't increase fallthrough, or where it
brings too much size overhead.
To overcome these two issues, in the function canTailDuplicateUnplacedPreds I
add two checks (sketched below):
* make sure there is at least one duplication in the current work set.
* the number of duplications should not exceed the number of successors.
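A hedged sketch of the two guards, with hypothetical names standing in for
the actual work-set bookkeeping:

```cpp
// Hypothetical condensation of the two new guards in
// canTailDuplicateUnplacedPreds. DuplicationsInWorkSet counts the unplaced
// predecessors in the current work set that BB would be copied into.
bool worthTailDuplicating(const MachineBasicBlock *BB,
                          unsigned DuplicationsInWorkSet) {
  // (1) At least one predecessor in the work set must actually receive a
  //     copy, otherwise no new fallthrough is created.
  if (DuplicationsInWorkSet == 0)
    return false;
  // (2) More copies than BB has successors means the size overhead
  //     outweighs the fallthrough benefit.
  return DuplicationsInWorkSet <= BB->succ_size();
}
```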
The modification in hasBetterLayoutPredecessor fixes a bug: a potential
predecessor must be at the bottom of a chain.
Differential Revision: https://reviews.llvm.org/D64376
Summary:
The RISC-V backend used to generate `add <reg>, x0, <reg>` in a few
instances. It seems most places no longer generate this sequence.
This is semantically equivalent to `addi <reg>, <reg>, 0`, but the latter
has the advantage of being documented as the canonical instruction for
register-to-register moves (which microarchitectures can and should recognise
as such).
The changed testcases use instruction aliases - `mv <reg>, <reg>` is an
alias for `addi <reg>, <reg>, 0`.
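For reference, a hedged sketch of how the canonical move can be emitted from
backend C++ (modeled loosely on the usual copyPhysReg pattern; not the exact
change in this patch, which adjusts instruction selection):

```cpp
// Sketch: emit `addi dst, src, 0`, which prints as `mv dst, src`,
// rather than `add dst, x0, src`.
void emitCanonicalMove(const RISCVInstrInfo &TII, MachineBasicBlock &MBB,
                       MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
                       MCRegister DstReg, MCRegister SrcReg, bool KillSrc) {
  BuildMI(MBB, MBBI, DL, TII.get(RISCV::ADDI), DstReg)
      .addReg(SrcReg, getKillRegState(KillSrc))
      .addImm(0);
}
```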
Reviewers: luismarques
Reviewed By: luismarques
Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70124
Most of the test changes are trivial instruction reorderings and differing
register allocations, without any obvious performance impact.
Differential Revision: https://reviews.llvm.org/D66973
llvm-svn: 372106
Patch https://reviews.llvm.org/D43256 introduced a more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse.
To be conservative, this patch restores the original layout algorithm in plain mode. But users can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
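A hedged sketch of the resulting policy, assuming the
-force-precise-rotation-cost option maps to a boolean and using the standard
hasProfileData() query:

```cpp
// Sketch: the precise (aggressive) rotation cost model is used only when
// real profile data exists, unless the user explicitly forces it.
bool usePreciseRotationCost(const MachineFunction &MF,
                            bool ForcePreciseRotationCost) {
  return ForcePreciseRotationCost || MF.getFunction().hasProfileData();
}
```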
Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 369664
It caused assertions to fire when building Chromium:
lib/CodeGen/LiveDebugValues.cpp:331: bool
{anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion
`Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed.
See https://crbug.com/992871#c3 for how to reproduce.
> Patch https://reviews.llvm.org/D43256 introduced a more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse.
>
> To be conservative, this patch restores the original layout algorithm in plain mode. But users can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
>
> Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 368579
Patch https://reviews.llvm.org/D43256 introduced a more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information (generated by BranchProbabilityInfo.cpp) is used. If the user program doesn't behave as BranchProbabilityInfo.cpp expects, the layout may be worse.
To be conservative, this patch restores the original layout algorithm in plain mode. But users can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 368339
This patch optimizes the emission of a sequence of SELECTs with the same
condition, avoiding the insertion of unnecessary control flow. Such a sequence
often occurs when a SELECT of values wider than XLEN is legalized into two
SELECTs with legal types. We have identified several use cases where the
SELECTs could be interleaved with other instructions. Therefore, we extend the
sequence to include non-SELECT instructions if we are able to detect that the
non-SELECT instructions do not impact the optimization.
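A hedged sketch of the detection step, with hypothetical helpers and a
simplified operand layout; the real lowering then routes all collected
SELECTs through a single shared branch diamond:

```cpp
// Sketch: starting from one SELECT pseudo, walk forward and collect every
// later SELECT using the same condition, tolerating intervening non-SELECT
// instructions that provably don't interfere with the shared control flow.
SmallVector<MachineInstr *, 4> collectSelects(MachineInstr &First,
                                              MachineBasicBlock &MBB) {
  SmallVector<MachineInstr *, 4> Selects{&First};
  Register Cond = First.getOperand(1).getReg(); // hypothetical operand layout
  for (auto It = std::next(First.getIterator()); It != MBB.end(); ++It) {
    if (isSelectPseudo(*It) && It->getOperand(1).getReg() == Cond)
      Selects.push_back(&*It);
    else if (mayInterfere(*It, Cond)) // hypothetical safety check
      break;
  }
  return Selects;
}
```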
This patch supersedes https://reviews.llvm.org/D59096, which attempted to
address this issue by introducing a new SelectionDAG node. Hat tip to Eli
Friedman for his feedback on how to best handle this issue.
Differential Revision: https://reviews.llvm.org/D59355
Patch by Luís Marques.
llvm-svn: 356741
This follows similar logic in the ARM and Mips backends, and allows the free
use of s0 in functions without a dedicated frame pointer. The changes in
callee-saved-gprs.ll most clearly show the effect of this patch.
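A hedged sketch of the idea in getReservedRegs, simplified to the registers
relevant here:

```cpp
// Sketch: x8 (s0/fp) is reserved only when the function needs a frame
// pointer; otherwise it stays available as an ordinary callee-saved GPR.
BitVector RISCVRegisterInfo::getReservedRegs(const MachineFunction &MF) const {
  const TargetFrameLowering *TFI = MF.getSubtarget().getFrameLowering();
  BitVector Reserved(getNumRegs());
  markSuperRegs(Reserved, RISCV::X0); // zero
  markSuperRegs(Reserved, RISCV::X2); // sp
  if (TFI->hasFP(MF))
    markSuperRegs(Reserved, RISCV::X8); // s0/fp
  return Reserved;
}
```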
llvm-svn: 356063
The previous DAG combiner-based approach had an issue with infinite loops
between the target-dependent and target-independent combiner logic (see
PR40333). Although this was worked around in rL351806, the combiner-based
approach is still potentially brittle and can fail to select the 32-bit shift
variant when profitable to do so, as demonstrated in the pr40333.ll test case.
This patch instead introduces target-specific SelectionDAG nodes for
SLLW/SRLW/SRAW and custom-lowers variable i32 shifts to them. pr40333.ll is a
good example of how this approach can improve codegen.
This adds a DAG combine that does SimplifyDemandedBits on the operands (only
the lower 32 bits of the first operand and the lower 5 bits of the second
operand are read). This seems better than implementing
SimplifyDemandedBitsForTargetNode, as there is no guarantee that it would be
called (it isn't for e.g. the anyext return test cases). Also implements
ComputeNumSignBitsForTargetNode.
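A hedged sketch of the combine as a case in the target's PerformDAGCombine,
assuming a local helper that wraps TargetLowering::SimplifyDemandedBits and
reports changes back to the combiner:

```cpp
// Sketch: for the *W shift nodes, only the low 32 bits of the value and the
// low 5 bits of the shift amount are demanded, so narrow both operands.
case RISCVISD::SLLW:
case RISCVISD::SRAW:
case RISCVISD::SRLW: {
  SDValue LHS = N->getOperand(0);
  SDValue RHS = N->getOperand(1);
  APInt LHSMask = APInt::getLowBitsSet(LHS.getValueSizeInBits(), 32);
  APInt RHSMask = APInt::getLowBitsSet(RHS.getValueSizeInBits(), 5);
  if (SimplifyDemandedBits(LHS, LHSMask, DCI) || // assumed wrapper helper
      SimplifyDemandedBits(RHS, RHSMask, DCI))
    return SDValue();
  break;
}
```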
There are codegen changes in atomic-rmw.ll and atomic-cmpxchg.ll but the new
instruction sequences are semantically equivalent.
Differential Revision: https://reviews.llvm.org/D57085
llvm-svn: 352169
In order to support codegen for RV64A, this patch:
* Introduces masked atomic intrinsics for the atomicrmw operations and cmpxchg
that use the i64 type. These are ultimately lowered to masked operations
using lr.w/sc.w, but we need to use these alternate intrinsics for RV64
because i32 is not legal
* Modifies RISCVExpandPseudoInsts.cpp to handle PseudoAtomicLoadNand64 and
PseudoCmpXchg64
* Modifies the AtomicExpandPass hooks in RISCVTargetLowering to sext/trunc as
needed for RV64 and to select the i64 intrinsic IDs when necessary (sketched
below)
* Adds appropriate patterns to RISCVInstrInfoA.td
* Updates test/CodeGen/RISCV/atomic-*.ll to show RV64A support
This ends up being a fairly mechanical change, as the logic for RV32A is
effectively reused.
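A hedged sketch of the sext/trunc wrapping in the AtomicExpandPass hook
(hypothetical condensation with an assumed helper; the real hook also passes
the mask shift amount and memory ordering):

```cpp
// Sketch: on RV64 the 32-bit-wide masked atomic operands are sign-extended
// to i64, the i64 intrinsic ID is selected, and the result is truncated
// back, because i32 is not a legal type on RV64.
Value *emitMaskedAtomicRMW(IRBuilder<> &Builder, AtomicRMWInst *AI,
                           Value *AlignedAddr, Value *Incr, Value *Mask,
                           bool IsRV64) {
  Intrinsic::ID IntID = getMaskedAtomicIntrinsicID(AI, IsRV64); // hypothetical
  if (!IsRV64)
    return Builder.CreateIntrinsic(IntID, {}, {AlignedAddr, Incr, Mask});
  Type *I64 = Builder.getInt64Ty();
  Value *Result = Builder.CreateIntrinsic(
      IntID, {}, {AlignedAddr, Builder.CreateSExt(Incr, I64),
                  Builder.CreateSExt(Mask, I64)});
  return Builder.CreateTrunc(Result, Builder.getInt32Ty());
}
```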
Differential Revision: https://reviews.llvm.org/D53233
llvm-svn: 351422
Introduce a new RISCVExpandPseudoInsts pass to expand atomic
pseudo-instructions after register allocation. This is necessary in order to
ensure that register spills aren't introduced between LL and SC, thus breaking
the forward progress guarantee for the operation. AArch64 does something
similar for CmpXchg (though only at O0), and Mips is moving towards this
approach (see D31287). See also [this mailing list
post](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099490.html) from
James Knight, which summarises the issues with lowering to ll/sc in IR or
pre-RA.
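A hedged sketch of the core of such an expansion for a 32-bit atomic NAND,
simplified from the shape of the pseudo-expansion pass (ordering variants
omitted; MI operand order schematic):

```cpp
// Sketch: the whole LR/SC retry loop is emitted as one post-RA unit, so no
// spill or reload can be inserted between lr.w and sc.w.
//   loop:
//     lr.w  dest, (addr)
//     and   scratch, dest, incr
//     xori  scratch, scratch, -1      ; complete the NAND
//     sc.w  scratch, scratch, (addr)
//     bnez  scratch, loop
void emitAtomicNandLoop(const RISCVInstrInfo &TII, MachineBasicBlock *LoopMBB,
                        const DebugLoc &DL, Register Dest, Register Scratch,
                        Register Addr, Register Incr) {
  BuildMI(LoopMBB, DL, TII.get(RISCV::LR_W), Dest).addReg(Addr);
  BuildMI(LoopMBB, DL, TII.get(RISCV::AND), Scratch)
      .addReg(Dest).addReg(Incr);
  BuildMI(LoopMBB, DL, TII.get(RISCV::XORI), Scratch)
      .addReg(Scratch).addImm(-1);
  BuildMI(LoopMBB, DL, TII.get(RISCV::SC_W), Scratch)
      .addReg(Addr).addReg(Scratch);
  BuildMI(LoopMBB, DL, TII.get(RISCV::BNE))
      .addReg(Scratch).addReg(RISCV::X0).addMBB(LoopMBB);
}
```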
See the [accompanying RFC
thread](http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html) for an
overview of the lowering strategy.
Differential Revision: https://reviews.llvm.org/D47882
llvm-svn: 342534
This patch adds lowering for atomic fences and relies on AtomicExpandPass to
lower atomic loads/stores, atomic rmw, and cmpxchg to __atomic_* libcalls.
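A hedged sketch of the mechanism that triggers the libcall lowering:
declaring that no atomic width is natively supported makes AtomicExpandPass
rewrite the atomic operations into __atomic_* libcalls (constructor excerpt,
simplified):

```cpp
// Sketch: with no native atomics (no A extension support yet), every atomic
// load/store, atomicrmw, and cmpxchg is expanded to an __atomic_* libcall
// by AtomicExpandPass; fences are still lowered in the backend.
setMaxAtomicSizeInBitsSupported(0);
```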
test/CodeGen/RISCV/atomic-* are modelled on the exhaustive
test/CodeGen/PPC/atomics-regression.ll, and will prove more useful once RV32A
codegen support is introduced.
Fence mappings are taken from table A.6 in the current draft of version 2.3 of
the RISC-V Instruction Set Manual, which incorporates the memory model changes
and definitions contributed by the RISC-V Memory Consistency Model task group.
Differential Revision: https://reviews.llvm.org/D47587
llvm-svn: 334590