Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.
Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).
One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
`TryFoldBinOpIntoSelect` didn't have a check for `Optimized`, meaning you could
end up folding twice. (e.g. a select with a G_ADD on the true side, and a G_SUB
on the false side)
Add in the missing `if` and a test.
This implements the following folds:
```
G_SELECT cc, (G_SUB 0, %x), %false -> CSNEG %x, %false, inv_cc
G_SELECT cc, (G_XOR x, -1), %false -> CSINV %x, %false, inv_cc
```
This is similar to the folds introduced in
5bc0bd05e6.
In 5bc0bd05e6 I mentioned that we may prefer to do
this in AArch64PostLegalizerLowering.
I think that it's probably better to do this in the selector. The way we select
G_SELECT depends on what register banks end up being assigned to it. If we did
this in AArch64PostLegalizerLowering, then we'd end up checking *every* G_SELECT
to see if it's worth swapping operands. Doing it in the selector allows us to
restrict the optimization to only relevant G_SELECTs.
Also fix up some comments in `TryFoldBinOpIntoSelect` which are kind of
confusing IMO.
Example IR: https://godbolt.org/z/3qPGca
Differential Revision: https://reviews.llvm.org/D92860
We didn't have selector support for these.
Selection code is similar to `getAArch64XALUOOp` in AArch64ISelLowering. Similar
to that code, this returns the AArch64CC and the instruction produced. In SDAG,
this is used to optimize select + overflow and condition branch + overflow
pairs. (See `AArch64TargetLowering::LowerBR_CC` and
`AArch64TargetLowering::LowerSelect`)
(G_USUBO should be easy to add here, but it isn't legalized right now.)
This also factors out the existing G_UADDO selection code, and removes an
unnecessary check for s32/s64. AFAIK, we shouldn't ever get anything other than
s32/s64. It makes more sense for this to be handled by the type assertion in
`emitAddSub`.
Differential Revision: https://reviews.llvm.org/D92610
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
`selectCompareBranch` was hard to understand.
Also, it was being needlessly pessimistic with the `ProduceNonFlagSettingCondBr`
case. It assumed that everything in `selectCompareBranch` would emit a TB(N)Z
or C(B)NZ. That's not true; the G_FCMP + G_BRCOND case would never emit those
instructions, and the G_ICMP + G_BRCOND case was capable of emitting an integer
compare + Bcc.
- Refactor `selectCompareBranch` into separate functions based off of what is
feeding the G_BRCOND's condition.
- Move G_BRCOND selection code from `select` to `selectCompareBranch`.
- Remove duplicated constraint code from the code originally in `select`;
`emitTestBit` already handles that, so no need to constrain twice.
- Factor out the G_FCMP + G_BRCOND case into `selectCompareBranchFedByFCmp`.
- Split the G_ICMP + G_BRCOND case into an optimization function,
`tryOptCompareBranchFedByICmp` and a general selection function,
`selectCompareBranchFedByICmp`.
- Reduce the number of things passed to `tryOptAndIntoCompareBranch`.
- Improve documentation.
- Give some variables more descriptive names.
Other than improving the code generation for functions with
speculative_load_hardening by getting the logic correct, this is NFC.
Differential Revision: https://reviews.llvm.org/D92582
When we have a 128-bit register, emitTestBit would incorrectly narrow to 32
bits always. If the bit number was > 32, then we would need a TB(N)ZX. This
would cause a crash, as we'd have the wrong register class. (PR48379)
This generalizes `narrowExtReg` into `moveScalarRegClass`.
This also allows us to remove `widenGPRBankRegIfNeeded` entirely, since
`selectCopy` correctly handles SUBREG_TO_REG etc.
This does create some codegen changes (since `selectCopy` uses the `all`
regclass variants). However, I think that these will likely be optimized away,
and we can always improve the `selectCopy` code. It looks like we should
revisit `selectCopy` at this point, and possibly refactor it into at least one
`emit` function.
Differential Revision: https://reviews.llvm.org/D92707
We are avoiding writing to WZR just about everywhere else.
Also update the code to use MachineIRBuilder for the sake of consistency.
We also didn't have a GlobalISel testcase for this path, so add a simple one
now.
Differential Revision: https://reviews.llvm.org/D90626
Instead of falling back to selecting TB(N)Z when we fail to select an
optimized compare against 0, select Bcc instead.
Also simplify selectCompareBranch a little while we're here, because the logic
was kind of hard to follow.
At -O0, this is a 0.1% geomean code size improvement for CTMark.
A simple example of where this can kick in is here:
https://godbolt.org/z/4rra6P
In the example above, GlobalISel currently produces a subs, cset, and tbnz.
SelectionDAG, on the other hand, just emits a compare and b.le.
Differential Revision: https://reviews.llvm.org/D92358
This uses the same reasoning as other similar conversions just before selection,
without it we miss out on selection because the importer considers s64 and p0
distinct types.
When we see
```
xor = G_XOR xor_lhs, -1
select = G_SELECT cc, tval, xor
```
Fold this into
```
select = CSINV tval, xor_lhs, cc
```
Update select-select.mir to reflect the changes.
For now, only handle the case where the G_XOR is the false-value for the
G_SELECT. It may make more sense to handle the true-value case in post-legalizer
lowering.
Differential Revision: https://reviews.llvm.org/D90774
The G_ZEXT in these cases seems to actually come from a combine that we do but
SelectionDAG doesn't. Looking through it allows us to match "uxtw #2" addressing
modes.
Differential Revision: https://reviews.llvm.org/D91475
When we see
```
%sub = G_SUB 0, %x
%select = G_SELECT %cc, %t, %sub
```
Fold away the G_SUB by producing
```
%select = CSNEG %t, %x, cc
```
Simple IR example: https://godbolt.org/z/K8TEnh
This is valid on both sides of the select, but for now, just handle one side.
It may make more sense to handle swapping sides during post-legalizer lowering.
Differential Revision: https://reviews.llvm.org/D90723
Reducing some code duplication.
We had a helper for checking if a predicate is unsigned. Remove that and use
the existing function in Instructions.cpp.
Differential Revision: https://reviews.llvm.org/D91288
It's fairly common to need matchers for a specific constant value, or for
common idioms like finding a negated register.
Add
- `m_SpecificICst`, which returns true when matching a specific value..
- `m_ZeroInt`, which returns true when an integer 0 is matched.
- `m_Neg`, which returns when a register is negated.
Also update a few places which use idioms related to the new matchers.
Differential Revision: https://reviews.llvm.org/D91397
Select the following:
- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc 0, -1 -> CSINV zreg, zreg cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINC t, zreg, cc
(IR example: https://godbolt.org/z/YfPna9)
These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.
Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.
```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
(CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```
So we have to manually select these for now.
This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.
Differential Revision: https://reviews.llvm.org/D90701
The manual selection code for add/sub was not checking if it was possible to
fold in shifts + extends (the *rx opcode variants).
As a result, we could never select things like
```
cmp x1, w0, uxtw #2
```
Because we don't import any patterns for compares.
This adds support for the arithmetic shifted register forms and updates tests
for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`.
This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.
Differential Revision: https://reviews.llvm.org/D91207
Previously, we only handled negative arithmetic immediates in the imported
selector code.
Since we don't import code for, say, compares, we were missing opportunities
for things like
```
%cst:gpr(s64) = G_CONSTANT i64 -10
%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst
->
%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv
%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv
```
Instead, we would have to materialize the constant and emit a SUBS.
This adds support for selection like above for SUB, SUBS, ADD, and ADDS.
This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.
Differential Revision: https://reviews.llvm.org/D91108
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane indexed FMLAs (patch adding a test
for that will be committed later).
The pattern matching code has been simply moved to postlegalize lowering.
Differential Revision: https://reviews.llvm.org/D90820
Move the code which adjusts the immediate/predicate on a G_ICMP to
AArch64PostLegalizerLowering.
This
- Reduces the number of places we need to test for optimized compares in the
selector. We know that the compare should have been simplified by the time it
hits the selector, so we can avoid testing this in selects, brconds, etc.
- Allows us to potentially fold more compares (previously, this optimization
was only done after calling `tryFoldCompare`, this may allow us to hit some more
TST cases)
- Simplifies the selection code in `emitIntegerCompare` significantly; we can
just use an emitSUBS function.
- Allows us to avoid checking that the predicate has been updated after
`emitIntegerCompare`.
Also add a utility header file for things that may be useful in the selector
and various combiners. No need for an implementation file at this point, since
it's just one constexpr function for now. I've run into a couple cases where
having one of these would be handy, so might as well add it here. There are
a couple functions in the selector that can probably be factored out into
here.
Differential Revision: https://reviews.llvm.org/D89823
Simplify emitIntegerCompare and improve comments + asserts.
Mostly making the code a little easier to follow.
Also, this code is only used for G_ICMP. The legalizer ensures that the LHS/RHS
for every G_ICMP is either a s32 or s64. So, there's no need to handle anything
else. This lets us remove a bunch of checks for whether or not we successfully
emitted the compare.
Differential Revision: https://reviews.llvm.org/D89433
These cause problems for later optimizations, just using an unused vreg like
SelectionDAG generates better code in the end, and obviates the need for some
GISel specific flag optimizations.
Differential Revision: https://reviews.llvm.org/D89419
Similar to the FP case in `AArch64TargetLowering::LowerBR_CC`.
Instead of emitting the csets + a tbnz, just emit a compare + bcc
(or two bccs, depending on the condition code)
This improves cases like this: https://godbolt.org/z/v8hebx
This is a 0.1% geomean code size improvement for CTMark at -O3.
Differential Revision: https://reviews.llvm.org/D88624
Partially refactoring, partially fixing a bug.
- We shouldn't use TB(N)ZX unless the bit number is >= 32
- We can fold more than xor using emitTestBit
Also remove a check which isn't relevant anymore + update tests.
Rename select-brcond-of-not.mir to select-brcond-of-binop.mir, since it now
tests more than just G_XOR.
Differential Revision: https://reviews.llvm.org/D88702
Unfortunately the leaf SDAG patterns aren't supported yet so we need to do
this manually, but it's not a significant amount of code anyway.
Differential Revision: https://reviews.llvm.org/D87924
Refactor this so it's similar to the existing integer comparison code.
Also add some missing 64-bit testcases to select-fcmp.mir.
Refactoring to prep for improving selection for G_FCMP-related conditional
branches etc.
Differential Revision: https://reviews.llvm.org/D88614
This isn't a real with the codegen, it's a previously known bug in clang which
causes non-deterministic failures due to garbage bits in undef registers being
used in saturating instructions.
I'm disabling the result checking for the test until this issue is resolved.
This reverts commit 6c8168324b.
Support emitting ANDSXrs and ANDSWrs in `emitTST`. Update opt-fold-compare.mir
to show that it works.
Differential Revision: https://reviews.llvm.org/D87530
These don't get selected by the imported patterns, and avoiding generating them
is a whole load of not-worth-it-hassle (until we have fp types in GlobalISel).
This turns all jump table entries into deltas within the target
function because in the small memory model all code & static data must
be in a 4GB block somewhere in memory.
When the entries were a delta between the table location and a basic
block, the 32-bit signed entries are not enough to guarantee
reachability.
https://reviews.llvm.org/D87286
These functions were extremely similar:
- `emitADD`
- `emitADDS`
- `emitCMN`
Refactor them a little, introducing a more generic `emitInstr` function to
do most of the work.
Also add support for the immediate + shifted register addressing modes in each
of them.
Update select-uaddo.mir to show that selecing ADDS now supports folding
immediates + shifts. (I don't think this can impact CMN, because the CMN checks
require a G_SUB with a non-constant on the RHS.)
This is around a 0.02% code size improvement on CTMark at -O3.
Differential Revision: https://reviews.llvm.org/D87529