Move the code which adjusts the immediate/predicate on a G_ICMP to
AArch64PostLegalizerLowering.
This
- Reduces the number of places we need to test for optimized compares in the
selector. We know that the compare should have been simplified by the time it
hits the selector, so we can avoid testing this in selects, brconds, etc.
- Allows us to potentially fold more compares (previously, this optimization
was only done after calling `tryFoldCompare`, this may allow us to hit some more
TST cases)
- Simplifies the selection code in `emitIntegerCompare` significantly; we can
just use an emitSUBS function.
- Allows us to avoid checking that the predicate has been updated after
`emitIntegerCompare`.
Also add a utility header file for things that may be useful in the selector
and various combiners. No need for an implementation file at this point, since
it's just one constexpr function for now. I've run into a couple cases where
having one of these would be handy, so might as well add it here. There are
a couple functions in the selector that can probably be factored out into
here.
Differential Revision: https://reviews.llvm.org/D89823
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector. (E.g. matching for G_ZIP and
other shuffle vector pseudos)
It still makes sense to select these instructions at -O0.
Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.
This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.
Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.
And also add a 'r' which somehow got dropped from a bunch of function names.
https://reviews.llvm.org/D89820
When we see this:
```
%and = G_AND %x, %y
%xor = G_XOR %and, %y
```
Produce this:
```
%not = G_XOR %x, -1
%new_and = G_AND %not, %y
```
as long as we are guaranteed to eliminate the original G_AND.
Also matches all commuted forms. E.g.
```
%and = G_AND %y, %x
%xor = G_XOR %y, %and
```
will be matched as well.
Differential Revision: https://reviews.llvm.org/D88104
In order to select the immediate forms using the imported patterns, we need to
lower them into new G_VASHR/G_VLSHR target generic ops. Add a combine to do this
matching build_vector of constant operands.
With this, we get selection for free.
This combine previously tried to take sequences like:
%cond = G_ICMP pred, a, b
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.
I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.
Differential Revision: https://reviews.llvm.org/D86664
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.
This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.
In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.
e.g.
```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```
Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.
Differential Revision: https://reviews.llvm.org/D85463
This cleans up copies that the legalizer or other combines leave around. They
can occasionally end up escaping as moves.
Differential Revision: https://reviews.llvm.org/D85964
This implements
```
(logic_op (op x...), (op y...)) -> (op (logic_op x, y))
```
when `op` is an extend, a shift, or an and.
This is similar to `DAGCombiner::hoistLogicOpWithSameOpcodeHands`
(with a bunch of missing cases, e.g. G_TRUNC, G_BITCAST, etc.)
This is implemented so it works both pre and post-legalization.
This also adds a general way to add a series of instructions in a combine.
(`applyBuildInstructionSteps`).
Differential Revision: https://reviews.llvm.org/D85050
Summary:
Adds the ability to add members to a generated combiner via
a State base class. In the current AArch64PreLegalizerCombiner
this is used to make Helper available without having to
provide it to every call.
As part of this, split the command line processing into a
separate object so that it still only runs once even though
the generated combiner is constructed more frequently.
Depends on D81862
Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81863
Add selection support for ext via a new opcode, G_EXT and a post-legalizer
combine which matches it.
Add an `applyEXT` function, because the AArch64ext patterns require a register
for the immediate. So, we have to create a G_CONSTANT to get these without
writing new patterns or modifying the existing ones.
Tests are the same as arm64-ext.ll.
Also prevent ext from firing on the zip test. It has higher priority, so we
don't want it potentially getting in the way of mask tests.
Also fix up the shuffle-splat test, because ext is now selected there. The
test was incorrectly regbank selected before, which could cause a verifier
failure when you emit copies.
Differential Revision: https://reviews.llvm.org/D81436
Same idea as for zip, uzp, etc. Teach the post-legalizer combiner to recognize
G_SHUFFLE_VECTORs that are trn1/trn2 instructions.
- Add G_TRN1 and G_TRN2
- Port mask matching code from AArch64ISelLowering
- Produce G_TRN1 and G_TRN2 in the post-legalizer combiner
- Select via importer
Add select-trn.mir to test selection.
Add postlegalizer-combiner-trn.mir to test the combine. This is similar to the
existing arm64-trn test.
Note that both of these tests contain things we currently don't legalize.
I figured it would be easier to test these now rather than later, since once
we legalize the G_SHUFFLE_VECTORs, it's not guaranteed that someone will update
the tests.
Differential Revision: https://reviews.llvm.org/D81182
Since all of the other G_SHUFFLE_VECTOR transforms are going there, let's do
this with dup as well. This is nice, because it lets us split up the original
code into matching, register bank selection, and instruction selection.
- Create G_DUP, make it equivalent to AArch64dup
- Add a post-legalizer combine which is 90% a copy-and-paste from
tryOptVectorDup, except with shuffle matching closer to what SelectionDAG
does in `ShuffleVectorSDNode::isSplatMask`.
- Teach RegBankSelect about G_DUP. Since dup selection relies on the correct
register bank for FP/GPR dup selection, this is necessary.
- Kill `tryOptVectorDup`, since it's now entirely handled by G_DUP.
- Add testcases for the combine, RegBankSelect, and selection. The selection
test gives the same selection results as the old test.
Differential Revision: https://reviews.llvm.org/D81221
This does three things:
1) Adds G_REV16, G_REV32, and G_REV64. These are equivalent to AArch64rev16,
AArch64rev32, and AArch64rev64 respectively.
2) Adds support for producing G_REV64 in the postlegalizer combiner.
We don't legalize any of the shuffles which could give us a G_REV32 or
G_REV16 yet. Since the function for detecting the rev mask is lifted from
AArch64ISelLowering, it should work for G_REV32 and G_REV16 when we get
there.
3) Adds a selection test for a good portion of the patterns imported for the rev
family. The only ones which are not tested are the ones with bitconvert.
This also does a little cleanup, and adds a struct for shuffle vector pseudo
matchdata. This lets us still use `applyShuffleVectorPseudo` rather than adding
a new function.
It should also make it a bit easier to port some of the other masks from
AArch64ISelLowering. (e.g. `isZIP_v_undef_Mask` and friends)
Differential Revision: https://reviews.llvm.org/D81112
Porting the mask stuff for uzp1 and uzp2 from AArch64ISelLowering.
Add two custom opcodes: G_UZP1 and G_UZP2.
Produce them in the post-legalizer combiner when the mask checks out.
Tests:
- postlegalizer-combiner-uzp.mir verifies that we create G_UZP1 and G_UZP2.
The testcases that check that we create them come from neon-perm.ll.
- select-uzp.mir verifies that we can select G_UZP1 and G_UZP2.
Differential Revision: https://reviews.llvm.org/D81049
Port the code to recognize a zip1/zip2 shuffle mask from AArch64ISelLowering
and put it into the post-legalizer combiner.
Add G_ZIP1 and G_ZIP2 to AArch64InstrGISel.td and hook them up as equivalent
nodes to AArch64zip1 and AArch64zip2. This allows us to select them.
Minor code size improvements for SPECINT2000 at -O3 on 197.parser, 252.eon, and
186.crafty.
Differential Revision: https://reviews.llvm.org/D80969
During legalization we can end up with extends of loads, which in the case of
zexts causes us to not hit tablegen imported patterns.
The caveat here is that we don't want anyext load forming, since some variants
are illegal. This change also prevents the combine from creating any illegal
loads.
Differential Revision: https://reviews.llvm.org/D80458
(This patch is by Jessica, I'm just committing it on her behalf because I need
a post-legalizer combiner for something else).
This supersedes D77250, which did equivalent work in the selector. This can be
done pre-legalization or post-legalization. Post-legalization is more likely to
hit, since G_IMPLICIT_DEFs tend to appear during legalization. There's no reason
to not do it pre-legalization though-- if it can be caught earlier, great.
(I also think that it might be worth reimplementing D78769 using a
target-specific post-legalization combine too after thinking about it for a
while.)
Differential Revision: https://reviews.llvm.org/D78852
Given the following situation:
x = G_FCONSTANT (something that can't be materialized)
G_STORE x, some_addr
We know that x must be materialized as at least a single mov. However, at the
time of selection, the G_STORE will have been regbankselected to a FPR store.
So, as a result, you'll get an unnecessary fmov into the G_STORE.
Storing a constant value in a GPR and a constant value in a FPR are the same.
So, whenever you see a G_FCONSTANT that feeds into only G_STORES, so might as
well make it a G_CONSTANT.
This adds a target-specific combine which changes G_FCONSTANTs feeding into
G_STOREs into G_CONSTANTs.
Differential Revision: https://reviews.llvm.org/D72814
Summary:
Each generated helper can be configured to generate an option that disables
rules in that helper. This can be used to bisect rulesets.
The disable bits are stored in a SparseVector as this is very cheap for the
common case where nothing is disabled. It gets more expensive the more rules
are disabled but you're generally doing that for debug purposes where
performance is less of a concern.
Depends on D68426
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68438
llvm-svn: 375067
Summary:
This is just moving the existing C++ code around and will be NFC w.r.t
AArch64. Renamed 'CombineBr' to something more descriptive
('ElideByByInvertingCond') at the same time.
The remaining combines in AArch64PreLegalizeCombiner require features that
aren't implemented at this point and will be hoisted as they are added.
Depends on D68424
Reviewers: bogner, volkan
Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68426
llvm-svn: 375057
Summary:
This is the first of a series of patches extracted from a much bigger WIP
patch. It merely establishes the tblgen pass and the way empty combiner
helpers are declared and integrated into a combiner info.
The tablegen pass takes a -combiners option to select the combiner helper
that will be generated. This can be given multiple values to generate
multiple combiner helpers at once. Doing so helps to minimize parsing
overhead.
The reason for creating a GlobalISel subdirectory in utils/TableGen is that
there will be quite a lot of non-pass files (~15) by the time the patch
series is done.
Reviewers: volkan
Subscribers: mgorny, hiraditya, simoncook, Petar.Avramovic, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68286
llvm-svn: 373527