# llvm-project/llvm/test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -O0 -mtriple=amdgcn-mesa-mesa3d -mcpu=tahiti -run-pass=legalizer %s -o - | FileCheck %s
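# These tests exercise the legalizer's artifact combiner: a G_SEXT fed by a
# G_TRUNC should fold to G_SEXT_INREG on the original (wider) element type,
# as the autogenerated CHECK lines below expect.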
---
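# <2 x s32> truncated to <2 x s16> and sign-extended back to <2 x s32>: the
# trunc/sext pair should combine to a per-element G_SEXT_INREG of width 16.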
name: test_sext_trunc_v2s32_to_v2s16_to_v2s32
body: |
  bb.0:
    liveins: $vgpr0_vgpr1

    ; CHECK-LABEL: name: test_sext_trunc_v2s32_to_v2s16_to_v2s32
    ; CHECK: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
    ; CHECK: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY [[COPY]](<2 x s32>)
    ; CHECK: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
    ; CHECK: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[UV]], 16
    ; CHECK: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[UV1]], 16
    ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SEXT_INREG]](s32), [[SEXT_INREG1]](s32)
    ; CHECK: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
    %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
    %1:_(<2 x s16>) = G_TRUNC %0
    %2:_(<2 x s32>) = G_SEXT %1
    $vgpr0_vgpr1 = COPY %2
...
---
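# Widening case: the s64 result elements are expected to be built with a
# G_ANYEXT of each source element followed by a G_SEXT_INREG of width 16.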
name: test_sext_trunc_v2s32_to_v2s16_to_v2s64
body: |
  bb.0:
    liveins: $vgpr0_vgpr1

    ; CHECK-LABEL: name: test_sext_trunc_v2s32_to_v2s16_to_v2s64
    ; CHECK: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
    ; CHECK: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
    ; CHECK: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[UV]](s32)
    ; CHECK: [[ANYEXT1:%[0-9]+]]:_(s64) = G_ANYEXT [[UV1]](s32)
    ; CHECK: [[SEXT_INREG:%[0-9]+]]:_(s64) = G_SEXT_INREG [[ANYEXT]], 16
    ; CHECK: [[SEXT_INREG1:%[0-9]+]]:_(s64) = G_SEXT_INREG [[ANYEXT1]], 16
    ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s64>) = G_BUILD_VECTOR [[SEXT_INREG]](s64), [[SEXT_INREG1]](s64)
    ; CHECK: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[BUILD_VECTOR]](<2 x s64>)
    %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
    %1:_(<2 x s16>) = G_TRUNC %0
    %2:_(<2 x s64>) = G_SEXT %1
    $vgpr0_vgpr1_vgpr2_vgpr3 = COPY %2
...
---
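# <2 x s8> intermediate: s8 vectors are illegal here, so the result is
# expected to end up packed into a single s32 via shift/and/or and bitcast
# back to <2 x s16>.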
name: test_sext_trunc_v2s32_to_v2s8_to_v2s16
body: |
  bb.0:
    liveins: $vgpr0_vgpr1
    ; The G_SEXT_INREG doesn't lower here because the G_TRUNC is illegal and
    ; cannot be legalized, which blocks further legalization.
    ; CHECK-LABEL: name: test_sext_trunc_v2s32_to_v2s8_to_v2s16
    ; CHECK: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
    ; CHECK: [[TRUNC:%[0-9]+]]:_(<2 x s16>) = G_TRUNC [[COPY]](<2 x s32>)
    ; CHECK: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[TRUNC]](<2 x s16>)
    ; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
    ; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST]], [[C]](s32)
    ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
    ; CHECK: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY1]], 8
    ; CHECK: [[COPY2:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
    ; CHECK: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 8
    ; CHECK: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535
    ; CHECK: [[COPY3:%[0-9]+]]:_(s32) = COPY [[SEXT_INREG]](s32)
    ; CHECK: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C1]]
    ; CHECK: [[COPY4:%[0-9]+]]:_(s32) = COPY [[SEXT_INREG1]](s32)
    ; CHECK: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C1]]
    ; CHECK: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND1]], [[C]](s32)
    ; CHECK: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND]], [[SHL]]
    ; CHECK: [[BITCAST1:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32)
    ; CHECK: $vgpr0 = COPY [[BITCAST1]](<2 x s16>)
    %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
    %1:_(<2 x s8>) = G_TRUNC %0
    %2:_(<2 x s16>) = G_SEXT %1
    $vgpr0 = COPY %2
...
---
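# Odd-width vector: the same per-element G_SEXT_INREG fold should apply to
# <3 x s32>.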
name: test_sext_trunc_v3s32_to_v3s16_to_v3s32
body: |
  bb.0:
    liveins: $vgpr0_vgpr1_vgpr2

    ; CHECK-LABEL: name: test_sext_trunc_v3s32_to_v3s16_to_v3s32
    ; CHECK: [[COPY:%[0-9]+]]:_(<3 x s32>) = COPY $vgpr0_vgpr1_vgpr2
    ; CHECK: [[COPY1:%[0-9]+]]:_(<3 x s32>) = COPY [[COPY]](<3 x s32>)
    ; CHECK: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32), [[UV2:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<3 x s32>)
    ; CHECK: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[UV]], 16
    ; CHECK: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[UV1]], 16
    ; CHECK: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[UV2]], 16
    ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<3 x s32>) = G_BUILD_VECTOR [[SEXT_INREG]](s32), [[SEXT_INREG1]](s32), [[SEXT_INREG2]](s32)
    ; CHECK: $vgpr0_vgpr1_vgpr2 = COPY [[BUILD_VECTOR]](<3 x s32>)
    %0:_(<3 x s32>) = COPY $vgpr0_vgpr1_vgpr2
    %1:_(<3 x s16>) = G_TRUNC %0
    %2:_(<3 x s32>) = G_SEXT %1
    $vgpr0_vgpr1_vgpr2 = COPY %2
...