Commit Graph

113 Commits

Author SHA1 Message Date
hyeongyu kim 1b1c8d83d3 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2022-01-16 18:54:17 +09:00
Freddy Ye 0bab742805 [X86] Add missing CET intrinsics support
These two intrinsics are documented o SDM and intrinsic guide.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D116325
2022-01-04 11:40:40 +08:00
Philip Reames e6ad9ef4e7 [instcombine] Canonicalize constant index type to i64 for extractelement/insertelement
The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.

I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions.

We chose the canonical type as i64 arbitrarily.  We might consider changing this to something else in the future if we have cause.

Differential Revision: https://reviews.llvm.org/D115387
2021-12-13 16:56:22 -08:00
Phoebe Wang 925ec98d00 Revert "[X86][clang] Emit diagnostic for float and double when we have features -x87 and -sse on 64-bits"
This reverts commit 4a2c827b17.

Need to fix the problem when using `-mno-sse` together with "x86intrin.h"
2021-12-10 10:31:09 +08:00
Phoebe Wang d7c07f60b3 [X86][MS-InlineAsm] Make the constraint *m to be simple place holder
D113096 solved the "undefined reference to xxx" issue by adding
constraint *m for the global var. But it has strong side effect due to
the symbol in the assembly being replaced with constraint variable.
This leads to some lowering fails. https://godbolt.org/z/h3nWoerPe

This patch fix the problem by use the constraint *m as place holder
rather than real constraint. It has negligible effect for the existing
code generation.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D115225
2021-12-10 09:29:38 +08:00
Phoebe Wang 4a2c827b17 [X86][clang] Emit diagnostic for float and double when we have features -x87 and -sse on 64-bits
A follow up of D114162.

Reviewed By: asavonic

Differential Revision: https://reviews.llvm.org/D114782
2021-12-08 09:50:26 +08:00
Zahira Ammarguellat fd759d42c9 Revert "The _Float16 type is supported on x86 systems with SSE2 enabled."
This reverts commit 6623c02d70.
The change seems to be breaking build of compiler-rt on Debian.
2021-11-23 08:00:57 -05:00
Zahira Ammarguellat 6623c02d70 The _Float16 type is supported on x86 systems with SSE2 enabled.
Operations are emulated by software emulation and “float” instructions.
This patch is allowing the support of _Float16 type without the use of
-max512fp16 flag. The final goal being, perform _Float16 emulation for
all arithmetic expressions.
2021-11-19 08:59:50 -05:00
Freddy Ye eb9dc0c78f [X86] add 3 missing intrinsics: _mm_(mask/maskz)_cvtpbh_ps
Reviewed By: craig.topper, pengfei

Differential Revision: https://reviews.llvm.org/D114059
2021-11-18 08:48:19 +08:00
Freddy Ye 73c9cf8204 [X86][FP16] add alias for f*mul_*ch intrinsics
*_mul_*ch is to align with *_mul_*s, *_mul_*d and *_mul_*h.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D112777
2021-11-17 13:26:11 +08:00
hyeongyu kim fd9b099906 Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit aacfbb953e.

Revert "Fix lit test failures in CodeGenCoroutines"

This reverts commit 63fff0f5bf.
2021-11-09 02:15:55 +09:00
hyeongyukim aacfbb953e [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)

This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453

Resolve lit failures in clang after 8ca4b3e's land

Fix lit test failures in clang-ppc* and clang-x64-windows-msvc

Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc

Fix internal_clone(aarch64) inline assembly
2021-11-06 19:19:22 +09:00
Juneyoung Lee 89ad2822af Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit 7584ef766a.
2021-11-06 15:39:19 +09:00
Juneyoung Lee 7584ef766a [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2021-11-06 15:36:42 +09:00
Shengchen Kan be08e452f3 [X86][MS-InlineAsm] Add constraint *m for memory access w/ global var
Constraint `*m` should be used when the address of a variable is passed
as a value. And the constraint is missing for MS inline assembly when sth
is written to the address of the variable.

The missing would cause FE delete the definition of the static varible,
and then result in "undefined reference to xxx" issue.

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D113096
2021-11-05 09:11:41 +08:00
Arthur Eubanks cb5a10199b [test] Remove tests pinned to the legacy PM
Now that the legacy PM is deprecated for the optimization pipeline, we
can start deleting legacy PM tests.

For tests that test both PMs, merge the RUN lines.
Delete tests specific to the legacy PM.
2021-10-18 16:40:46 -07:00
Juneyoung Lee f193bcc701 Revert D105169 due to the two-stage failure in ASAN
This reverts the following commits:
37ca7a795b
9aa6c72b92
705387c507
8ca4b3ef19
80dba72a66
2021-10-18 23:52:46 +09:00
Juneyoung Lee 8ca4b3ef19 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453
2021-10-16 12:01:41 +09:00
Wang, Pengfei c0f9c7c015 [X86] Check if struct is blank before getting the inner types
This fixes pr52011.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D111037
2021-10-08 17:09:34 +08:00
Wang, Pengfei 7d6889964a [X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110336
2021-09-27 09:27:04 +08:00
Wang, Pengfei 227673398c [X86] Always check the size of SourceTy before getting the next type
D109607 results in a regression in llvm-test-suite.
The reason is we didn't check the size of SourceTy, so that we will
return wrong SSE type when SourceTy is overlapped.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D110037
2021-09-20 23:34:19 +08:00
Wang, Pengfei 5b47256fa5 [X86] Add test to show the effect caused by D109607. NFC 2021-09-20 23:34:18 +08:00
Wang, Pengfei e9e1d4751b [X86] Refactor GetSSETypeAtOffset to fix pr51813
D105263 adds support for _Float16 type. It introduced a bug (pr51813) that generates a <4 x half> type instead the default double when passing blank structure by SSE registers.

Although I doubt it may expose a bug somewhere other than D105263, it's good to avoid return half type when no half type in arguments.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D109607
2021-09-17 10:51:59 +08:00
Xiang1 Zhang 1f1c71aeac [X86][InlineAsm] Use mem size information (*word ptr) for "global variable + registers" memory expression in inline asm.
Differential Revision: https://reviews.llvm.org/D109739
2021-09-15 16:11:14 +08:00
Xiang1 Zhang c81d6ab875 [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109488
2021-09-13 18:03:27 +08:00
Xiang1 Zhang bdce8d40c6 Revert "[X86] Adjust Keylocker handle mem size"
This reverts commit 3731de6b7f.
2021-09-13 18:00:46 +08:00
Xiang1 Zhang 3731de6b7f [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109354
2021-09-13 17:59:33 +08:00
Wang, Pengfei 2aaa6466fe [X86] Support *_set1_pch(Float16 _Complex h)
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D109487
2021-09-11 17:47:31 +08:00
Simon Pilgrim ea685e1028 [X86][AVX] Update _mm256_loadu2_m128* intrinsics to use _mm256_set_m128* (PR51796)
As reported on PR51796, the _mm256_loadu2_m128i in particular was inserting bitcasts and shuffles with different types making it trickier for some combines, and prevented the value tracker from identifying the shuffle sequences as a single insert_subvector style concat_vectors pattern.

This patch instead concatenate the 128-bit unaligned loads with _mm256_set_m128*, which was written to avoid the unnecessary bitcasts and only emits a single shuffle.

Differential Revision: https://reviews.llvm.org/D109497
2021-09-09 19:15:48 +01:00
Tianqing Wang 12fa608af4 [X86] Add CRC32 feature.
d8faf03807 implemented general-regs-only for X86 by disabling all features
with vector instructions. But the CRC32 instruction in SSE4.2 ISA, which uses
only GPRs, also becomes unavailable. This patch adds a CRC32 feature for this
instruction and allows it to be used with general-regs-only.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D105462
2021-09-06 17:24:30 +08:00
Nico Weber e5438f3868 clang/win: Add __readfsdword to intrin.h
When using __readfsdword(), clang used to warn that one has
to include <intrin.h> -- no matter if that was already included
or not.

Now it only warns if it's not yet included.

To verify that this was the only intrin with this problem, I ran:

    $ for f in $(grep intrin.h clang/include/clang/Basic/BuiltinsX86* |
                 egrep -o '\([^,]+,' | egrep -o '[^(,]*'); do
        if ! grep -q $f clang/lib/Headers/intrin.h; then echo $f; fi;
      done

This printed 9 more functions, but those are all in emmintrin.h,
xsaveintrin.h (which are included by intrin.h based on /arch: flags).
So this is indeed the only built-in that was missing in intrin.h.

Fixes PR51188.

Differential Revision: https://reviews.llvm.org/D109085
2021-09-02 12:22:07 -04:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Wang, Pengfei ab40dbfe03 [X86] AVX512FP16 instructions enabling 6/6
Enable FP16 complex FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105269
2021-08-30 13:08:45 +08:00
Xiang1 Zhang 80f7ce8993 [X86] Support __SSC_MARK(const int id)
Differential Revision: https://reviews.llvm.org/D108682
2021-08-30 09:55:35 +08:00
Xiang1 Zhang 4c29dc18cf Revert "[X86] Support __SSC_MARK(const int id)"
This reverts commit 78fbde5779.
2021-08-30 09:50:26 +08:00
Xiang1 Zhang 78fbde5779 [X86] Support __SSC_MARK(const int id)
Differential Revision: https://reviews.llvm.org/D108682
2021-08-30 09:21:22 +08:00
Xiang1 Zhang fd88fac6ca Revert "[X86] Support __SSC_MARK(const int id)"
This reverts commit 83e82ff767.
2021-08-30 09:18:27 +08:00
Xiang1 Zhang 83e82ff767 [X86] Support __SSC_MARK(const int id)
Differential Revision: https://reviews.llvm.org/D108682
2021-08-30 08:51:20 +08:00
Wang, Pengfei c728bd5bba [X86] AVX512FP16 instructions enabling 5/6
Enable FP16 FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105268
2021-08-24 09:07:19 +08:00
Wang, Pengfei b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Wang, Pengfei 5aeca3b0a5 [CFE][X86] Enable complex _Float16 support
Support complex _Float16 on X86 in C/C++ following the latest X86 psABI. (https://gitlab.com/x86-psABIs)

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105331
2021-08-18 11:16:14 +08:00
Wang, Pengfei 2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Wang, Pengfei f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
Craig Topper 4190d99dfc [X86] Add parentheses around casts in some of the X86 intrinsic headers.
This covers the SSE and AVX/AVX2 headers. AVX512 has a lot more macros
due to rounding mode.

Fixes part of PR51324.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D107843
2021-08-13 09:36:16 -07:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Xiang1 Zhang 6d234a6908 [X86] Zero some outputs of Kelocker intrinsics in error case
Reviewed By: WangPengfei

Differential Revision: https://reviews.llvm.org/D104766
2021-06-29 13:35:40 +08:00
Craig Topper 7a112356e4 [X86] Correct the conversion of VALIGND/Q intrinsics to shufflevector.
We need to mask the immediate to the width of a single vector
rather than 2 vectors. If we use the width of 2 vectors then
any shift larger than the length of 1 vector is going to overflow
the shuffle indices.

Fixes PR50895.
2021-06-26 19:06:00 -07:00