llvm-project/llvm/test/CodeGen/X86/avx512-shuffles
Roman Lebedev cf9b1f7a0e
[X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants
Currently, the X86 backend has only a single, one-size-fits-all `FeatureFastVariableShuffle` feature,
which controls the profitability of both cross-lane and per-lane variable shuffles.
This appears to have been fine so far.

But at least on AMD Zen 3, per-lane variable shuffles (e.g. `VPSHUFB`)
are as fast as shuffles with a fixed/immediate mask,
while lane-crossing shuffles (e.g. `VPERMPS`) perform worse.

So to get the benefits of variable-mask shuffles, but not the drawbacks of lane-crossing shuffles,
as suggested by @RKSimon, split the feature flag into two.
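
The split described above can be pictured with a small stand-alone C++ sketch. This is not the LLVM implementation: `ShuffleTuning`, `ShuffleKind`, `preferVariableShuffle`, `fromLegacyFlag`, and `Zen3Like` are hypothetical names introduced only for illustration, and the Zen 3-like profile simply encodes the behaviour described in this commit message (per-lane fast, lane-crossing not), not any setting taken from the backend.

```cpp
// Hypothetical model of the two tuning flags; these are NOT the real LLVM types.
struct ShuffleTuning {
  bool FastVariableCrossLaneShuffle; // lane-crossing shuffles, e.g. VPERMPS
  bool FastVariablePerLaneShuffle;   // per-lane shuffles, e.g. VPSHUFB
};

// Which kind of variable-mask shuffle a lowering is considering.
enum class ShuffleKind { PerLane, CrossLane };

// Each shuffle kind consults only its own flag, so a target can opt into
// per-lane variable shuffles without also opting into lane-crossing ones.
constexpr bool preferVariableShuffle(ShuffleTuning T, ShuffleKind K) {
  return K == ShuffleKind::CrossLane ? T.FastVariableCrossLaneShuffle
                                     : T.FastVariablePerLaneShuffle;
}

// The old single flag behaves like setting both new flags to the same value.
constexpr ShuffleTuning fromLegacyFlag(bool Fast) {
  return ShuffleTuning{Fast, Fast};
}

// Assumed profile matching the Zen 3 observation in the text above:
// per-lane variable shuffles are fast, lane-crossing ones are not.
constexpr ShuffleTuning Zen3Like{false, true};
```

With a single flag, a profile like `Zen3Like` cannot be expressed: `fromLegacyFlag` can only enable or disable both kinds together, which is the limitation the split removes.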

Differential Revision: https://reviews.llvm.org/D103274
2021-06-01 10:39:36 +03:00
broadcast-scalar-fp.ll
broadcast-scalar-int.ll
broadcast-vector-fp.ll
broadcast-vector-int.ll [X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants 2021-06-01 10:39:36 +03:00
duplicate-high.ll
duplicate-low.ll
in_lane_permute.ll
partial_permute.ll [X86] Split FeatureFastVariableShuffle tuning into Lane-Crossing and Per-Lane variants 2021-06-01 10:39:36 +03:00
permute.ll
shuffle-interleave.ll
shuffle-vec.ll
shuffle.ll
unpack.ll