cf9b1f7a0e
Currently, the X86 backend has only a single, global, one-size-fits-all `FeatureFastVariableShuffle` feature, which controls the profitability of both cross-lane and per-lane variable shuffles. I guess this has been fine so far. But at least on AMD Zen 3, per-lane variable shuffles (e.g. `VPSHUFB`) are as fast as shuffles with a fixed/immediate mask, while lane-crossing shuffles (e.g. `VPERMPS`) perform worse. So, to get the benefits of variable-mask shuffles without the drawbacks of lane-crossing shuffles, as suggested by @RKSimon, split the feature flag into two.

Differential Revision: https://reviews.llvm.org/D103274
broadcast-scalar-fp.ll
broadcast-scalar-int.ll
broadcast-vector-fp.ll
broadcast-vector-int.ll
duplicate-high.ll
duplicate-low.ll
in_lane_permute.ll
partial_permute.ll
permute.ll
shuffle-interleave.ll
shuffle-vec.ll
shuffle.ll
unpack.ll