commit a20352e13e
If we're shrinking a binary operation, it may be the case that the new operation wraps where the old one didn't. If this happens, the behavior should still be well-defined. So, we can't always carry wrapping flags with us when we shrink operations. If we do, we get incorrect optimizations in cases like:

```c
void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] - 128;
}
```

which gets optimized to:

```c
void foo(const unsigned char *from, unsigned char *to, int n) {
  for (int i = 0; i < n; i++)
    to[i] = from[i] | 128;
}
```

Because:

- InstCombine turned `sub i32 %from.i, 128` into `add nuw nsw i32 %from.i, -128`.
- LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a vector full of `i8 128`s.
- InstCombine took advantage of the fact that the newly-shrunken add "couldn't wrap", and changed the `add` to an `or`.

InstCombine seems happy to figure out whether we can add nuw/nsw on its own, so I just decided to drop the flags. There are already a number of places in LoopVectorize where we rely on InstCombine to clean up.

llvm-svn: 305053
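To make the miscompile concrete, here is a minimal standalone sketch (not part of the commit; the helper names `reference` and `miscompiled` are hypothetical) comparing the two loop bodies for every byte value:

```c
#include <stdio.h>

/* Hypothetical demo, not from the commit: `reference` computes the
 * original `from[i] - 128`, which wraps modulo 256 when stored into an
 * unsigned char; `miscompiled` computes the `from[i] | 128` that the
 * bogus carried-over nuw/nsw flags let InstCombine produce. */
static unsigned char reference(unsigned char x) {
  return (unsigned char)(x - 128);
}

static unsigned char miscompiled(unsigned char x) {
  return (unsigned char)(x | 128);
}

int main(void) {
  /* The two forms agree only while the i8 addition cannot wrap,
   * i.e. for x < 128; every larger input exposes the difference. */
  for (int x = 0; x < 256; x++) {
    unsigned char want = reference((unsigned char)x);
    unsigned char got = miscompiled((unsigned char)x);
    if (want != got)
      printf("x = %3d: expected %3u, or-form gives %3u\n", x, want, got);
  }
  return 0;
}
```

For x < 128 the i8 add genuinely cannot wrap and the `add`-to-`or` rewrite is sound; for x >= 128 the subtraction wraps and the `or` form is wrong, which is exactly the wrap the carried-over `nuw` claimed was impossible.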
aarch64-predication.ll
aarch64-unroll.ll
arbitrary-induction-step.ll
arm64-unroll.ll
backedge-overflow.ll
deterministic-type-shrinkage.ll
gather-cost.ll
induction-trunc.ll
interleaved-vs-scalar.ll
interleaved_cost.ll
lit.local.cfg
loop-vectorization-factors.ll
max-vf-for-interleaved.ll
no_vector_instructions.ll
pr31900.ll
pr33053.ll
predication_costs.ll
reduction-small-size.ll
sdiv-pow2.ll
smallest-and-widest-types.ll
type-shrinkage-insertelt.ll