Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations llvm-svn: 306302