commit 483927aefb
It should be a win to avoid going out to the system lib for all small
memcmp() calls using scalar ops. For x86 32-bit, this means most everything
up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.

Notes:

1. Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in
   all cases. It's effectively a question of how much we trust the system
   implementation. Linux and macOS (and Windows, I assume, but did not test)
   have optimized memcmp() code for x86, so it's probably not bad either way?
   PPC uses 8/4 as its defaults for these. We do not expand at all for -Oz.

2. There are still potential improvements to make for the CGP expansion IR
   and/or lowering, such as avoiding select-of-constants (D34904) and not
   zext'ing to the max load type before doing a compare.

3. We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that
   will no longer be produced after this patch. I've shown the experimental
   justification for that change in PR33329:
   https://bugs.llvm.org/show_bug.cgi?id=33329#c12
   TL;DR: While the vector code is a likely winner, we can't guarantee that
   it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for
   the greater good of expanding all small memcmp(). If we want to resurrect
   that codegen, it can be done by adjusting the CGP params or poking a hole
   to let those fall through the CGP expansion.

Committed on behalf of Sanjay Patel

Differential Revision: https://reviews.llvm.org/D35067

llvm-svn: 308322
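
As an illustration (not the committed code), here is a minimal C sketch of
what the scalar expansion of memcmp(x, y, 16) == 0 is roughly equivalent to
on x86-64, using two 8-byte load pairs; the helper name memcmp16_eq is
hypothetical:

    #include <stdint.h>
    #include <string.h>

    /* Sketch: two 8-byte load pairs replace the memcmp() libcall.
       A memcpy of a fixed 8 bytes compiles to a single (possibly
       unaligned) 64-bit load on x86-64. */
    static int memcmp16_eq(const void *x, const void *y) {
        uint64_t a0, b0, a1, b1;
        memcpy(&a0, x, 8);
        memcpy(&b0, y, 8);
        memcpy(&a1, (const char *)x + 8, 8);
        memcpy(&b1, (const char *)y + 8, 8);
        /* Equality only: any differing bit makes the OR nonzero. */
        return ((a0 ^ b0) | (a1 ^ b1)) == 0;
    }

For contrast, the special-case SSE codegen being given up (note 3 above)
does the same 16-byte equality test with one vector compare. A hedged SSE2
sketch, with memcmp16_eq_sse2 likewise a hypothetical name:

    #include <emmintrin.h>

    /* Sketch: one 16-byte vector compare; if all 16 byte lanes are
       equal, the movemask sets all 16 low bits. */
    static int memcmp16_eq_sse2(const void *x, const void *y) {
        __m128i a = _mm_loadu_si128((const __m128i *)x);
        __m128i b = _mm_loadu_si128((const __m128i *)y);
        return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
    }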