llvm-project/llvm/lib/Transforms
Andrea Di Biagio 99493df257 [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.
Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
As a consequence, adjacent nontemporal stores are always merged into a
memset call.

Example:

;;;
define void @foo(<4 x float>* nocapture %p) {
entry:
  store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
  %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
  store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
  ret void
}

!0 = !{i32 1}
;;;

In this example, the two nontemporal stores are combined to a memset of zero
which does not preserve the nontemporal hint. Later on the backend (tested on a
x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
aligned vector stores.

opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -

Before:
  xorps  %xmm0, %xmm0
  movaps  %xmm0, 16(%rdi)
  movaps  %xmm0, (%rdi)

With this patch, we no longer merge nontemporal stores into calls to memset.
In this example, llc correctly expands the two stores into two movntps:
  xorps  %xmm0, %xmm0
  movntps %xmm0, 16(%rdi)
  movntps  %xmm0, (%rdi)

In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
calls. However a change like that would only have the effect of forcing the
backend to expand !nontemporal memsets back to sequences of store instructions.
A memset library call would not have exactly the same semantic of a builtin
!nontemporal memset call. So, SelectionDAG will have to conservatively expand
it back to a sequence of !nontemporal stores (effectively undoing the merging).

Differential Revision: http://reviews.llvm.org/D13519

llvm-svn: 249820
2015-10-09 10:53:41 +00:00
..
Hello Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) 2015-06-23 09:49:53 +00:00
IPO Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups. 2015-10-06 23:24:35 +00:00
InstCombine [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886) 2015-10-08 17:09:31 +00:00
Instrumentation New MSan mapping layout (llvm part). 2015-10-08 21:35:26 +00:00
ObjCARC [PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible 2015-09-09 17:55:00 +00:00
Scalar [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls. 2015-10-09 10:53:41 +00:00
Utils [PlaceSafeopints] Extract out `callsGCLeafFunction`, NFC 2015-10-08 23:18:30 +00:00
Vectorize inariant.group handling in GVN 2015-10-02 22:12:22 +00:00
CMakeLists.txt
LLVMBuild.txt
Makefile