llvm-project

Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)

Summary:
This is a resurrection of work first proposed and discussed in Aug 2015:
  http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
  http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument which is required to be a constant integer. It represents the alignment of the dest (and source), and so must be the minimum of the actual alignment of the two.

This change is the first in a series that allows source and dest to each have their own alignments by using the alignment attribute on their arguments.

In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily, require that the alignments for source & dest be equal.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

Downstream users may have to update their lit tests that check for @llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script may help with updating the majority of your tests, but it does not catch all possible patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API, and those that use MemIntrinsicInst::[get|set]Alignment() to use getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965		2018-01-19 17:13:12 +00:00
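The call rewrite described in the commit message can be sketched in Python for readers who would rather script the migration than run sed. This is a minimal illustration, not part of the commit: the helper name `migrate_call` and its regular expression are assumptions made for this example, it handles only memcpy/memmove call lines (not memset or declarations), and, like the sed script, it will not catch every pattern. Alignments of 0 or 1 produce no `align` attribute, which is the same way the sed script treats them.

```python
import re

# Hypothetical helper (not from LLVM): matches an old-style memcpy/memmove call
# whose fourth argument is the constant i32 alignment.
_OLD_CALL = re.compile(
    r"call void @llvm\.mem(cpy|move)\.p([^(]*)\("
    r"i8([^*]*)\* (.*), i8([^*]*)\* (.*), (i\d+) (.*), "
    r"i32 ([0-9]+), i1 ([^)]*)\)"
)


def migrate_call(line: str) -> str:
    """Move the i32 alignment argument onto the dest and source pointers."""

    def repl(m: re.Match) -> str:
        (kind, suffix, dest_as, dest, src_as, src,
         len_ty, length, align, volatile) = m.groups()
        # Alignment 0 or 1 carries no information, so emit no attribute.
        attr = "" if align in ("0", "1") else f"align {align} "
        return (
            f"call void @llvm.mem{kind}.p{suffix}("
            f"i8{dest_as}* {attr}{dest}, i8{src_as}* {attr}{src}, "
            f"{len_ty} {length}, i1 {volatile})"
        )

    # Lines that do not match the old form are returned unchanged.
    return _OLD_CALL.sub(repl, line)


old = ("call void @llvm.memcpy.p0i8.p0i8.i32("
       "i8* %dest, i8* %src, i32 100, i32 4, i1 false)")
print(migrate_call(old))
```

Applied to the "used to read" line from the commit message, the sketch prints the "will now read" line; output from any such script still needs the manual checking the commit message recommends.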
..
LoadStoreVectorizer.ll	[NVPTX] Added support for .f16x2 instructions.	2017-02-23 22:38:24 +00:00
MachineSink-call.ll	[NVPTX] Annotate call machine instructions as calls.	2016-02-17 17:46:50 +00:00
MachineSink-convergent.ll	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0	2016-07-06 20:02:45 +00:00
TailDuplication-convergent.ll	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0	2016-07-06 20:02:45 +00:00
access-non-generic.ll	NVPTX: Move InferAddressSpaces to generic code	2017-01-31 01:10:58 +00:00
add-128bit.ll	[DAGCombiner] add missing folds for scalar select of {-1,0,1}	2017-02-24 17:17:33 +00:00
addrspacecast-gvar.ll	…
addrspacecast.ll	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	2016-10-31 21:51:42 +00:00
aggr-param.ll	…
aggregate-return.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
alias.ll	[CUDA] Die gracefully when trying to output an LLVM alias.	2016-01-23 21:12:20 +00:00
annotations.ll	Whitespace cleanup in test/CodeGen/NVPTX/annotations.ll.	2016-12-14 22:32:55 +00:00
arg-lowering.ll	…
arithmetic-fp-sm20.ll	…
arithmetic-int.ll	[NVPTX] expand mul_lohi to mul_lo and mul_hi	2016-01-22 19:47:26 +00:00
atomics-sm60.ll	[NVPTX] Implement __nvvm_atom_add_gen_d builtin.	2017-11-07 22:10:54 +00:00
atomics-with-scope.ll	[NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.	2016-09-28 17:25:38 +00:00
atomics.ll	…
barrier.ll	[NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.	2017-09-21 18:44:49 +00:00
bfe.ll	…
branch-fold.ll	…
bug17709.ll	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	2017-05-15 17:17:44 +00:00
bug21465.ll	[NVPTX] Renamed NVPTXLowerKernelArgs -> NVPTXLowerArgs. NFC.	2016-07-20 21:44:07 +00:00
bug22246.ll	…
bug22322.ll	Add address space mangling to lifetime intrinsics	2017-04-10 20:18:21 +00:00
bug26185-2.ll	[NVPTX] Fix sign/zero-extending ldg/ldu instruction selection	2016-05-02 18:12:02 +00:00
bug26185.ll	[NVPTX] Handle ldg created from sign-/zero-extended load	2016-04-05 12:38:01 +00:00
bypass-div.ll	…
call-with-alloca-buffer.ll	Fix NVPTX/call-with-alloca-buffer.ll after r276777.	2016-07-26 18:28:33 +00:00
callchain.ll	…
calling-conv.ll	…
combine-min-max.ll	[NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.	2017-01-18 00:09:01 +00:00
compare-int.ll	…
constant-vectors.ll	…
convergent-mir-call.ll	[NVPTX] Use different, convergent MIs for convergent calls.	2016-03-01 19:24:03 +00:00
convert-fp.ll	[NVPTX] Add fptosi tests to convert-fp.ll.	2017-01-15 16:55:54 +00:00
convert-int-sm20.ll	…
ctlz.ll	[NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)	2017-05-12 19:56:43 +00:00
ctpop.ll	[NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)	2017-05-12 19:56:43 +00:00
cttz.ll	[NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)	2017-05-12 19:56:43 +00:00
disable-opt.ll	[NVPTX] Disable performance optimizations when OptLevel==None	2016-02-04 04:15:36 +00:00
div-ri.ll	…
divrem-combine.ll	[NVPTX] Compute 'rem' using the result of 'div', if possible.	2016-10-28 21:44:00 +00:00
envreg.ll	…
extloadv.ll	…
f16-instructions.ll	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	2017-05-15 17:17:44 +00:00
f16x2-instructions.ll	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	2017-05-15 17:17:44 +00:00
fast-math.ll	[NVPTX] Enable combineRepeatedFPDivisors for NVPTX.	2017-02-03 15:13:50 +00:00
fcos-no-fast-math.ll	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.	2017-01-13 18:48:13 +00:00
fma-assoc.ll	[DAGCombine] require UnsafeFPMath for re-association of addition	2017-01-31 14:35:37 +00:00
fma-disable.ll	…
fma.ll	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	2017-05-15 17:17:44 +00:00
fns.ll	[NVPTX,CUDA] Added llvm.nvvm.fns intrinsic and matching __nvvm_fns builtin in clang.	2017-12-06 17:50:05 +00:00
fp-contract.ll	…
fp-literals.ll	…
fp16.ll	…
fsin-no-fast-math.ll	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.	2017-01-13 18:48:13 +00:00
function-align.ll	…
generic-to-nvvm-ir.ll	[Verifier] Remove the -verify-debug-info cl::opt	2017-11-02 23:44:20 +00:00
generic-to-nvvm.ll	…
global-addrspace.ll	…
global-ctor-empty.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-ctor.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-dtor.ll	[CUDA] Die if we ask the NVPTX backend to emit a global ctor/dtor.	2016-01-30 01:07:38 +00:00
global-ordering.ll	…
global-variable-big.ll	[NVPTX] Support global variables of integer type larger than i64.	2017-01-18 00:29:53 +00:00
global-visibility.ll	[NVPTX] Do not emit .hidden or .protected directives as they are not allowed by PTX.	2016-01-15 23:57:53 +00:00
globals_init.ll	…
globals_lowering.ll	…
gvar-init.ll	…
half.ll	[NVPTX] Added support for half-precision floating point.	2017-01-13 20:56:17 +00:00
i1-global.ll	…
i1-int-to-fp.ll	…
i1-param.ll	…
i8-param.ll	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	2017-05-15 17:17:44 +00:00
i128-global.ll	[NVPTX] Add lowering of i128 params.	2017-07-20 21:16:03 +00:00
i128-param.ll	[NVPTX] Add lowering of i128 params.	2017-07-20 21:16:03 +00:00
i128-retval.ll	[NVPTX] Add lowering of i128 params.	2017-07-20 21:16:03 +00:00
idioms.ll	[NVPTX] Lower integer absolute value idiom to abs instruction.	2017-01-18 00:08:44 +00:00
imad.ll	…
implicit-def.ll	…
inline-asm.ll	…
intrin-nocapture.ll	…
intrinsic-old.ll	[NVVMIntrRange] Only set range metadata if none is already present	2016-12-22 00:51:59 +00:00
intrinsics.ll	Fix some broken CHECK lines.	2017-01-22 20:28:56 +00:00
isspacep.ll	…
ld-addrspace.ll	…
ld-generic.ll	…
ld-st-addrrspace.py	[NVPTX] allow address space inference for volatile loads/stores.	2017-10-24 20:31:44 +00:00
ldg-invariant.ll	[NVPTX] Add tests that invariant vector loads get lowered to ld.global.nc.	2017-02-04 01:54:56 +00:00
ldparam-v4.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
ldu-i8.ll	…
ldu-ldg.ll	…
ldu-reg-plus-offset.ll	…
lit.local.cfg	…
load-sext-i1.ll	…
load-with-non-coherent-cache.ll	…
local-stack-frame.ll	…
loop-vectorize.ll	…
lower-aggr-copies.ll	Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)	2018-01-19 17:13:12 +00:00
lower-alloca.ll	NVPTX: Move InferAddressSpaces to generic code	2017-01-31 01:10:58 +00:00
lower-kernel-ptr-arg.ll	[NVPTX] Improve lowering of byval args of device functions.	2016-07-20 18:39:47 +00:00
machine-sink.ll	…
managed.ll	…
match.ll	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.	2017-09-26 17:07:23 +00:00
math-intrins.ll	[NVPTX] Add codegen tests for llvm.fma.	2017-01-15 16:55:37 +00:00
minmax-negative.ll	Improve clamp recognition in ValueTracking.	2017-10-27 20:53:41 +00:00
misaligned-vector-ldst.ll	[NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.	2017-03-07 20:33:38 +00:00
module-inline-asm.ll	…
mulwide.ll	…
named-barriers.ll	[NVPTX] Add intrinsics to support named barriers.	2017-01-28 16:38:15 +00:00
noduplicate-syncthreads.ll	NVPTX: Replace uses of cuda.syncthreads with nvvm.barrier0	2016-07-06 20:02:45 +00:00
nounroll.ll	…
nvcl-param-align.ll	…
nvvm-reflect-module-flag.ll	[NVPTX] Read __CUDA_FTZ from module flags in NVVMReflect.	2016-04-01 01:09:07 +00:00
nvvm-reflect.ll	[NVPTX] Let there be One True Way to set NVVMReflect params.	2017-01-15 16:54:35 +00:00
param-align.ll	[NVPTX] Make sure we adjust alignment at all call sites	2016-07-18 21:58:48 +00:00
param-load-store.ll	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	2017-05-15 17:17:44 +00:00
pr13291-i1-store.ll	…
pr16278.ll	…
pr17529.ll	…
refl1.ll	…
reg-copy.ll	…
reg-types.ll	[NVPTX] Use untyped (.b) integer registers in PTX.	2016-08-12 22:02:19 +00:00
rotate.ll	…
sched1.ll	Only enable LiveRangeShrink for x86.	2017-05-17 20:18:13 +00:00
sched2.ll	Only enable LiveRangeShrink for x86.	2017-05-17 20:18:13 +00:00
sext-in-reg.ll	…
sext-params.ll	…
shfl-sync.ll	[NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.	2017-09-20 21:23:07 +00:00
shfl.ll	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass.	2016-10-31 21:51:42 +00:00
shift-parts.ll	…
simple-call.ll	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	2017-05-15 17:17:44 +00:00
sm-version-20.ll	…
sm-version-21.ll	…
sm-version-30.ll	…
sm-version-32.ll	…
sm-version-35.ll	…
sm-version-37.ll	…
sm-version-50.ll	…
sm-version-52.ll	…
sm-version-53.ll	…
sm-version-60.ll	[NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.	2016-07-06 21:06:10 +00:00
sm-version-61.ll	[NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.	2016-07-06 21:06:10 +00:00
sm-version-62.ll	[NVPTX] Add sm_60, sm_61, sm_62 targets to LLVM.	2016-07-06 21:06:10 +00:00
sm-version-70.ll	[CUDA] Added rudimentary support for CUDA-9 and sm_70.	2017-09-07 18:14:32 +00:00
speculative-execution-divergent-target.ll	Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target.	2016-04-15 01:20:52 +00:00
sqrt-approx.ll	[NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).	2017-01-31 23:08:57 +00:00
st-addrspace.ll	…
st-generic.ll	…
surf-read-cuda.ll	…
surf-read.ll	…
surf-write-cuda.ll	…
surf-write.ll	…
symbol-naming.ll	[NVPTX] Assign valid global names	2017-12-04 14:19:33 +00:00
tex-read-cuda.ll	…
tex-read.ll	…
texsurf-queries.ll	…
tid-range.ll	[SelectionDAG] Correctly transform range metadata to AssertZExt	2017-01-06 00:11:46 +00:00
tuple-literal.ll	…
vec-param-load.ll	[NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.	2017-02-21 22:56:05 +00:00
vec8.ll	Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB."	2017-05-18 18:50:05 +00:00
vector-args.ll	…
vector-call.ll	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	2017-05-15 17:17:44 +00:00
vector-compare.ll	…
vector-global.ll	…
vector-loads.ll	…
vector-select.ll	…
vector-stores.ll	…
vote.ll	[NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.	2017-09-21 18:44:49 +00:00
weak-global.ll	…
weak-linkage.ll	…
wmma.py	[NVPTX] Implemented wmma intrinsics and instructions.	2017-10-12 18:27:55 +00:00
zero-cs.ll	llvm/test/CodeGen/NVPTX/zero-cs.ll: Relax an expression to match in -Asserts.	2016-09-21 04:43:11 +00:00
zeroext-32bit.ll	[NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)	2017-05-15 17:17:44 +00:00