llvm-project

Commit Graph

Author	SHA1	Message	Date
Jingyue Wu	995dde2799	[NVPTXFavorNonGenericAddrSpaces] recursively trace into GEP and BitCast Summary: This patch allows NVPTXFavorNonGenericAddrSpaces to remove addrspacecast from longer chains consisting of GEPs and BitCasts. For example, it can now optimize %0 = addrspacecast [10 x float] addrspace(3)* @a to [10 x float]* %1 = gep [10 x float]* %0, i64 0, i64 %i %2 = bitcast float* %1 to i32* %3 = load i32* %2 ; emits ld.u32 to %0 = gep [10 x float] addrspace(3)* @a, i64 0, i64 %i %1 = bitcast float addrspace(3)* %0 to i32 addrspace(3)* %3 = load i32 addrspace(3)* %1 ; emits ld.shared.f32 Test Plan: @ld_int_from_global_float in access-non-generic.ll Reviewers: broune, eliben, jholewinski, meheff Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10074 llvm-svn: 238574	2015-05-29 17:00:27 +00:00
Justin Holewinski	3d2a976197	[NVPTX] Handle addrspacecast constant expressions in aggregate initializers We need to track if an AddrSpaceCast expression was seen when generating an MCExpr for a ConstantExpr. This change introduces a custom lowerConstant method to the NVPTX asm printer that will create NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode the information that a given symbol needs to be casted to a generic address. llvm-svn: 236000	2015-04-28 17:18:30 +00:00
Jingyue Wu	312fd0242d	[NVPTX] Emits "generic()" depending on the original address space Summary: Fixes a bug in the NVPTX codegen. The code used to miss necessary "generic()" on aggregates of addrspacecasts. Test Plan: addrspacecast-gvar.ll Reviewers: eliben, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D9130 llvm-svn: 235689	2015-04-24 02:57:30 +00:00
David Blaikie	23af64846f	[opaque pointer type] Add textual IR support for explicit type parameter to the call instruction See r230786 and r230794 for similar changes to gep and load respectively. Call is a bit different because it often doesn't have a single explicit type - usually the type is deduced from the arguments, and just the return type is explicit. In those cases there's no need to change the IR. When that's not the case, the IR usually contains the pointer type of the first operand - but since typed pointers are going away, that representation is insufficient so I'm just stripping the "pointerness" of the explicit type away. This does make the IR a bit weird - it /sort of/ reads like the type of the first operand: "call void () %x(" but %x is actually of type "void ()" and will eventually be just of type "ptr". But this seems not too bad and I don't think it would benefit from repeating the type ("void (), void () %x(" and then eventually "void (), ptr %x(") as has been done with gep and load. This also has a side benefit: since the explicit type is no longer a pointer, there's no ambiguity between an explicit type and a function that returns a function pointer. Previously this case needed an explicit type (eg: a function returning a void() function was written as "call void () () * @x(" rather than "call void () * @x(" because of the ambiguity between a function returning a pointer to a void() function and a function returning void). No ambiguity means even function pointer return types can just be written alone, without writing the whole function's type. This leaves /only/ the varargs case where the explicit type is required. Given the special type syntax in call instructions, the regex-fu used for migration was a bit more involved in its own unique way (as every one of these is) so here it is. Use it in conjunction with the apply.sh script and associated find/xargs commands I've provided in rr230786 to migrate your out of tree tests. Do let me know if any of this doesn't cover your cases & we can iterate on a more general script/regexes to help others with out of tree tests. About 9 test cases couldn't be automatically migrated - half of those were functions returning function pointers, where I just had to manually delete the function argument types now that we didn't need an explicit function type there. The other half were typedefs of function types used in calls - just had to manually drop the * from those. import fileinput import sys import re pat = re.compile(r'((?:=\|:\|^\|\s)call\s(?:[^@]?))(\s$\|\s(?:(?:\[\[[a-zA-Z0-9_]+\]\]\|[@%](?:(")?[\\\?@a-zA-Z0-9_.]?(?(3)"\|)\|{{.}}))(?:$\|$)\|undef\|inttoptr\|bitcast\|null\|asm).$)') addrspace_end = re.compile(r"addrspace\(\d+$\s\$") func_end = re.compile("(?:void.\|\)\s)\$") def conv(match, line): if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)): return line return line[:match.start()] + match.group(1)[:match.group(1).rfind('')].rstrip() + match.group(2) + line[match.end():] for line in sys.stdin: sys.stdout.write(conv(re.search(pat, line), line)) llvm-svn: 235145	2015-04-16 23:24:18 +00:00
Jan Vesely	ffcd968647	Revert revisions r234755, r234759, r234760 Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)" Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO" Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB" Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll on hexagon, nvptx, and r600. Revert while I investigate. llvm-svn: 234768	2015-04-13 17:47:15 +00:00
Jan Vesely	a835555e40	LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB v2: consider BooleanContents when processing overflow Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewers: resistor, jholewinsky (nvidia parts) Differential Revision: http://reviews.llvm.org/D6340 llvm-svn: 234755	2015-04-13 15:32:01 +00:00
Justin Holewinski	0afca26e90	[NVPTX] Associate a minimum PTX version for each SM architecture When a new SM architecture is introduced, it is only supported by the current PTX version and later. Make sure we are using at least the minimum PTX version for the target architecture. This also removes support for PTX ISA < 3.2. llvm-svn: 233583	2015-03-30 19:30:55 +00:00
Justin Holewinski	f94d5b5137	[NVPTX] Add options for PTX 4.1/4.2 and SM 3.2/3.7/5.2/5.3 llvm-svn: 233575	2015-03-30 18:12:50 +00:00
Artem Belevich	9e8a039318	Add support for __nvvm_reflect changes in libdevice in CUDA-7.0 Summary: CUDA 7.0's libdevice uses slightly different IR to call __nvvm_reflect and that triggers an assertion in nvvm_reflect optimization pass. This change allows nvvm_reflect pass to deal with both old and new ways to pass an argument to __nvvm_reflect. Test Plan: ninja check-all Reviewers: eliben, echristo Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8399 llvm-svn: 232732	2015-03-19 17:05:35 +00:00
David Blaikie	f72d05bc7b	[opaque pointer type] Add textual IR support for explicit type parameter to gep operator Similar to gep (r230786) and load (r230794) changes. Similar migration script can be used to update test cases, which successfully migrated all of LLVM and Polly, but about 4 test cases needed manually changes in Clang. (this script will read the contents of stdin and massage it into stdout - wrap it in the 'apply.sh' script shown in previous commits + xargs to apply it over a large set of test cases) import fileinput import sys import re rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s$)((<\d\s+x\s+)?([^@]?)(\|\saddrspace\(\d+$)\s\(?(3)>)\s*)(?=$\|%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|zeroinitializer\|<\|\[\[[a-zA-Z]\|\{\{)", re.MULTILINE \| re.DOTALL) def conv(match): line = match.group(1) line += match.group(4) line += ", " line += match.group(2) return line line = sys.stdin.read() off = 0 for match in re.finditer(rep, line): sys.stdout.write(line[off:match.start()]) sys.stdout.write(conv(match)) off = match.end() sys.stdout.write(line[off:]) llvm-svn: 232184	2015-03-13 18:20:45 +00:00
Jingyue Wu	e8290f21b5	[NVPTXAsmPrinter] do not print .align on function headers Summary: PTX does not allow .align directives on function headers. Fixes PR21551. Test Plan: test/Codegen/NVPTX/function-align.ll Reviewers: eliben, jholewinski Reviewed By: eliben, jholewinski Subscribers: llvm-commits, eliben, jpienaar, jholewinski Differential Revision: http://reviews.llvm.org/D8274 llvm-svn: 232004	2015-03-12 01:50:30 +00:00
David Blaikie	a79ac14fa6	[opaque pointer type] Add textual IR support for explicit type parameter to load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794	2015-02-27 21:17:42 +00:00
David Blaikie	79e6c74981	[opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float> %x, ... ->getelementptr float, <4 x float> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.?[^%\w]getelementptr inbounds )(((?:<\d x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") normrep = re.compile( r"(^.?[^%\w]getelementptr )(((?:<\d* x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll \| xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786	2015-02-27 19:29:02 +00:00
Jingyue Wu	0220df0dfd	[NVPTX] Emit .pragma "nounroll" for loops marked with nounroll Summary: CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA driver from unrolling a loop marked with llvm.loop.unroll.disable is not unrolled by CUDA driver, we need to emit .pragma "nounroll" at the header of that loop. This patch also extracts getting unroll metadata from loop ID metadata into a shared helper function. Test Plan: test/CodeGen/NVPTX/nounroll.ll Reviewers: eliben, meheff, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7041 llvm-svn: 227703	2015-02-01 02:27:45 +00:00
Justin Holewinski	d4d2e9bd0e	[NVPTX] Generate a more optimal sequence for select of i1 Instead of creating a pattern like "(p && a) \|\| ((!p) && b)", just expand the i8 operands to i32 and perform the selp on them. Fixes PR22246 llvm-svn: 227123	2015-01-26 19:52:20 +00:00
Justin Holewinski	23df659e6d	[NVPTX] Handle floating-point conversion patterns that are not explicitly ordered or unordered Fixes PR22322 llvm-svn: 227117	2015-01-26 19:11:20 +00:00
Olivier Sallenave	d657321aef	Check that the TLI callback enableAggressiveFMAFusion has the desired effect on FMA folding. llvm-svn: 225987	2015-01-14 15:36:28 +00:00
Jingyue Wu	e4c9cf04f5	[NVPTX] Fix bugs related to isSingleValueType Summary: With isSingleValueType starting to treat vector types as single-value types, code that uses this interface needs to be updated. Test Plan: vector-global.ll nvcl-param-align.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D6573 llvm-svn: 224440	2014-12-17 17:59:04 +00:00
Duncan P. N. Exon Smith	be7ea19b58	IR: Make metadata typeless in assembly Now that `Metadata` is typeless, reflect that in the assembly. These are the matching assembly changes for the metadata/value split in r223802. - Only use the `metadata` type when referencing metadata from a call intrinsic -- i.e., only when it's used as a `Value`. - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode` when referencing it from call intrinsics. So, assembly like this: define @foo(i32 %v) { call void @llvm.foo(metadata !{i32 %v}, metadata !0) call void @llvm.foo(metadata !{i32 7}, metadata !0) call void @llvm.foo(metadata !1, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{metadata !3}, metadata !0) ret void, !bar !2 } !0 = metadata !{metadata !2} !1 = metadata !{i32* @global} !2 = metadata !{metadata !3} !3 = metadata !{} turns into this: define @foo(i32 %v) { call void @llvm.foo(metadata i32 %v, metadata !0) call void @llvm.foo(metadata i32 7, metadata !0) call void @llvm.foo(metadata i32* @global, metadata !0) call void @llvm.foo(metadata !3, metadata !0) call void @llvm.foo(metadata !{!3}, metadata !0) ret void, !bar !2 } !0 = !{!2} !1 = !{i32* @global} !2 = !{!3} !3 = !{} I wrote an upgrade script that handled almost all of the tests in llvm and many of the tests in cfe (even handling many `CHECK` lines). I've attached it (or will attach it in a moment if you're speedy) to PR21532 to help everyone update their out-of-tree testcases. This is part of PR21532. llvm-svn: 224257	2014-12-15 19:07:53 +00:00
Duncan P. N. Exon Smith	6ec9edf8ee	IR: Canonicalize metadata formatting, NFC Canonicalize formatting of metadata to make it easier to upgrade via scripts -- in particular, one line per metadata definition makes it more `sed`-able. This is preparation for changing the assembly syntax for metadata [1]. [1]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248449.html llvm-svn: 224002	2014-12-11 06:32:29 +00:00
Jingyue Wu	5b62eb9b48	[NVPTX] Do not emit .weak symbols for NVPTX Summary: ".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the weak directive in MCAsmPrinter customizable, and disables emitting ".weak" symbols for NVPTX. Test Plan: weak-linkage.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: majnemer, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6455 llvm-svn: 223077	2014-12-01 21:16:17 +00:00
Justin Holewinski	3d140fcfd1	[NVPTX] Add NVPTXLowerStructArgs pass This works around the limitation that PTX does not allow .param space loads/stores with arbitrary pointers. If a function has a by-val struct ptr arg, say foo(%struct.x byval %d), then add the following instructions to the first basic block : %temp = alloca %struct.x, align 8 %tt1 = bitcast %struct.x %d to i8 * %tt2 = llvm.nvvm.cvt.gen.to.param %tt2 %tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) * %tv = load %struct.x addrspace(101) * %tempd store %struct.x %tv, %struct.x * %temp, align 8 The above code allocates some space in the stack and copies the incoming struct from param space to local space. Then replace all occurences of %d by %temp. Fixes PR21465. llvm-svn: 221377	2014-11-05 18:19:30 +00:00
Jingyue Wu	ea51161a94	[NVPTX] aligned byte-buffers for vector return types Summary: Fixes PR21100 which is caused by inconsistency between the declared return type and the expected return type at the call site. The new behavior is consistent with nvcc and the NVPTXTargetLowering::getPrototype function. Test Plan: test/Codegen/NVPTX/vector-return.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5612 llvm-svn: 220607	2014-10-25 03:46:16 +00:00
Jingyue Wu	2954280f6a	[MachineSink] Use the real post dominator tree Summary: Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo. The old heuristics caused instructions to sink unnecessarily, and might create register pressure. This is the second try of the fix. The first one (D4814) caused a performance regression due to failing to sink instructions out of loops (PR21115). This patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower one regardless of whether the target block post-dominates the source. Thanks Alexey Volkov for reporting PR21115! Test Plan: Added a NVPTX codegen test to verify that our change prevents the backend from over-sinking. It also shows the unnecessary register pressure caused by over-sinking. Added an X86 test to verify we can sink instructions out of loops regardless of the dominance relationship. This test is reduced from Alexey's test in PR21115. Updated an affected test in X86. Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime performance. Results are attached separately in the review thread. Reviewers: Jiangning, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5633 llvm-svn: 219773	2014-10-15 03:27:43 +00:00
Jingyue Wu	fd47fb9976	Revert r216862 due to a performance regression Reported by Alexey Volkov in PR21115 llvm-svn: 218771	2014-10-01 15:22:13 +00:00
Jingyue Wu	5208cc5dbe	[MachineSink] Use the real post dominator tree Summary: Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree. The old heuristics caused instructions to sink unnecessarily, and might create register pressure. Test Plan: Added a NVPTX codegen test to verify that our change is in effect. It also shows the unnecessary register pressure caused by over-sinking. Updated affected tests in AArch64 and X86. Reviewers: eliben, meheff, Jiangning Reviewed By: Jiangning Subscribers: jholewinski, aemerson, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4814 llvm-svn: 216862	2014-09-01 03:47:25 +00:00
Jingyue Wu	cb83a155c1	[NVPTX] Make the alignment an explicit argument to ldu/ldg Summary: Instead of specifying the alignment as metadata which may be destroyed by transformation passes, make the alignment the second argument to ldu/ldg intrinsic calls. Test Plan: ldu-ldg.ll ldu-i8.ll ldu-reg-plus-offset.ll Reviewers: eliben, meheff, jholewinski Reviewed By: meheff, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D5093 llvm-svn: 216731	2014-08-29 15:30:20 +00:00
Justin Holewinski	4d6f783281	[NVPTX] Add some extra tests for mul.wide to test non-power-of-two source types llvm-svn: 213794	2014-07-23 20:23:49 +00:00
Justin Holewinski	ecca715b3c	[NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of two llvm-svn: 213784	2014-07-23 18:46:03 +00:00
Justin Holewinski	511664dc76	[NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled With optimizations disabled, we disable the isel patterns for mul.wide; but we were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE ISD nodes in DAGCombine if the optimization level is not zero. llvm-svn: 213773	2014-07-23 17:40:45 +00:00
Eli Bendersky	24d8aacd94	Add some tests for NVPTX lowering of cmpxchg llvm-svn: 213586	2014-07-21 22:54:44 +00:00
Eli Bendersky	f4f1cff4ba	Add tests for atomic adds on floats. llvm-svn: 213406	2014-07-18 20:11:26 +00:00
Eli Bendersky	fc42738a75	Use CHECK-LABEL where appropriate in this test. llvm-svn: 213398	2014-07-18 19:32:09 +00:00
Tim Northover	9e108a0e3a	NVPTX: support fpext/fptrunc to and from f16. llvm-svn: 213377	2014-07-18 13:01:43 +00:00
Tim Northover	20bd0ced30	CodeGen: soften f16 type by default instead of marking legal. Actual support for softening f16 operations is still limited, and can be added when it's needed. But Soften is much closer to being a useful thing to try than keeping it Legal when no registers can actually hold such values. Longer term, we probably want something between Soften and Promote semantics for most targets, it'll be more efficient to promote the 4 basic operations to f32 than libcall them. llvm-svn: 213372	2014-07-18 12:41:46 +00:00
Tim Northover	5e54fe14a4	NVPTX: support direct f16 <-> f64 conversions via intrinsics. Clang may well start emitting these soon, and while it may not be directly relevant for OpenCL or GLSL, the instructions were just sitting there waiting to be used. llvm-svn: 213356	2014-07-18 08:30:10 +00:00
Justin Holewinski	428cf0e49a	[NVPTX] Improve handling of FP fusion We now consider the FPOpFusion flag when determining whether to fuse ops. We also explicitly emit add.rn when fusion is disabled to prevent ptxas from fusing the operations on its own. llvm-svn: 213287	2014-07-17 18:10:09 +00:00
Justin Holewinski	e5a1173f67	[NVPTX] Add missing .v4 qualifier on vector store instruction llvm-svn: 213276	2014-07-17 16:58:56 +00:00
Justin Holewinski	18cfe7d634	[NVPTX] Flag surface/texture query instructions with IsTexSurfQuery Also, add some tests to make sure we can handle surface/texture queries on both Fermi and Kepler+. llvm-svn: 213268	2014-07-17 14:51:33 +00:00
Justin Holewinski	9a2350e459	[NVPTX] Add more surface/texture intrinsics, including CUDA unified texture fetch This also uses TSFlags to mark machine instructions that are surface/texture accesses, as well as the vector width for surface operations. This is used to simplify some of the switch statements that need to detect surface/texture instructions llvm-svn: 213256	2014-07-17 11:59:04 +00:00
Justin Holewinski	ac451066f4	[NVPTX] Honor alignment on vector loads/stores We were not considering the stated alignment on vector loads/stores, leading us to generate vector instructions even when we do not have sufficient alignment. Now, for IR like: %1 = load <4 x float>, <4 x float>* %ptr, align 4 we will generate correct, conservative PTX like: ld.f32 ... [%ptr] ld.f32 ... [%ptr+4] ld.f32 ... [%ptr+8] ld.f32 ... [%ptr+12] Or if we have an alignment of 8 (for example), we can generate code like: ld.v2.f32 ... [%ptr] ld.v2.f32 ... [%ptr+8] llvm-svn: 213186	2014-07-16 19:45:35 +00:00
Justin Holewinski	3e037d98e6	[NVPTX] Rename registers %fl -> %fd and %rl -> %rd This matches the internal behavior of NVIDIA tools like libnvvm. llvm-svn: 213168	2014-07-16 16:26:58 +00:00
Justin Holewinski	a0d531f031	[NVPTX] Add reflect intrinsic (better than matching by function name) Also clean up some of the logic in NVVMReflect.cpp while we're messing around in there. llvm-svn: 211948	2014-06-27 18:36:11 +00:00
Justin Holewinski	2739c0175c	[NVPTX] Add 'b' asm constraint llvm-svn: 211946	2014-06-27 18:36:06 +00:00
Justin Holewinski	549c773619	[NVPTX] Error out if initializer is given for variable in an address space that does not support initialization llvm-svn: 211943	2014-06-27 18:36:01 +00:00
Justin Holewinski	773ca40f5d	[NVPTX] Add support for .managed variables for UVM llvm-svn: 211942	2014-06-27 18:35:58 +00:00
Justin Holewinski	d73767a80a	[NVPTX] Emit .weak linkage for link_once, weak, available_externally, and common linkage llvm-svn: 211941	2014-06-27 18:35:56 +00:00
Justin Holewinski	b926d9d446	[NVPTX] Fix handling of ldg/ldu intrinsics. The address space of the pointer must be global (1) for these intrinsics. There must also be alignment metadata attached to the intrinsic calls, e.g. %val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32 addrspace(1)* %ptr), !align !0 !0 = metadata !{i32 4} llvm-svn: 211939	2014-06-27 18:35:51 +00:00
Justin Holewinski	6e40f63e41	[NVPTX] Clean up argument lowering code and properly handle alignment for structs and vectors llvm-svn: 211938	2014-06-27 18:35:44 +00:00
Justin Holewinski	360a5cfcd3	[NVPTX] Add support for [SHL,SRA,SRL]_PARTS llvm-svn: 211936	2014-06-27 18:35:40 +00:00

1 2 3

129 Commits