llvm-project

Commit Graph

Author	SHA1	Message	Date
Daniel Sanders	50f17235dd	Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC. Eric has replied and has demanded the patch be reverted. llvm-svn: 247702	2015-09-15 16:17:27 +00:00
Daniel Sanders	153010c52d	Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC. Summary: This is the first patch in the series to migrate Triple's (which are ambiguous) to TargetTuple's (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ API's have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Thanks go to Pavel Labath for fixing LLDB for me. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247692	2015-09-15 14:08:28 +00:00
Daniel Sanders	c40de48041	Revert r247684 - Replace Triple with a new TargetTuple ... LLDB needs to be updated in the same commit. llvm-svn: 247686	2015-09-15 13:46:21 +00:00
Daniel Sanders	18d4b0dab7	Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC. Summary: This is the first patch in the series to migrate Triple's (which are ambiguous) to TargetTuple's (which aren't). For the moment, TargetTuple simply passes all requests to the Triple object it holds. Once it has replaced Triple, it will start to implement the interface in a more suitable way. This change makes some changes to the public C++ API. In particular, InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer() now take TargetTuples instead of Triples. The other public C++ API's have been left as-is for the moment to reduce patch size. This commit also contains a trivial patch to clang to account for the C++ API change. Reviewers: rengolin Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10969 llvm-svn: 247683	2015-09-15 13:17:40 +00:00
Daniel Sanders	c8cd6e95d2	Fix namespace indentation and missing blank lines before 'public:' in *MCAsmInfo.h. NFC. This is to reduce noise in a following commit. Also fixes a couple missing spaces before the reference operator. llvm-svn: 247679	2015-09-15 12:27:06 +00:00
Bruce Mitchener	e9ffb45b60	Fix typos. Summary: This fixes a variety of typos in docs, code and headers. Subscribers: jholewinski, sanjoy, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12626 llvm-svn: 247495	2015-09-12 01:17:08 +00:00
Chandler Carruth	e4405e949f	[ADT] Switch a bunch of places in LLVM that were doing single-character splits to actually use the single character split routine which does less work, and in a debug build is substantially faster. llvm-svn: 247245	2015-09-10 06:12:31 +00:00
Artem Belevich	0127d80986	[NVPTX] Added run NVVMReflect pass to NVPTX back-end. The pass is needed to remove __nvvm_reflect calls when we link in libdevice bitcode that comes with CUDA. Differential Revision: http://reviews.llvm.org/D11663 llvm-svn: 247072	2015-09-08 21:04:55 +00:00
Bjarke Hammersholt Roune	6c64738e87	[NVPTX] Let NVPTX backend detect integer min and max patterns. Summary: Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support. Reviewers: jholewinski, meheff, jingyue Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D12377 llvm-svn: 246107	2015-08-26 23:22:02 +00:00
Jingyue Wu	fcec09866a	[NVPTX] Allow undef value as global initializer Summary: __shared__ variable may now emit undef value as initializer, do not throw error on that. Test Plan: test/CodeGen/NVPTX/global-addrspace.ll Patch by Xuetian Weng Reviewers: jholewinski, tra, jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D12242 llvm-svn: 245785	2015-08-22 05:40:26 +00:00
Jingyue Wu	ca3ef11a9b	[NVPTX] truncating 64-bit to 32-bit is free Summary: Add an LSR test that exercises isTruncateFree. Without this change, LSR creates another indvar representing the truncated value. Reviewers: jholewinski, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D12058 llvm-svn: 245611	2015-08-20 20:59:02 +00:00
Mark Heffernan	438ffe5eac	Use 32-bit divides instead of 64-bit divides where possible. For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason is that many index computations are carried out in 64-bits but never actually exceed the capacity of a 32-bit word. llvm-svn: 244684	2015-08-11 22:16:34 +00:00
Benjamin Kramer	df005cbe19	Fix some comment typos. llvm-svn: 244402	2015-08-08 18:27:36 +00:00
Bjarke Hammersholt Roune	5cbc7d2999	[NVPTX] Use LDG for pointer induction variables. More specifically, make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables. Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads. llvm-svn: 244166	2015-08-05 23:11:57 +00:00
Chandler Carruth	93205eb966	[TTI] Make the cost APIs in TargetTransformInfo consistently use 'int' rather than 'unsigned' for their costs. For something like costs in particular there is a natural "negative" value, that of savings or saved cost. As a consequence, there is a lot of code that subtracts or creates negative values based on cost, all of which is prone to awkwardness or bugs when dealing with an unsigned type. Similarly, we never want these values to wrap, as that would cause Very Bad code generation (likely percieved as an infinite loop as we try to emit over 2^32 instructions or some such insanity). All around 'int' seems a much better fit for these basic metrics. I've added asserts to ensure that at least the TTI interface never returns negative numbers here. If we ever have a use case for negative numbers, we can remove this, but this way a bug where someone used '-1' to produce a 'very large' cost will be caught by the assert. This passes all tests, and is also UBSan clean. No functional change intended. Differential Revision: http://reviews.llvm.org/D11741 llvm-svn: 244080	2015-08-05 18:08:10 +00:00
Craig Topper	e3dcce9700	De-constify pointers to Type since they can't be modified. NFC This was already done in most places a while ago. This just fixes the ones that crept in over time. llvm-svn: 243842	2015-08-01 22:20:21 +00:00
Jingyue Wu	ffa09be222	[NVPTX] allow register copy between float and int Summary: Fixes PR24303. With Bruno's WIP (D11197) on PeepholeOptimizer, across-class register copying (e.g. i32 to f32) becomes possible. Enhance NVPTXInstrInfo::copyPhysReg to handle these cases. Reviewers: jholewinski Subscribers: eliben, jholewinski, llvm-commits, bruno Differential Revision: http://reviews.llvm.org/D11622 llvm-svn: 243839	2015-08-01 18:02:12 +00:00
Jingyue Wu	cf70053b20	[NVPTX] convert pointers in byval kernel arguments to global Summary: For example, in struct S { int x; int y; }; __global__ void foo(S s) { int *b = s.y; // use b } "b" is guaranteed to point to global. NVPTX should emit ld.global/st.global for accessing "b". Reviewers: jholewinski Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11505 llvm-svn: 243790	2015-07-31 21:44:14 +00:00
Jingyue Wu	4be014aebe	Refactor: Simplify boolean conditional return statements in lib/Target/NVPTX Summary: Use clang-tidy to simplify boolean conditional return statements Reviewers: rafael, echristo, chandlerc, bkramer, craig.topper, dexonsmith, chapuni, eliben, jingyue, jholewinski Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D9983 llvm-svn: 243734	2015-07-31 05:09:47 +00:00
Jingyue Wu	3a04dc6e78	Roll forward r242871 r242871 missed one place that should be guarded with isPhysicalReg. This patch fixes that. llvm-svn: 243555	2015-07-29 18:59:09 +00:00
Jingyue Wu	7ec38530a5	Temporarily revert r242871 PR24299 llvm-svn: 243522	2015-07-29 15:26:11 +00:00
Jingyue Wu	6a3fdeca22	[NVPTX] run LSR before straight-line optimizations Summary: Straight-line optimizations can simplify the loop body and make LSR's cost analysis more precise. This significantly improves several Eigen3 CUDA benchmarks. With this change, EigenContractionKernel runs up to 40% faster (`753ceee5f2/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h`?at=default#cl-502). EigenConvolutionKernel2D runs up to 10% faster (`753ceee5f2/unsupported/Eigen/CXX11/src/Tensor/TensorConvolution.h`?at=default#cl-605). I have some difficulties writing small tests that benefit from this reordering due to a seemingly issue with LSR (being discussed at http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088244.html). See the review thread for the compilation time impact of GVN. Reviewers: eliben, jholewinski Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11304 llvm-svn: 242982	2015-07-23 04:59:07 +00:00
Jingyue Wu	20d73c6cc0	[BranchFolding] do not iterate the aliases of virtual registers Summary: MCRegAliasIterator only works for physical registers. So, do not run it on virtual registers. With this issue fixed, we can resurrect the BranchFolding pass in NVPTX backend. Reviewers: jholewinski, bkramer Subscribers: henryhu, meheff, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11174 llvm-svn: 242871	2015-07-22 04:16:52 +00:00
Jingyue Wu	48a9bdc6aa	[NVPTX] make load on global readonly memory to use ldg Summary: [NVPTX] make load on global readonly memory to use ldg Summary: As describe in [1], ld.global.nc may be used to load memory by nvcc when __restrict__ is used and compiler can detect whether read-only data cache is safe to use. This patch will try to check whether ldg is safe to use and use them to replace ld.global when possible. This change can improve the performance by 18~29% on affected kernels (ratt_kernel and rwdot_kernel) in S3D benchmark of shoc [2]. Patched by Xuetian Weng. [1] http://docs.nvidia.com/cuda/kepler-tuning-guide/#read-only-data-cache [2] https://github.com/vetter/shoc Test Plan: test/CodeGen/NVPTX/load-with-non-coherent-cache.ll Reviewers: jholewinski, jingyue Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D11314 llvm-svn: 242713	2015-07-20 21:28:54 +00:00
Eli Bendersky	b09cfb51ca	Use inbounds GEPs for memcpy and memset lowering Follow-up on discussion in http://reviews.llvm.org/D11220 llvm-svn: 242542	2015-07-17 16:42:33 +00:00
Eli Bendersky	f871e09d2d	Streamline the coding style in NVPTXLowerAggrCopies Make the style consistent with LLVM style throughout and clang-format. llvm-svn: 242439	2015-07-16 20:42:38 +00:00
Jingyue Wu	e7981cee24	[NVPTX] enable SpeculativeExecution in NVPTX Summary: SpeculativeExecution enables a series straight line optimizations (such as SLSR and NaryReassociate) on conditional code. For example, if (...) ... b * s ... if (...) ... (b + 1) * s ... speculative execution can hoist b * s and (b + 1) * s from then-blocks, so that we have ... b * s ... if (...) ... ... (b + 1) * s ... if (...) ... Then, SLSR can rewrite (b + 1) * s to (b * s + s) because after speculative execution b * s dominates (b + 1) * s. The performance impact of this change is significant. It speeds up the benchmarks running EigenFloatContractionKernelInternal16x16 (`ba68f42fa6/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h`?at=default#cl-526) by roughly 2%. Some internal benchmarks that have the above code pattern are improved by up to 40%. No significant slowdowns are observed on Eigen CUDA microbenchmarks. Reviewers: jholewinski, broune, eliben Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11201 llvm-svn: 242437	2015-07-16 20:13:48 +00:00
Benjamin Kramer	5dfd8b6da0	[NVPTX] Don't leak dead instructions after unlinking them from the BasicBlock llvm-svn: 242417	2015-07-16 16:51:48 +00:00
Eli Bendersky	f14af16219	Correct lowering of memmove in NVPTX This fixes https://llvm.org/bugs/show_bug.cgi?id=24056 Also a bit of refactoring along the way. Differential Revision: http://reviews.llvm.org/D11220 llvm-svn: 242413	2015-07-16 16:27:19 +00:00
Mehdi Amini	bd7287ebe5	Move most user of TargetMachine::getDataLayout to the Module one Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. This patch is quite boring overall, except for some uglyness in ASMPrinter which has a getDataLayout function but has some clients that use it without a Module (llmv-dsymutil, llvm-dwarfdump), so some methods are taking a DataLayout as parameter. Reviewers: echristo Subscribers: yaron.keren, rafael, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11090 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 242386	2015-07-16 06:11:10 +00:00
Mehdi Amini	5c0fa58e91	Remove DataLayout from TargetLoweringObjectFile, redirect to Module Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: yaron.keren, rafael, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11079 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 242385	2015-07-16 06:04:17 +00:00
Mark Heffernan	4c8ca53f7e	Enable partial and runtime loop unrolling for NVPTX. Enable partial and runtime loop unrolling for NVPTX backend via TTI::UnrollingPreferences with a small threshold. This partially unrolls small loops which are often unrolled by the PTX to SASS compiler and unrolling earlier can be beneficial. llvm-svn: 242049	2015-07-13 18:33:21 +00:00
Duncan P. N. Exon Smith	754e21f244	MC: Remove MCSubtargetInfo() default constructor Force all creators of `MCSubtargetInfo` to immediately initialize it, merging the default constructor and the initializer into an initializing constructor. Besides cleaning up the code a little, this makes it clear that the initializer is never called again later. Out-of-tree backends need a trivial change: instead of calling: auto *X = new MCSubtargetInfo(); InitXYZMCSubtargetInfo(X, ...); return X; they should call: return createXYZMCSubtargetInfoImpl(...); There's no real functionality change here. llvm-svn: 241957	2015-07-10 22:43:42 +00:00
Jingyue Wu	a277561922	[TTI] BasicTTIImpl assumes no vector registers Summary: Following the discussion on r241884, it's more reasonable to assume that a target has no vector registers by default instead of letting every such target overrides getNumberOfRegisters. Therefore, this patch modifies BasicTTIImpl::getNumberOfRegisters to return 0 when Vector is true, and partially reverts r241884 which modifies NVPTXTTIImpl::getNumberOfRegisters. It also fixes a performance bug in LoopVectorizer. Even if a target has no vector registers, vectorization may still help ILP. So, we need both checks to be false before disabling loop vectorization all together. Reviewers: hfinkel Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11108 llvm-svn: 241942	2015-07-10 21:14:54 +00:00
Eli Bendersky	5c0039a014	Actually support volatile memcpys in NVPTX lowering Differential Revision: http://reviews.llvm.org/D11091 llvm-svn: 241914	2015-07-10 15:40:33 +00:00
Jingyue Wu	ad85c8c204	[NVPTX] declare no vector registers Summary: Without this patch, LoopVectorizer in certain cases (see loop-vectorize.ll) produces code with complex control flow which hurts later optimizations. Since NVPTX doesn't have vector registers in LLVM's sense (NVPTXTTI::getRegisterBitWidth(true) == 32), we for now declare no vector registers to effectively disable loop vectorization. Reviewers: jholewinski Subscribers: jingyue, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11089 llvm-svn: 241884	2015-07-10 04:31:56 +00:00
Eli Bendersky	d880520bc2	Replace index-loops by range-based loops NFC llvm-svn: 241875	2015-07-09 23:06:03 +00:00
Mehdi Amini	eaabc51e78	Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user A documentation for this function would be nice by the way. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241807	2015-07-09 15:12:23 +00:00
Mehdi Amini	157e5a6d10	Remove getDataLayout() from TargetSelectionDAGInfo (had no users) Summary: Remove empty subclass in the process. This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren, ted Differential Revision: http://reviews.llvm.org/D11045 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241780	2015-07-09 02:10:08 +00:00
Mehdi Amini	a749f2ad47	Remove getDataLayout() from TargetLowering Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: yaron.keren, rafael, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D11042 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241779	2015-07-09 02:09:52 +00:00
Mehdi Amini	0cdec1e2ab	Make isLegalAddressingMode() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11040 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241778	2015-07-09 02:09:40 +00:00
Mehdi Amini	9639d650bb	Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11037 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241776	2015-07-09 02:09:20 +00:00
Mehdi Amini	44ede33a69	Make TargetLowering::getPointerTy() taking DataLayout as an argument Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775	2015-07-09 02:09:04 +00:00
Mehdi Amini	5010ebf181	Make TargetTransformInfo keeping a reference to the Module DataLayout DataLayout is no longer optional. It was initialized with or without a DataLayout, and the DataLayout when supplied could have been the one from the TargetMachine. Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11021 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241774	2015-07-09 02:08:42 +00:00
Mehdi Amini	56228dabfa	Redirect DataLayout from TargetMachine to Module in ComputeValueVTs() Summary: Avoid using the TargetMachine owned DataLayout and use the Module owned one instead. This requires passing the DataLayout up the stack to ComputeValueVTs(). This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11019 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241773	2015-07-09 01:57:34 +00:00
Eli Bendersky	8e131f8cbc	Cosmetic cleanups - NFC Remove commented lines, trailing whitespace, etc. llvm-svn: 241687	2015-07-08 16:33:21 +00:00
Daniel Sanders	f423f5627c	Change the last few internal StringRef triples into Triple objects. Summary: This concludes the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. At this point, the StringRef-form of GNU Triples should only be used in the public API (including IR serialization) and a couple objects that directly interact with the API (most notably the Module class). The next step is to replace these Triple objects with the TargetTuple object that will represent our authoratative/unambiguous internal equivalent to GNU Triples. Reviewers: rengolin Subscribers: llvm-commits, jholewinski, ted, rengolin Differential Revision: http://reviews.llvm.org/D10962 llvm-svn: 241472	2015-07-06 16:56:07 +00:00
Benjamin Kramer	9bfb627a0e	[TargetLowering] StringRefize asm constraint getters. There is some functional change here because it changes target code from atoi(3) to StringRef::getAsInteger which has error checking. For valid constraints there should be no difference. llvm-svn: 241411	2015-07-05 19:29:18 +00:00
Jingyue Wu	a0a56601c0	[NVPTX] expand extload/truncstore for vectors of floats Summary: According to PTX ISA: For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type. So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of. As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like ld.v2.f32 {%fd3, %fd4}, [%rd2] This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated. Patched by Gang Hu. Test Plan: Test case attached. Reviewers: jingyue Reviewed By: jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D10876 llvm-svn: 241191	2015-07-01 21:32:42 +00:00
Jingyue Wu	77b5b385ee	[NVPTX] Move NVPTXPeephole after NVPTXPrologEpilogPass Summary: Offset of frame index is calculated by NVPTXPrologEpilogPass. Before that the correct offset of stack objects cannot be obtained, which leads to wrong offset if there are more than 2 frame objects. This patch move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check VRFrame register instead, and try to remove the VRFrame if there is no usage after NVPTXPeephole pass. Patched by Xuetian Weng. Test Plan: Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the offset calculation based on SP and SPL. Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10853 llvm-svn: 241185	2015-07-01 20:08:06 +00:00
Jingyue Wu	b374dc651c	[NVPTX] cleanups and refacotring in NVPTXFrameLowering.cpp Summary: NFC Test Plan: no regression Reviewers: wengxt Reviewed By: wengxt Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10849 llvm-svn: 241118	2015-06-30 21:28:31 +00:00
Jingyue Wu	9fe08c4bb3	[NVPTX] Fix issue introduced in D10321 Summary: Really check if %SP is not used in other places, instead of checking only exact one non-dbg use. Patched by Xuetian Weng. Test Plan: @foo4 in test/CodeGen/NVPTX/local-stack-frame.ll, create a case that SP will appear twice. Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: llvm-commits, sfantao, jholewinski Differential Revision: http://reviews.llvm.org/D10844 llvm-svn: 241099	2015-06-30 18:59:19 +00:00
Samuel Antao	01ee64c2ea	Force relocation mode to be default, regardless of what is passed to the backend. llvm-svn: 241081	2015-06-30 17:18:00 +00:00
Jingyue Wu	3203818bf7	[NVPTX] noop when kernel pointers are already global Summary: Some front ends make kernel pointers global already. In that case, handlePointerParams does nothing. Test Plan: more tests in lower-kernel-ptr-arg.ll Reviewers: grosser Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10779 llvm-svn: 240849	2015-06-26 22:35:43 +00:00
Jingyue Wu	9c71150bfb	Add NVPTXPeephole pass to reduce unnecessary address cast Summary: This patch first change the register that holds local address for stack frame to %SPL. Then the new NVPTXPeephole pass will try to scan the following pattern %vreg0<def> = LEA_ADDRi64 <fi#0>, 4 %vreg1<def> = cvta_to_local %vreg0 and transform it into %vreg1<def> = LEA_ADDRi64 %VRFrameLocal, 4 Patched by Xuetian Weng Test Plan: test/CodeGen/NVPTX/local-stack-frame.ll Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: eliben, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10549 llvm-svn: 240587	2015-06-24 20:20:16 +00:00
Rafael Espindola	c233f74e6e	Simplify the Mangler interface now that DataLayout is mandatory. We only need to pass in a DataLayout when mangling a raw string, not when constructing the mangler. llvm-svn: 240405	2015-06-23 13:59:29 +00:00
Alexander Kornienko	f00654e31b	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC) Apparently, the style needs to be agreed upon first. llvm-svn: 240390	2015-06-23 09:49:53 +00:00
Alexander Kornienko	70bc5f1398	Fixed/added namespace ending comments using clang-tidy. NFC The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-,llvm-namespace-comment -header-filter='llvm/.\|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137	2015-06-19 15:57:42 +00:00
Jingyue Wu	cd3afea451	Add NVPTXLowerAlloca pass to convert alloca'ed memory to local address Summary: This is done by first adding two additional instructions to convert the alloca returned address to local and convert it back to generic. Then replace all uses of alloca instruction with the converted generic address. Then we can rely NVPTXFavorNonGenericAddrSpace pass to combine the generic addresscast and the corresponding Load, Store, Bitcast, GEP Instruction together. Patched by Xuetian Weng (xweng@google.com). Test Plan: test/CodeGen/NVPTX/lower-alloca.ll Reviewers: jholewinski, jingyue Reviewed By: jingyue Subscribers: meheff, broune, eliben, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10483 llvm-svn: 239964	2015-06-17 22:31:02 +00:00
Daniel Sanders	c81f450f1a	Clean up redundant copies of Triple objects. NFC Summary: Reviewers: rengolin Reviewed By: rengolin Subscribers: llvm-commits, rengolin, jholewinski Differential Revision: http://reviews.llvm.org/D10382 llvm-svn: 239823	2015-06-16 15:44:21 +00:00
Daniel Sanders	3e5de88dac	Replace string GNU Triples with llvm::Triple in TargetMachine. NFC. Summary: For the moment, TargetMachine::getTargetTriple() still returns a StringRef. This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. Reviewers: rengolin Reviewed By: rengolin Subscribers: ted, llvm-commits, rengolin, jholewinski Differential Revision: http://reviews.llvm.org/D10362 llvm-svn: 239554	2015-06-11 19:41:26 +00:00
Ahmed Bougacha	c88bf54366	[CodeGen] ArrayRef'ize cond/pred in various TII APIs. NFC. llvm-svn: 239553	2015-06-11 19:30:37 +00:00
Pete Cooper	7cbe58d3c5	Remove MachineModuleInfo::UsedFunctions as it has no users. It hasn't been used since r130964. This also removes MachineModuleInfo::isUsedFunction and MachineModuleInfo::AnalyzeModule, both of which were only there to support UsedFunctions. llvm-svn: 239501	2015-06-11 01:04:56 +00:00
Daniel Sanders	a73f1fdb19	Replace string GNU Triples with llvm::Triple in MCSubtargetInfo and create*MCSubtargetInfo(). NFC. Summary: This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. Reviewers: rafael Reviewed By: rafael Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski Differential Revision: http://reviews.llvm.org/D10311 llvm-svn: 239467	2015-06-10 12:11:26 +00:00
Jingyue Wu	75589ffcc2	[NVPTX] fix a crash bug in NVPTXFavorNonGenericAddrSpaces Summary: We used to assume V->RAUW only modifies the operand list of V's user. However, if V and V's user are Constants, RAUW may replace and invalidate V's user entirely. This patch fixes the above issue by letting the caller replace the operand instead of calling RAUW on Constants. Test Plan: @nested_const_expr and @rauw in access-non-generic.ll Reviewers: broune, jholewinski Reviewed By: broune, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10345 llvm-svn: 239435	2015-06-09 21:50:32 +00:00
Samuel Antao	cd50135a29	The constant initialization for globals in NVPTX is generated as an array of bytes. The generation of this byte arrays was expecting the host to be little endian, which prevents big endian hosts to be used in the generation of the PTX code. This patch fixes the problem by changing the way the bytes are extracted so that it works for either little and big endian. llvm-svn: 239412	2015-06-09 16:29:34 +00:00
Daniel Sanders	329fc9b68a	[nvptx] Only support the 'm' inline assembly memory constraint. NFC. Summary: NVPTX doesn't seem to support any additional constraints. Therefore remove the target hook. No functional change intended. Reviewers: jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8209 llvm-svn: 239395	2015-06-09 10:34:05 +00:00
Matt Arsenault	8b643559d4	MC: Add target hook to control symbol quoting llvm-svn: 239370	2015-06-09 00:31:39 +00:00
Jingyue Wu	2e4d1dd0ed	[NVPTX] run SROA after NVPTXFavorNonGenericAddrSpaces Summary: This cleans up most allocas NVPTXLowerKernelArgs emits for byval parameters. Test Plan: makes bug21465.ll more stronger to verify no redundant local load/store. Reviewers: eliben, jholewinski Reviewed By: eliben, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10322 llvm-svn: 239368	2015-06-09 00:05:56 +00:00
Jingyue Wu	a2f6027a31	[NVPTX] roll forward r239082 NVPTXISelDAGToDAG translates "addrspacecast to param" to NVPTX::nvvm_ptr_gen_to_param Added an llc test in bug21465. llvm-svn: 239100	2015-06-04 21:28:26 +00:00
Jingyue Wu	b8f38668d5	Revert r239082 llc crashed for NVPTX backend llvm-svn: 239094	2015-06-04 21:07:08 +00:00
Jingyue Wu	f3a8079b75	[NVPTX] kernel pointer arguments point to the global address space Summary: With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a pointer in the global address space. This change, along with NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.* and st.global.* for accessing kernel pointer arguments. Minor changes: 1. refactor: extract function convertToPointerInAddrSpace 2. fix a bug in the test case in bug21465.ll Test Plan: lower-kernel-ptr-arg.ll Reviewers: eliben, meheff, jholewinski Reviewed By: jholewinski Subscribers: wengxt, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10154 llvm-svn: 239082	2015-06-04 20:19:38 +00:00
Daniel Sanders	7813ae879e	Replace string GNU Triples with llvm::Triple in MCAsmInfo subclasses and create*AsmInfo(). NFC. Summary: This is the first of several patches to eliminate StringRef forms of GNU triples from the internals of LLVM. After this is complete, GNU triples will be replaced by a more authoratitive representation in the form of an LLVM TargetTuple. Reviewers: rengolin Reviewed By: rengolin Subscribers: ted, llvm-commits, rengolin, jholewinski Differential Revision: http://reviews.llvm.org/D10236 llvm-svn: 239036	2015-06-04 13:12:25 +00:00
Benjamin Kramer	db220dbf02	Push constness through LoopInfo::isLoopHeader and clean it up a bit. NFC. llvm-svn: 238843	2015-06-02 15:28:27 +00:00
Matt Arsenault	bd7d80a4a6	Add address space argument to isLegalAddressingMode This is important because of different addressing modes depending on the address space for GPU targets. This only adds the argument, and does not update any of the uses to provide the correct address space. llvm-svn: 238723	2015-06-01 05:31:59 +00:00
Jim Grosbach	13760bd152	MC: Clean up MCExpr naming. NFC. llvm-svn: 238634	2015-05-30 01:25:56 +00:00
Jingyue Wu	995dde2799	[NVPTXFavorNonGenericAddrSpaces] recursively trace into GEP and BitCast Summary: This patch allows NVPTXFavorNonGenericAddrSpaces to remove addrspacecast from longer chains consisting of GEPs and BitCasts. For example, it can now optimize %0 = addrspacecast [10 x float] addrspace(3)* @a to [10 x float]* %1 = gep [10 x float]* %0, i64 0, i64 %i %2 = bitcast float* %1 to i32* %3 = load i32* %2 ; emits ld.u32 to %0 = gep [10 x float] addrspace(3)* @a, i64 0, i64 %i %1 = bitcast float addrspace(3)* %0 to i32 addrspace(3)* %3 = load i32 addrspace(3)* %1 ; emits ld.shared.f32 Test Plan: @ld_int_from_global_float in access-non-generic.ll Reviewers: broune, eliben, jholewinski, meheff Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10074 llvm-svn: 238574	2015-05-29 17:00:27 +00:00
Benjamin Kramer	dba7ee90b5	Don't call utostr in Twine/raw_ostream contexts. Creating temporary std::strings there is unnecessary. llvm-svn: 238412	2015-05-28 11:24:24 +00:00
Jingyue Wu	c2a014697a	[NaryReassociate] Run EarlyCSE after NaryReassociate Summary: This patch made two improvements to NaryReassociate and the NVPTX pipeline 1. Run EarlyCSE/GVN after NaryReassociate to get rid of redundant common expressions. 2. When adding an instruction to SeenExprs, maps both the SCEV before and after reassociation to that instruction. Test Plan: updated @reassociate_gep_nsw in nary-gep.ll Reviewers: meheff, broune Reviewed By: broune Subscribers: dberlin, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D9947 llvm-svn: 238396	2015-05-28 04:56:52 +00:00
Rafael Espindola	0709a7bd1a	Move alignment from MCSectionData to MCSection. This starts merging MCSection and MCSectionData. There are a few issues with the current split between MCSection and MCSectionData. * It optimizes the the not as important case. We want the production of .o files to be really fast, but the split puts the information used for .o emission in a separate data structure. * The ELF/COFF/MachO hierarchy is not represented in MCSectionData, leading to some ad-hoc ways to represent the various flags. * It makes it harder to remember where each item is. The attached patch starts merging the two by moving the alignment from MCSectionData to MCSection. Most of the patch is actually just dropping 'const', since MCSectionData is mutable, but MCSection was not. llvm-svn: 237936	2015-05-21 19:20:38 +00:00
Jim Grosbach	6f482000e9	MC: Clean up method names in MCContext. The naming was a mish-mash of old and new style. Update to be consistent with the new. NFC. llvm-svn: 237594	2015-05-18 18:43:14 +00:00
Pete Cooper	81902a3ae4	Remove MCAssembler.h include from MCStreamer.h and fix users of MCStreamer.h llvm-svn: 237483	2015-05-15 22:19:42 +00:00
Pete Cooper	3de83e4098	Remove 3 includes from MCInstrDesc.h and explicitly include them where needed llvm-svn: 237481	2015-05-15 21:58:42 +00:00
Jim Grosbach	4c98cf77d9	MC: MCCodeGenInfo naming update. NFC. s/InitMCCodeGenInfo/initMCCodeGenInfo/ llvm-svn: 237471	2015-05-15 19:13:31 +00:00
Jim Grosbach	e9119e41ef	MC: Modernize MCOperand API naming. NFC. MCOperand::Create() methods renamed to MCOperand::create(). llvm-svn: 237275	2015-05-13 18:37:00 +00:00
Matthias Braun	d04893fa36	Change getTargetNodeName() to produce compiler warnings for missing cases, fix them llvm-svn: 236775	2015-05-07 21:33:59 +00:00
Quentin Colombet	61b305edfd	[ShrinkWrap] Add (a simplified version) of shrink-wrapping. This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks. As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. Context Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. Motivating example Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. Proposed Solution This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block. Design Decisions 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507	2015-05-05 17:38:16 +00:00
Duncan P. N. Exon Smith	a9308c49ef	IR: Give 'DI' prefix to debug info metadata Finish off PR23080 by renaming the debug info IR constructs from `MD` to `DI`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the previous commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to this commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120	2015-04-29 16:38:44 +00:00
Eric Christopher	f4bf3779d8	Fix a [-Werror,-Winconsistent-missing-override] problem in the NVPTX overrides. llvm-svn: 236007	2015-04-28 18:06:27 +00:00
Justin Holewinski	3d2a976197	[NVPTX] Handle addrspacecast constant expressions in aggregate initializers We need to track if an AddrSpaceCast expression was seen when generating an MCExpr for a ConstantExpr. This change introduces a custom lowerConstant method to the NVPTX asm printer that will create NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode the information that a given symbol needs to be casted to a generic address. llvm-svn: 236000	2015-04-28 17:18:30 +00:00
Sergey Dmitrouk	842a51bad8	Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989	2015-04-28 14:05:47 +00:00
Daniel Jasper	48e93f7181	Revert "[DebugInfo] Add debug locations to constant SD nodes" This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987	2015-04-28 13:38:35 +00:00
Sergey Dmitrouk	adb4c69d5c	[DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977	2015-04-28 11:56:37 +00:00
Lang Hames	9ff69c8f4d	[AsmPrinter] Make AsmPrinter's OutStreamer member a unique_ptr. AsmPrinter owns the OutStreamer, so an owning pointer makes sense here. Using a reference for this is crufty. llvm-svn: 235752	2015-04-24 19:11:51 +00:00
Jingyue Wu	72fca6c89b	Resurrect r235688 We should skip vector types which are not SCEVable. test/CodeGen/NVPTX/sched2.ll passes llvm-svn: 235695	2015-04-24 04:22:39 +00:00
Jingyue Wu	62af99b0db	Revert r235688 Seems breaking builds llvm-svn: 235690	2015-04-24 03:26:11 +00:00
Jingyue Wu	312fd0242d	[NVPTX] Emits "generic()" depending on the original address space Summary: Fixes a bug in the NVPTX codegen. The code used to miss necessary "generic()" on aggregates of addrspacecasts. Test Plan: addrspacecast-gvar.ll Reviewers: eliben, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D9130 llvm-svn: 235689	2015-04-24 02:57:30 +00:00
Jingyue Wu	3daace5295	[NVPTX] enable NaryReassociate in NVPTX Summary: We run NaryReassociate right after SLSR because SLSR enables many opportunities for NaryReassociate. For example, in nary-slsr.ll foo((a + b) + c); foo((a + b * 2) + c); foo((a + b * 3) + c); // 2 muls and 6 adds after SLSR: ab = a + b; foo(ab + c); ab2 = ab + b; foo(ab2 + c); ab3 = ab2 + b; foo(ab3 + c); // 6 adds after NaryReassociate: abc = (a + b) + c; foo(abc); ab2c = abc + b; foo(ab2c); ab3c = ab2c + b; foo(ab3c); // 4 adds Test Plan: nary-slsr.ll Reviewers: jholewinski, eliben Reviewed By: eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D9066 llvm-svn: 235688	2015-04-24 02:54:06 +00:00
Jingyue Wu	3286ec1484	[NVPTX] run SeparateConstOffsetFromGEP before SLSR Summary: We pick this order because SeparateConstOffsetFromGEP may create more opportunities for SLSR. Test Plan: reassociate-geps-and-slsr.ll no performance regression on internal benchmarks Reviewers: meheff Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D9230 llvm-svn: 235632	2015-04-23 20:00:04 +00:00
Jingyue Wu	66a161f05e	[NVPTX] do not run DCE after SLSR and SeparateConstOffsetFromGEP Summary: With D9096 and D9101, there's no need to run DCE after SLSR and SeparateConstOffsetFromGEP. Test Plan: no regression Reviewers: jholewinski, meheff Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D9172 llvm-svn: 235415	2015-04-21 20:47:15 +00:00
Duncan P. N. Exon Smith	b273d06b63	DebugInfo: Gut DIScope, DIEnumerator and DISubrange The only class the still has API left is `DIDescriptor` itself. llvm-svn: 235067	2015-04-16 01:37:00 +00:00
Duncan P. N. Exon Smith	35ef22cf53	DebugInfo: Gut DICompileUnit and DIFile Continuing gutting `DIDescriptor` subclasses; this edition, `DICompileUnit` and `DIFile`. In the name of PR23080. llvm-svn: 235055	2015-04-15 23:19:27 +00:00
Rafael Espindola	5560a4cfbd	Use raw_pwrite_stream in the object writer/streamer. The ELF object writer will take advantage of that in the next commit. llvm-svn: 234950	2015-04-14 22:14:34 +00:00
Duncan P. N. Exon Smith	537b4a8159	DebugInfo: Gut DISubprogram and DILexicalBlock* Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses. Note that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`. llvm-svn: 234850	2015-04-14 03:40:37 +00:00
Benjamin Kramer	619c4e57ba	Reduce dyn_cast<> to isa<> or cast<> where possible. No functional change intended. llvm-svn: 234586	2015-04-10 11:24:51 +00:00
Jingyue Wu	5da831cc31	Divergence analysis for GPU programs Summary: Some optimizations such as jump threading and loop unswitching can negatively affect performance when applied to divergent branches. The divergence analysis added in this patch conservatively estimates which branches in a GPU program can diverge. This information can then help LLVM to run certain optimizations selectively. Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll Reviewers: resistor, hfinkel, eliben, meheff, jholewinski Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8576 llvm-svn: 234567	2015-04-10 05:03:50 +00:00
Duncan P. N. Exon Smith	e686f1591f	CodeGen: Stop using DIDescriptor::is*() and auto-casting Same as r234255, but for lib/CodeGen and lib/Target. llvm-svn: 234258	2015-04-06 23:27:40 +00:00
Rafael Espindola	8ca44f0b5c	Implement unique sections with an unique ID. This allows the compiler/assembly programmer to switch back to a section. This in turn fixes the bootstrap failure on powerpc (tested on gcc110) without changing the ppc codegen at all. I will try to cleanup the various getELFSection overloads in a followup patch. Just using a default argument now would lead to ambiguities. llvm-svn: 234099	2015-04-04 18:02:01 +00:00
David Blaikie	aa41cd57e0	[opaque pointer type] More GEP IRBuilder API migrations... llvm-svn: 234058	2015-04-03 21:33:42 +00:00
David Blaikie	93c5444fe0	[opaque pointer type] More GEP API migrations in IRBuilder uses The plan here is to push the API changes out from the common components (like Constant::getGetElementPtr and IRBuilder::CreateGEP related functions) and just update callers to either pass the type if it's obvious, or pass null. Do this with LoadInst as well and anything else that comes up, then to start porting specific uses to not pass null anymore - this may require some refactoring in each case. llvm-svn: 234042	2015-04-03 19:41:44 +00:00
David Blaikie	4a2e73b066	[opaque pointer type] API migration for GEP constant factories Require the pointee type to be passed explicitly and assert that it is correct. For now it's possible to pass nullptr here (and I've done so in a few places in this patch) but eventually that will be disallowed once all clients have been updated or removed. It'll be a long road to get all the way there... but if you have the cahnce to update your callers to pass the type explicitly without depending on a pointer's element type, that would be a good thing to do soon and a necessary thing to do eventually. llvm-svn: 233938	2015-04-02 18:55:32 +00:00
Eric Christopher	f8019408dc	Replace the MCSubtargetInfo parameter with a Triple when creating an MCInstPrinter. Update all callers and use where we wanted a Triple previously. llvm-svn: 233648	2015-03-31 00:10:04 +00:00
Eric Christopher	48e697f7fe	Remove unused MCSubtargetInfo argument from the NVPTX MCInstPrinter ctors. llvm-svn: 233611	2015-03-30 22:03:16 +00:00
Eric Christopher	c7c5592b7e	Remove unused Target argument from MCInstPrinter ctor functions. llvm-svn: 233607	2015-03-30 21:52:21 +00:00
Justin Holewinski	0afca26e90	[NVPTX] Associate a minimum PTX version for each SM architecture When a new SM architecture is introduced, it is only supported by the current PTX version and later. Make sure we are using at least the minimum PTX version for the target architecture. This also removes support for PTX ISA < 3.2. llvm-svn: 233583	2015-03-30 19:30:55 +00:00
Duncan P. N. Exon Smith	9dffcd04f7	CodeGen: Use the new DebugLoc API, NFC Update lib/CodeGen (and lib/Target) to use the new `DebugLoc` API. llvm-svn: 233582	2015-03-30 19:14:47 +00:00
Justin Holewinski	f94d5b5137	[NVPTX] Add options for PTX 4.1/4.2 and SM 3.2/3.7/5.2/5.3 llvm-svn: 233575	2015-03-30 18:12:50 +00:00
Yaron Keren	075759aadd	Remove more superfluous .str() and replace std::string concatenation with Twine. Following r233392, http://llvm.org/viewvc/llvm-project?rev=233392&view=rev. llvm-svn: 233555	2015-03-30 15:42:36 +00:00
Akira Hatanaka	fb2289cb1b	Delete MCInstPrinter::AvailableFeatures. All the ports have been fixed to read the feature bits from the subtarget passed to the print methods. Also, delete the call to setAvailableFeatures in the constructor of NVPTX's instprinter as the instprinter wasn't using the feature bits anywhere. llvm-svn: 233486	2015-03-28 21:07:24 +00:00
Akira Hatanaka	b46d0234a6	[MCInstPrinter] Enable MCInstPrinter to change its behavior based on the per-function subtarget. Currently, code-gen passes the default or generic subtarget to the constructors of MCInstPrinter subclasses (see LLVMTargetMachine::addPassesToEmitFile), which enables some targets (AArch64, ARM, and X86) to change their instprinter's behavior based on the subtarget feature bits. Since the backend can now use different subtargets for each function, instprinter has to be changed to use the per-function subtarget rather than the default subtarget. This patch takes the first step towards enabling instprinter to change its behavior based on the per-function subtarget. It adds a bit "PassSubtarget" to AsmWriter which tells table-gen to pass a reference to MCSubtargetInfo to the various print methods table-gen auto-generates. I will follow up with changes to instprinters of AArch64, ARM, and X86. llvm-svn: 233411	2015-03-27 20:36:02 +00:00
Rafael Espindola	b61beca40c	Close unique sections when switching away from them. It is not possible to switch back to unique secitons, so close them automatically when switching away. llvm-svn: 233380	2015-03-27 15:01:40 +00:00
Andrew Kaylor	5c73e1f85c	Disabling warnings for MSVC build to enable /W4 use. Differential Revision: http://reviews.llvm.org/D8572 llvm-svn: 233133	2015-03-24 23:37:10 +00:00
David Blaikie	156d46eda0	Opaque Pointer Types: GEP API migrations to specify the gep type explicitly The changes to InstCombine (& SCEV) do seem a bit silly - it doesn't make anything obviously better to have the caller access the pointers element type (the thing I'm trying to remove) than the GEP itself, but it's a helpful migration step. This will allow me to more obviously lock down GEP (& Load, etc) API usage, then fix all the code that accesses pointer element types except the places that need to be removed (most of the InstCombines) anyway - at which point I'll need to just remove all that code because it won't be meaningful anymore (there will be no pointer types, so no bitcasts to combine) SCEV looks like it'll need some restructuring - we'll have to do a bit more work for GEP canonicalization, since it'll depend on how it's used if we can even manage to canonicalize it to a non-ugly GEP. I guess we can do some fun stuff like voting (do 2 out of 3 load from the GEP with a certain type that gives a pretty GEP? Does every typed use of the GEP use either a specific type or a generic type (i8*, etc)?) llvm-svn: 233131	2015-03-24 23:34:31 +00:00
Benjamin Kramer	799003bf8c	Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used. llvm-svn: 232998	2015-03-23 19:32:43 +00:00
Eli Bendersky	3e84019a39	Simplify boolean expressions with true and false using clang-tidy Patch by Richard (legalize@xmission.com) Differential Revision: http://reviews.llvm.org/D8521 llvm-svn: 232961	2015-03-23 16:26:23 +00:00
Eric Christopher	4d0f35a901	Remove the target independent TargetMachine::getSubtarget and TargetMachine::getSubtargetImpl routines. This keeps the target independent code free of bare subtarget calls while the remainder of the backends are migrated, or not if they don't wish to support per-function subtargets as would be needed for function multiversioning or LTO of disparate cpu subarchitecture types, e.g. clang -msse4.2 -c foo.c -emit-llvm -o foo.bc clang -c bar.c -emit-llvm -o bar.bc llvm-link foo.bc bar.bc -o baz.bc llc baz.bc and get appropriate code for what the command lines requested. llvm-svn: 232885	2015-03-21 04:22:23 +00:00
Eric Christopher	5c3dffc459	Simplify the query for a subtarget in the NVPTX pass manager. llvm-svn: 232876	2015-03-21 03:13:03 +00:00
Artem Belevich	9e8a039318	Add support for __nvvm_reflect changes in libdevice in CUDA-7.0 Summary: CUDA 7.0's libdevice uses slightly different IR to call __nvvm_reflect and that triggers an assertion in nvvm_reflect optimization pass. This change allows nvvm_reflect pass to deal with both old and new ways to pass an argument to __nvvm_reflect. Test Plan: ninja check-all Reviewers: eliben, echristo Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8399 llvm-svn: 232732	2015-03-19 17:05:35 +00:00
Rafael Espindola	69244c3e78	two or more, use a for. llvm-svn: 232688	2015-03-18 23:15:49 +00:00
David Blaikie	9f380a3ca0	Fix uses of reserved identifiers starting with an underscore followed by an uppercase letter This covers essentially all of llvm's headers and libs. One or two weird cases I wasn't sure were worth/appropriate to fix. llvm-svn: 232394	2015-03-16 18:06:57 +00:00
Daniel Sanders	bf5b80f5f9	Make each target map all inline assembly memory constraints to InlineAsm::Constraint_m. NFC. Summary: This is instead of doing this in target independent code and is the last non-functional change before targets begin to distinguish between different memory constraints when selecting code for the ISD::INLINEASM node. Next, each target will individually move away from the idea that all memory constraints behave like 'm'. Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8173 llvm-svn: 232373	2015-03-16 13:13:41 +00:00
David Blaikie	096b1da29d	[opaque pointer type] more gep API migration llvm-svn: 232274	2015-03-14 19:53:33 +00:00
Daniel Sanders	60f1db0525	Recommit r232027 with PR22883 fixed: Add infrastructure for support of multiple memory constraints. The operand flag word for ISD::INLINEASM nodes now contains a 15-bit memory constraint ID when the operand kind is Kind_Mem. This constraint ID is a numeric equivalent to the constraint code string and is converted with a target specific hook in TargetLowering. This patch maps all memory constraints to InlineAsm::Constraint_m so there is no functional change at this point. It just proves that using these previously unused bits in the encoding of the flag word doesn't break anything. The next patch will make each target preserve the current mapping of everything to Constraint_m for itself while changing the target independent implementation of the hook to return Constraint_Unknown appropriately. Each target will then be adapted in separate patches to use appropriate Constraint_* values. PR22883 was caused the matching operands copying the whole of the operand flags for the matched operand. This included the constraint id which needed to be replaced with the operand number. This has been fixed with a conversion function. Following on from this, matching operands also used the operand number as the constraint id. This has been fixed by looking up the matched operand and taking it from there. llvm-svn: 232165	2015-03-13 12:45:09 +00:00
Hal Finkel	e78e52ba9b	Revert "r232027 - Add infrastructure for support of multiple memory constraints" This (r232027) has caused PR22883; so it seems those bits might be used by something else after all. Reverting until we can figure out what else to do. Original commit message: The operand flag word for ISD::INLINEASM nodes now contains a 15-bit memory constraint ID when the operand kind is Kind_Mem. This constraint ID is a numeric equivalent to the constraint code string and is converted with a target specific hook in TargetLowering. This patch maps all memory constraints to InlineAsm::Constraint_m so there is no functional change at this point. It just proves that using these previously unused bits in the encoding of the flag word doesn't break anything. The next patch will make each target preserve the current mapping of everything to Constraint_m for itself while changing the target independent implementation of the hook to return Constraint_Unknown appropriately. Each target will then be adapted in separate patches to use appropriate Constraint_* values. llvm-svn: 232093	2015-03-12 20:09:39 +00:00
Daniel Sanders	41c072e63b	Add infrastructure for support of multiple memory constraints. Summary: The operand flag word for ISD::INLINEASM nodes now contains a 15-bit memory constraint ID when the operand kind is Kind_Mem. This constraint ID is a numeric equivalent to the constraint code string and is converted with a target specific hook in TargetLowering. This patch maps all memory constraints to InlineAsm::Constraint_m so there is no functional change at this point. It just proves that using these previously unused bits in the encoding of the flag word doesn't break anything. The next patch will make each target preserve the current mapping of everything to Constraint_m for itself while changing the target independent implementation of the hook to return Constraint_Unknown appropriately. Each target will then be adapted in separate patches to use appropriate Constraint_* values. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8171 llvm-svn: 232027	2015-03-12 11:00:48 +00:00
Jingyue Wu	e8290f21b5	[NVPTXAsmPrinter] do not print .align on function headers Summary: PTX does not allow .align directives on function headers. Fixes PR21551. Test Plan: test/Codegen/NVPTX/function-align.ll Reviewers: eliben, jholewinski Reviewed By: eliben, jholewinski Subscribers: llvm-commits, eliben, jpienaar, jholewinski Differential Revision: http://reviews.llvm.org/D8274 llvm-svn: 232004	2015-03-12 01:50:30 +00:00
Mehdi Amini	93e1ea167e	Move the DataLayout to the generic TargetMachine, making it mandatory. Summary: I don't know why every singled backend had to redeclare its own DataLayout. There was a virtual getDataLayout() on the common base TargetMachine, the default implementation returned nullptr. It was not clear from this that we could assume at call site that a DataLayout will be available with each Target. Now getDataLayout() is no longer virtual and return a pointer to the DataLayout member of the common base TargetMachine. I plan to turn it into a reference in a future patch. The only backend that didn't have a DataLayout previsouly was the CPPBackend. It now initializes the default DataLayout. This commit is NFC for all the other backends. Test Plan: clang+llvm ninja check-all Reviewers: echristo Subscribers: jfb, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8243 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231987	2015-03-12 00:07:24 +00:00
Eric Christopher	7af9528747	Have getCalleeSavedRegs take a non-null MachineFunction all the time. The target independent code was passing in one all the time and targets weren't checking validity before using. Update a few calls to pass in a MachineFunction where necessary. llvm-svn: 231970	2015-03-11 21:41:28 +00:00
Rafael Espindola	6b9998b3eb	Create symbols marking the start of a section earlier. This lets us pass the symbol to the constructor and avoid the mutable field. This also opens the way for outputting the symbol only when needed, instead of outputting them at the start of the file. llvm-svn: 231859	2015-03-10 22:00:25 +00:00
Benjamin Kramer	d58792f38b	Don't use LLVM_LIBRARY_VISIBILITY in cpp files. llvm-svn: 231831	2015-03-10 20:07:44 +00:00
Benjamin Kramer	414c0964c3	NVPTX: move NVPTXAllocaHoisting into the cpp file Also initialize without using static initialization. llvm-svn: 231822	2015-03-10 19:20:52 +00:00
Benjamin Kramer	42a31e1f5e	NVPTX: Remove copy of LLVMInitializeNVPTXAsmPrinter. If anyone is using this for some strange reason, LLVMInitializeNVPTXAsmPrinter does exactly the same thing and is what other LLVM tools are calling. llvm-svn: 231810	2015-03-10 18:19:24 +00:00
Rafael Espindola	760bf9520a	Remove incredibly confusing isBaseAddressKnownZero. When referring to a symbol in a dwarf section on ELF we should use .long foo instead of .long foo - .debug_something because ELF is unaware of the content of the sections and therefore needs relocations. This has nothing to do with optimizing a -0. llvm-svn: 231751	2015-03-10 04:11:52 +00:00
Rafael Espindola	fcc2821882	Use a better name for compile unit labels. They mark the start of a compile unit, so name them .Lcu_*. Using Section->getLabelBeginName() makes it looks like they mark the start of the section. While at it, switch to createTempSymbol to avoid collisions with labels created in inline assembly. Not sure if a "don't crash" test is worth it. With this getLabelBeginName is dead, delete it. llvm-svn: 231750	2015-03-10 03:58:36 +00:00
Mehdi Amini	a28d91d81b	DataLayout is mandatory, update the API to reflect it with references. Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that. This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation. I turned as many pointer to DataLayout to references, this helped figuring out all the places where a nullptr could come up. I had initially a local version of this patch broken into over 30 independant, commits but some later commit were cleaning the API and touching part of the code modified in the previous commits, so it seemed cleaner without the intermediate state. Test Plan: Reviewers: echristo Subscribers: llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231740	2015-03-10 02:37:25 +00:00
Reid Kleckner	294fa7aa9d	TableGen: Use 'enum : uint64_t' for feature flags to fix -Wmicrosoft clang-cl would warn that this value is not representable in 'int': enum { FeatureX = 1ULL << 31 }; All MS enums are 'ints' unless otherwise specified, so we have to use an explicit type. The AMDGPU target just hit 32 features, triggering this warning. Now that we have C++11 strong enum types, we can also eliminate the 'const uint64_t' codepath from tablegen and just use 'enum : uint64_t'. llvm-svn: 231697	2015-03-09 20:23:14 +00:00
Rafael Espindola	028f8b43e2	Delete dead code. NFC. llvm-svn: 231682	2015-03-09 18:48:29 +00:00
Benjamin Kramer	a52f696e6f	Move unreferenced passes into the cpp file NFC. llvm-svn: 231661	2015-03-09 15:50:58 +00:00
David Blaikie	dc3f01e9cf	Simplify expressions involving boolean constants with clang-tidy Patch by Richard (legalize at xmission dot com). Differential Revision: http://reviews.llvm.org/D8154 llvm-svn: 231617	2015-03-09 01:57:13 +00:00
Mehdi Amini	46a43556db	Make DataLayout Non-Optional in the Module Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC, the string is no longer canonicalized, you can't rely on two "equals" DataLayout having the same string returned by getStringRepresentation(). Get rid of DataLayoutPass: the DataLayout is in the Module The DataLayout is "per-module", let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module. Make DataLayout Non-Optional in the Module Module->getDataLayout() will never returns nullptr anymore. Reviewers: echristo Subscribers: resistor, llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D7992 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231270	2015-03-04 18:43:29 +00:00
Paul Robinson	c583abc888	Remove useless .debug_macinfo section setup. llvm-svn: 231001	2015-03-02 19:52:42 +00:00
Benjamin Kramer	e86b0f186b	NVPTX: Remove dead code. Fun fact: This file was never referenced since the initial checkin of the NVPTX backend. llvm-svn: 230957	2015-03-02 13:16:28 +00:00
Benjamin Kramer	5fbfe2ffdc	Convert push_back loops into append calls. No functionality change intended. llvm-svn: 230849	2015-02-28 13:20:15 +00:00
Eric Christopher	11e4df73c8	getRegForInlineAsmConstraint wants to use TargetRegisterInfo for a lookup, pass that in rather than use a naked call to getSubtargetImpl. This involved passing down and around either a TargetMachine or TargetRegisterInfo. Update all callers/definitions around the targets and SelectionDAG. llvm-svn: 230699	2015-02-26 22:38:43 +00:00
Eric Christopher	23a3a7c871	Remove an argument-less call to getSubtargetImpl from TargetLoweringBase. This required plumbing a TargetRegisterInfo through computeRegisterProperties and into findRepresentativeClass which uses it for register class iteration. This required passing a subtarget into a few target specific initializations of TargetLowering. llvm-svn: 230583	2015-02-26 00:00:24 +00:00
Benjamin Kramer	ea68a944a1	Demote vectors to arrays. No functionality change. llvm-svn: 229861	2015-02-19 15:26:17 +00:00
Eric Christopher	ca929f2469	Avoid using a self-referential initializer and fix up uses. llvm-svn: 229790	2015-02-19 00:22:47 +00:00
Eric Christopher	02389e3886	Remove all use of is64bit off of NVPTXSubtarget and clean up code accordingly. This changes the constructors of a number of classes that don't need to know the subtarget's 64-bitness. llvm-svn: 229787	2015-02-19 00:08:27 +00:00
Eric Christopher	beffc4e84f	Remove all use of getDrvInterface off of NVPTXSubtarget and clean up code accordingly. Delete code that was checking for all cases of an enum. llvm-svn: 229786	2015-02-19 00:08:23 +00:00
Eric Christopher	6aad8b1801	Migrate the NVPTX backend asm printer to a per function subtarget. This involved moving two non-subtarget dependent features (64-bitness and the driver interface) to the NVPTX target machine and updating the uses (or migrating around the subtarget use for ease of review). Otherwise use the cached subtarget or create a default subtarget based on the TargetMachine cpu and feature string for the module level assembler emission. llvm-svn: 229785	2015-02-19 00:08:14 +00:00
Duncan P. N. Exon Smith	b5054333ec	NVPTX: Canonicalize access to function attributes, NFC Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229260	2015-02-14 15:35:43 +00:00
Chandler Carruth	30d69c2e36	[PM] Remove the old 'PassManager.h' header file at the top level of LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager. This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general. The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager". llvm-svn: 229094	2015-02-13 10:01:29 +00:00
Benjamin Kramer	5f6a907288	MathExtras: Bring Count(Trailing\|Leading)Ones and CountPopulation in line with countTrailingZeros Update all callers. llvm-svn: 228930	2015-02-12 15:35:40 +00:00
Jingyue Wu	d7966ff3b9	Add straight-line strength reduction to LLVM Summary: Straight-line strength reduction (SLSR) is implemented in GCC but not yet in LLVM. It has proven to effectively simplify statements derived from an unrolled loop, and can potentially benefit many other cases too. For example, LLVM unrolls #pragma unroll foo (int i = 0; i < 3; ++i) { sum += foo((b + i) * s); } into sum += foo(b * s); sum += foo((b + 1) * s); sum += foo((b + 2) * s); However, no optimizations yet reduce the internal redundancy of the three expressions: b * s (b + 1) * s (b + 2) * s With SLSR, LLVM can optimize these three expressions into: t1 = b * s t2 = t1 + s t3 = t2 + s This commit is only an initial step towards implementing a series of such optimizations. I will implement more (see TODO in the file commentary) in the near future. This optimization is enabled for the NVPTX backend for now. However, I am more than happy to push it to the standard optimization pipeline after more thorough performance tests. Test Plan: test/StraightLineStrengthReduce/slsr.ll Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick Reviewed By: jholewinski, atrick Subscribers: karthikthecool, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7310 llvm-svn: 228016	2015-02-03 19:37:06 +00:00
Jingyue Wu	5bbcdaa8d9	Remove usernames from TODOs, NFC making the style consistent with the rest llvm-svn: 227991	2015-02-03 17:57:38 +00:00
Jingyue Wu	49a766e468	Resurrect the assertion removed by r227717 Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*. Test Plan: no regression Reviewers: mkuper Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7327 llvm-svn: 227853	2015-02-02 20:41:11 +00:00
Chandler Carruth	c956ab6603	[multiversion] Switch the TTI queries from TargetMachine to Subtarget now that we have a correct and cached subtarget specific to the function. Also, finish providing a cached per-function subtarget in the core LLVMTargetMachine -- that layer hadn't switched over yet. The only use of the TargetMachine was to re-lookup a subtarget for a particular function to work around the fact that TTI was immutable. Now that it is per-function and we haved a cached subtarget, use it. This still leaves a few interfaces with real warts on them where we were passing Function objects through the TTI interface. I'll remove these and clean their usage up in subsequent commits now that this isn't necessary. llvm-svn: 227738	2015-02-01 14:22:17 +00:00
Chandler Carruth	c340ca839c	[multiversion] Remove the cached TargetMachine pointer from the intermediate TTI implementation template and instead query up to the derived class for both the TargetMachine and the TargetLowering. Most of the derived types had a TLI cached already and there is no need to store a less precisely typed target machine pointer. This will in turn make it much cleaner to look up the TLI via a per-function subtarget instead of the generic subtarget, and it will pave the way toward pulling the subtarget used for unroll preferences into the same form once we are always using the function to look up the correct subtarget. llvm-svn: 227737	2015-02-01 14:01:15 +00:00
Chandler Carruth	8b04c0d26a	[multiversion] Switch all of the targets over to use the TargetIRAnalysis access path directly rather than implementing getTTI. This even removes getTTI from the interface. It's more efficient for each target to just register a precise callback that creates their specific TTI. As part of this, all of the targets which are building their subtargets individually per-function now build their TTI instance with the function and thus look up the correct subtarget and cache it. NVPTX, R600, and XCore currently don't leverage this functionality, but its trivial for them to add it now. llvm-svn: 227735	2015-02-01 13:20:00 +00:00
Chandler Carruth	ee642690ea	[multiversion] Remove a false freedom to leave the TargetMachine pointer null. For some reason some of the original TTI code supported a null target machine. This seems to have been legacy, and I made matters worse when refactoring this code by spreading that pattern further through the various targets. The TargetMachine can't actually be null, and it doesn't make sense to support that use case. I've now consistently removed it and removed all of the code trying to cope with that situation. This is probably good, as several targets didn't cope with it being null despite the null default argument in their constructors. =] llvm-svn: 227734	2015-02-01 12:38:24 +00:00
Jingyue Wu	0220df0dfd	[NVPTX] Emit .pragma "nounroll" for loops marked with nounroll Summary: CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA driver from unrolling a loop marked with llvm.loop.unroll.disable is not unrolled by CUDA driver, we need to emit .pragma "nounroll" at the header of that loop. This patch also extracts getting unroll metadata from loop ID metadata into a shared helper function. Test Plan: test/CodeGen/NVPTX/nounroll.ll Reviewers: eliben, meheff, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7041 llvm-svn: 227703	2015-02-01 02:27:45 +00:00
Chandler Carruth	d8b3e9a420	[PM] Remove a bunch of stale TTI creation method declarations. I nuked their definitions, but forgot to clean up all the declarations which are in different files. llvm-svn: 227698	2015-02-01 00:22:15 +00:00
Chandler Carruth	93dcdc47db	[PM] Switch the TargetMachine interface from accepting a pass manager base which it adds a single analysis pass to, to instead return the type erased TargetTransformInfo object constructed for that TargetMachine. This removes all of the pass variants for TTI. There is now a single TTI pass in the Analysis layer. All of the Analysis <-> Target communication is through the TTI's type erased interface itself. While the diff is large here, it is nothing more that code motion to make types available in a header file for use in a different source file within each target. I've tried to keep all the doxygen comments and file boilerplate in line with this move, but let me know if I missed anything. With this in place, the next step to making TTI work with the new pass manager is to introduce a really simple new-style analysis that produces a TTI object via a callback into this routine on the target machine. Once we have that, we'll have the building blocks necessary to accept a function argument as well. llvm-svn: 227685	2015-01-31 11:17:59 +00:00
Chandler Carruth	705b185f90	[PM] Change the core design of the TTI analysis to use a polymorphic type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is much more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669	2015-01-31 03:43:40 +00:00
Eric Christopher	3084aff498	Migrate a bare getSubtarget call to query the MachineFunction for the target dependent one. llvm-svn: 227542	2015-01-30 01:50:09 +00:00
Eric Christopher	bef0a373b4	Migrate NVPTXISelLowering to take the subtarget that it's dependent upon as an argument and store/use that in the entire function. llvm-svn: 227541	2015-01-30 01:50:07 +00:00
Eric Christopher	9745b3aae0	Remove unused argument. llvm-svn: 227539	2015-01-30 01:41:01 +00:00
Eric Christopher	147bba2385	Migrate NVPTXISelDAGToDAG's getSubtarget to a runOnMachineFunction version. Update NVPTXInstrInfo accordingly. llvm-svn: 227538	2015-01-30 01:40:59 +00:00
Chandler Carruth	b81dfa6378	[LPM] Stop using the string based preservation API. It is an abomination. For starters, this API is incredibly slow. In order to lookup the name of a pass it must take a memory fence to acquire a pointer to the managed static pass registry, and then potentially acquire locks while it consults this registry for information about what passes exist by that name. This stops the world of LLVMs in your process no matter how little they cared about the result. To make this more joyful, you'll note that we are preserving many passes which do not exist any more, or are not even analyses which one might wish to have be preserved. This means we do all the work only to say "nope" with no error to the user. String-based APIs are a bad idea. String-based APIs that cannot produce any meaningful error are an even worse idea. =/ I have a patch that simply removes this API completely, but I'm hesitant to commit it as I don't really want to perniciously break out-of-tree users of the old pass manager. I'd rather they just have to migrate to the new one at some point. If others disagree and would like me to kill it with fire, just say the word. =] llvm-svn: 227294	2015-01-28 04:57:56 +00:00
Justin Holewinski	d4d2e9bd0e	[NVPTX] Generate a more optimal sequence for select of i1 Instead of creating a pattern like "(p && a) \|\| ((!p) && b)", just expand the i8 operands to i32 and perform the selp on them. Fixes PR22246 llvm-svn: 227123	2015-01-26 19:52:20 +00:00
Justin Holewinski	23df659e6d	[NVPTX] Handle floating-point conversion patterns that are not explicitly ordered or unordered Fixes PR22322 llvm-svn: 227117	2015-01-26 19:11:20 +00:00
Eric Christopher	8b7706517c	Move DataLayout back to the TargetMachine from TargetSubtargetInfo derived classes. Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be layed out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets() have had subtarget dependent code moved out and onto the TargetMachine. One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features. llvm-svn: 227113	2015-01-26 19:03:15 +00:00
David Blaikie	9459832ebd	std::unique_ptrify the MCStreamer argument to createAsmPrinter llvm-svn: 226414	2015-01-18 20:29:04 +00:00
NAKAMURA Takumi	a50a89a081	Update libdeps in NVPTXCodeGen, since r225944. llvm-svn: 226055	2015-01-14 23:01:36 +00:00
Olivier Sallenave	c8d13bd370	Override the TLI callback enableAggressiveFMAFusion and return true. Indeed, fmul, fmadd and fadd nodes cost the same number of cycles, so we can enable more combining heuristics to produce more fmadd nodes. llvm-svn: 225984	2015-01-14 14:47:24 +00:00
Chandler Carruth	d9903888d9	[cleanup] Re-sort all the #include lines in LLVM using utils/sort_includes.py. I clearly haven't done this in a while, so more changed than usual. This even uncovered a missing include from the InstrProf library that I've added. No functionality changed here, just mechanical cleanup of the include order. llvm-svn: 225974	2015-01-14 11:23:27 +00:00
Duncan P. N. Exon Smith	9f6bddd4b2	NVPTX: Use MapMetadata() instead of custom/stale/untested logic Copy the `GVMap` over to a standard `ValueToValueMapTy` so that we can reuse the `MapMetadata()` logic. Unfortunately the `GVMap` can't just be replaced, since `MapMetadata()` likes to modify the map, but at least this will prevent NVPTX from bitrotting. llvm-svn: 225944	2015-01-14 05:14:30 +00:00
Duncan P. N. Exon Smith	f864ae2745	NVPTX: Remove bogus remap logic for global variable address spaces The comment is incorrect, and the code mangles debug info. Remove the bad logic, which wasn't tested anyway. llvm-svn: 225943	2015-01-14 05:13:18 +00:00
Ahmed Bougacha	2b6917b020	[SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421	2015-01-08 00:51:32 +00:00
Ahmed Bougacha	67dd2d25a3	[CodeGen] Use MVT iterator_ranges in legality loops. NFC intended. A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392	2015-01-07 21:27:10 +00:00
Craig Topper	d3c02f177a	Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert. llvm-svn: 225160	2015-01-05 10:15:49 +00:00
Jingyue Wu	e4c9cf04f5	[NVPTX] Fix bugs related to isSingleValueType Summary: With isSingleValueType starting to treat vector types as single-value types, code that uses this interface needs to be updated. Test Plan: vector-global.ll nvcl-param-align.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D6573 llvm-svn: 224440	2014-12-17 17:59:04 +00:00
Matt Arsenault	31a52ad48c	NVPTX: Remove duplicate of AsmPrinter::lowerConstant llvm-svn: 224355	2014-12-16 19:16:17 +00:00
Matthias Braun	7e37a5f523	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. This is the 2nd attempt at this after realizing that PassManager::add() may actually delete the pass. llvm-svn: 224059	2014-12-11 21:26:47 +00:00
Rafael Espindola	01c73610d0	This reverts commit r224043 and r224042. check-llvm was failing. llvm-svn: 224045	2014-12-11 20:03:57 +00:00
Matthias Braun	a7c82a9f1d	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. llvm-svn: 224042	2014-12-11 19:42:05 +00:00
Duncan P. N. Exon Smith	5bf8fef580	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata ` (instead of `Value `). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
Jacques Pienaar	0c7dc9f7c3	Test commit. llvm-svn: 223310	2014-12-03 23:21:02 +00:00
Duncan P. N. Exon Smith	c280ff0e47	NVPTX: Delete dead code `MDNode` does not inherit from `User`, and it never has a name. llvm-svn: 223198	2014-12-03 04:13:23 +00:00
Jingyue Wu	5b62eb9b48	[NVPTX] Do not emit .weak symbols for NVPTX Summary: ".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the weak directive in MCAsmPrinter customizable, and disables emitting ".weak" symbols for NVPTX. Test Plan: weak-linkage.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: majnemer, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D6455 llvm-svn: 223077	2014-12-01 21:16:17 +00:00

... 2 3 4 5 6 ...

680 Commits