Commit Graph

186 Commits

Author SHA1 Message Date
Justin Holewinski dff28d215f [NVPTX] Fix vector loads from parameters that span multiple loads, and fix some typos
llvm-svn: 185332
2013-07-01 12:59:01 +00:00
Justin Holewinski a2911283e4 [NVPTX] Handle signext/zeroext attributes properly
Fix a case where we were incorrectly sign-extending a value when we should have been zero-extending the value.

Also change some SIGN_EXTEND to ANY_EXTEND because we really dont care and may have more opportunity to fold subexpressions

llvm-svn: 185331
2013-07-01 12:58:58 +00:00
Justin Holewinski 318c625ff4 [NVPTX] Add support for native SIGN_EXTEND_INREG where available
llvm-svn: 185330
2013-07-01 12:58:56 +00:00
Justin Holewinski e40e929eb1 [NVPTX] Add isel patterns for [reg+offset] form of ldg/ldu.
llvm-svn: 185329
2013-07-01 12:58:52 +00:00
Justin Holewinski e8c93e3378 [NVPTX] Make sure we zero out high-order 24 bits for 8-bit load into 32-bit value
llvm-svn: 185328
2013-07-01 12:58:48 +00:00
Justin Holewinski af258be134 [NVPTX] Add (1.0 / sqrt(x)) => rsqrt(x) generation when allowable by FP flags
llvm-svn: 185178
2013-06-28 17:58:13 +00:00
Justin Holewinski e04e4bdf71 [NVPTX] Calling conventions fix
Fix ABI handling for function
returning bool -- use st.param.b32 to return the value
and use ld.param.b32 in caller to load the return value.

llvm-svn: 185177
2013-06-28 17:58:10 +00:00
Justin Holewinski dc372df63b [NVPTX] Add support for cttz/ctlz/ctpop
llvm-svn: 185176
2013-06-28 17:58:07 +00:00
Justin Holewinski dc5e3b68f5 [NVPTX] Clean up comparison/select/convert patterns and factor out PTX instructions from their patterns
Test case is no breakage

llvm-svn: 185175
2013-06-28 17:58:04 +00:00
Justin Holewinski f8f7091722 [NVPTX] Remove i8 register class. PTX support for i8 (.b8, .u8, .s8) is rather poor and we're better off just ignoring it and letting LLVM expand all i8 ops out to i16.
llvm-svn: 185174
2013-06-28 17:57:59 +00:00
Justin Holewinski 120baee819 [NVPTX] Add support for vectorized function return values
llvm-svn: 185173
2013-06-28 17:57:55 +00:00
Justin Holewinski 44f5c60e58 [NVPTX] Clean up handling of formal arguments and enable generation of vector parameter loads
llvm-svn: 185172
2013-06-28 17:57:53 +00:00
Justin Holewinski b6e6cd356e [NVPTX] Add support for selecting CUDA vs OCL mode based on triple
IR for CUDA should use "nvptx[64]-nvidia-cuda", and IR for NV OpenCL should use "nvptx[64]-nvidia-nvcl"

llvm-svn: 184579
2013-06-21 18:51:49 +00:00
Justin Holewinski b96d1395f6 [NVPTX] Remove old CONST_NOT_GEN address space that is not being used anymore and causes constants to be emitted in the global address space
llvm-svn: 183652
2013-06-10 13:29:47 +00:00
Justin Holewinski dbb3b2f4b6 [NVPTX] Re-enable support for virtual registers in the final output
Now that 3.3 is branched, we are re-enabling virtual registers to help
iron out bugs before the next release. Some of the post-RA passes do
not play well with virtual registers, so we disable them for now. The
needed functionality of the PrologEpilogInserter pass is copied to a
new backend-specific NVPTXPrologEpilog pass.

The test for this commit is not breaking the existing tests.

llvm-svn: 182998
2013-05-31 12:14:49 +00:00
Justin Holewinski 994d66a345 [NVPTX] Fix case where a sext load of an i1 type may produce an
ld.u1 instead of an ld.u8.

llvm-svn: 182924
2013-05-30 12:22:39 +00:00
Justin Holewinski 48f4ad3fc0 [NVPTX] Add @llvm.nvvm.sqrt.f() intrinsic
llvm-svn: 182394
2013-05-21 16:51:30 +00:00
Justin Holewinski 4c47d87ba6 [NVPTX] Fix mis-use of CurrentFnSym in NVPTXAsmPrinter. This was causing a symbol name error in the output PTX.
llvm-svn: 182298
2013-05-20 16:42:18 +00:00
Justin Holewinski 01f89f0428 [NVPTX] Add GenericToNVVM IR converter to better handle idiomatic LLVM IR inputs
This converter currently only handles global variables in address space 0. For
these variables, they are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.

The motivation for this is address space 0 global variables are illegal since we
cannot declare variables in the generic address space.  Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.

llvm-svn: 182254
2013-05-20 12:13:32 +00:00
Justin Holewinski 700b6fa934 [NVPTX] Fix i1 kernel parameters and global variables. ABI rules say we need to use .u8 for i1 parameters for kernels.
llvm-svn: 182253
2013-05-20 12:13:28 +00:00
Justin Holewinski 59fd8ba5f5 [NVPTX] Remove support for SM < 2.0. This was never fully supported anyway.
llvm-svn: 178417
2013-03-30 14:29:30 +00:00
Justin Holewinski b94bd05b95 [NVPTX] Add NVVMReflect pass to allow compile-time selection of
specific code paths.

This allows us to write code like:

  if (__nvvm_reflect("FOO"))
    // Do something
  else
    // Do something else

and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.

llvm-svn: 178416
2013-03-30 14:29:25 +00:00
Justin Holewinski e988409423 [NVPTX] Fix handling of vector arguments
llvm-svn: 177847
2013-03-24 21:17:47 +00:00
Justin Holewinski d068943809 Propagate DAG node ordering during type legalization and instruction selection
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.

llvm-svn: 177465
2013-03-20 00:10:32 +00:00
Justin Holewinski be8dc6499a [NVPTX] Disable vector registers
Vectors were being manually scalarized by the backend.  Instead,
let the target-independent code do all of the work.  The manual
scalarization was from a time before good target-independent support
for scalarization in LLVM. However, this forces us to specially-handle
vector loads and stores, which we can turn into PTX instructions that
produce/consume multiple operands.

llvm-svn: 174968
2013-02-12 14:18:49 +00:00
Justin Holewinski 4f12f53353 [NVPTX] Remove NoCapture from address space conversion intrinsics. NoCapture is not valid in this case, and was causing incorrect optimizations.
llvm-svn: 174896
2013-02-11 18:56:35 +00:00
Justin Holewinski fb711156ae [NVPTX] Fix crash with unnamed struct arguments
Patch by Eric Holk

llvm-svn: 169418
2012-12-05 20:50:28 +00:00
Justin Holewinski 0ac49bf846 Teach the legalizer how to handle operands for VSELECT nodes
If we need to split the operand of a VSELECT, it must be the mask operand. We
split the entire VSELECT operand with EXTRACT_SUBVECTOR.

llvm-svn: 168883
2012-11-29 14:26:28 +00:00
Justin Holewinski bc45119b44 Allow targets to prefer TypeSplitVector over TypePromoteInteger when computing the legalization method for vectors
For some targets, it is desirable to prefer scalarizing <N x i1> instead of promoting to a larger legal type, such as <N x i32>.

llvm-svn: 168882
2012-11-29 14:26:24 +00:00
Justin Holewinski 2c5ac70dd9 [NVPTX] Order global variables in def-use order before emiting them in the final assembly
llvm-svn: 168198
2012-11-16 21:03:51 +00:00
Justin Holewinski c6462aacd5 [NVPTX] Implement custom lowering of loads/stores for i1
Loads from i1 become loads from i8 followed by trunc
Stores to i1 become zext to i8 followed by store to i8

Fixes PR13291

llvm-svn: 167948
2012-11-14 19:19:16 +00:00
Justin Holewinski 1812ee9a5b [NVPTX] Add more precise PTX/SM target attributes
Each SM and PTX version is modeled as a subtarget feature/CPU. Additionally,
PTX 3.1 is added as the default PTX version to be out-of-the-box compatible
with CUDA 5.0.

Available CPUs for this target:

  sm_10 - Select the sm_10 processor.
  sm_11 - Select the sm_11 processor.
  sm_12 - Select the sm_12 processor.
  sm_13 - Select the sm_13 processor.
  sm_20 - Select the sm_20 processor.
  sm_21 - Select the sm_21 processor.
  sm_30 - Select the sm_30 processor.
  sm_35 - Select the sm_35 processor.

Available features for this target:

  ptx30 - Use PTX version 3.0.
  ptx31 - Use PTX version 3.1.
  sm_10 - Target SM 1.0.
  sm_11 - Target SM 1.1.
  sm_12 - Target SM 1.2.
  sm_13 - Target SM 1.3.
  sm_20 - Target SM 2.0.
  sm_21 - Target SM 2.1.
  sm_30 - Target SM 3.0.
  sm_35 - Target SM 3.5.

llvm-svn: 167699
2012-11-12 03:16:43 +00:00
Justin Holewinski 2dc9d072e5 [NVPTX] Use ABI alignment for parameters when alignment is not specified.
Affects SM 2.0+.  Fixes bug 13324.

llvm-svn: 167646
2012-11-09 23:50:24 +00:00
Peter Collingbourne 913869be45 Add llvm.fabs intrinsic.
llvm-svn: 157594
2012-05-28 21:48:37 +00:00
Justin Holewinski c98041d4d9 [NVPTX] Add a new test case for the newly-enabled call handling
NV_CONTRIB

llvm-svn: 157485
2012-05-25 17:20:38 +00:00
Justin Holewinski ae556d3ef7 This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:

nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX

The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.

NV_CONTRIB

llvm-svn: 156196
2012-05-04 20:18:50 +00:00