Commit Graph

259 Commits

Author SHA1 Message Date
Jingyue Wu 5b62eb9b48 [NVPTX] Do not emit .weak symbols for NVPTX
Summary:
".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the
weak directive in MCAsmPrinter customizable, and disables emitting ".weak"
symbols for NVPTX.

Test Plan: weak-linkage.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: majnemer, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D6455

llvm-svn: 223077
2014-12-01 21:16:17 +00:00
Justin Holewinski 3d140fcfd1 [NVPTX] Add NVPTXLowerStructArgs pass
This works around the limitation that PTX does not allow .param space
loads/stores with arbitrary pointers.

If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then
add the following instructions to the first basic block :

%temp = alloca %struct.x, align 8
%tt1 = bitcast %struct.x * %d to i8 *
%tt2 = llvm.nvvm.cvt.gen.to.param %tt2
%tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) *
%tv = load %struct.x addrspace(101) * %tempd
store %struct.x %tv, %struct.x * %temp, align 8

The above code allocates some space in the stack and copies the incoming
struct from param space to local space. Then replace all occurences of %d
by %temp.

Fixes PR21465.

llvm-svn: 221377
2014-11-05 18:19:30 +00:00
Jingyue Wu ea51161a94 [NVPTX] aligned byte-buffers for vector return types
Summary:
Fixes PR21100 which is caused by inconsistency between the declared return type
and the expected return type at the call site. The new behavior is consistent
with nvcc and the NVPTXTargetLowering::getPrototype function.

Test Plan: test/Codegen/NVPTX/vector-return.ll

Reviewers: jholewinski

Reviewed By: jholewinski

Subscribers: llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5612

llvm-svn: 220607
2014-10-25 03:46:16 +00:00
Jingyue Wu 2954280f6a [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
The old heuristics caused instructions to sink unnecessarily, and might create
register pressure.

This is the second try of the fix. The first one (D4814) caused a performance
regression due to failing to sink instructions out of loops (PR21115). This
patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
one regardless of whether the target block post-dominates the source.

Thanks Alexey Volkov for reporting PR21115! 

Test Plan:
Added a NVPTX codegen test to verify that our change prevents the backend from
over-sinking. It also shows the unnecessary register pressure caused by
over-sinking.

Added an X86 test to verify we can sink instructions out of loops regardless of
the dominance relationship. This test is reduced from Alexey's test in PR21115.

Updated an affected test in X86.

Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
performance. Results are attached separately in the review thread.

Reviewers: Jiangning, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D5633

llvm-svn: 219773
2014-10-15 03:27:43 +00:00
Jingyue Wu fd47fb9976 Revert r216862 due to a performance regression
Reported by Alexey Volkov in PR21115

llvm-svn: 218771
2014-10-01 15:22:13 +00:00
Jingyue Wu 5208cc5dbe [MachineSink] Use the real post dominator tree
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
in isPostDominatedBy, use the real MachinePostDominatorTree. The old
heuristics caused instructions to sink unnecessarily, and might create
register pressure.

Test Plan:
Added a NVPTX codegen test to verify that our change is in effect. It also
shows the unnecessary register pressure caused by over-sinking. Updated
affected tests in AArch64 and X86.

Reviewers: eliben, meheff, Jiangning

Reviewed By: Jiangning

Subscribers: jholewinski, aemerson, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D4814

llvm-svn: 216862
2014-09-01 03:47:25 +00:00
Jingyue Wu cb83a155c1 [NVPTX] Make the alignment an explicit argument to ldu/ldg
Summary:
Instead of specifying the alignment as metadata which may be destroyed by
transformation passes, make the alignment the second argument to ldu/ldg
intrinsic calls.

Test Plan:
ldu-ldg.ll
ldu-i8.ll
ldu-reg-plus-offset.ll

Reviewers: eliben, meheff, jholewinski

Reviewed By: meheff, jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D5093

llvm-svn: 216731
2014-08-29 15:30:20 +00:00
Justin Holewinski 4d6f783281 [NVPTX] Add some extra tests for mul.wide to test non-power-of-two source types
llvm-svn: 213794
2014-07-23 20:23:49 +00:00
Justin Holewinski ecca715b3c [NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of two
llvm-svn: 213784
2014-07-23 18:46:03 +00:00
Justin Holewinski 511664dc76 [NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled
With optimizations disabled, we disable the isel patterns for mul.wide; but we
were still generating MULWIDE ISD nodes.  Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.

llvm-svn: 213773
2014-07-23 17:40:45 +00:00
Eli Bendersky 24d8aacd94 Add some tests for NVPTX lowering of cmpxchg
llvm-svn: 213586
2014-07-21 22:54:44 +00:00
Eli Bendersky f4f1cff4ba Add tests for atomic adds on floats.
llvm-svn: 213406
2014-07-18 20:11:26 +00:00
Eli Bendersky fc42738a75 Use CHECK-LABEL where appropriate in this test.
llvm-svn: 213398
2014-07-18 19:32:09 +00:00
Tim Northover 9e108a0e3a NVPTX: support fpext/fptrunc to and from f16.
llvm-svn: 213377
2014-07-18 13:01:43 +00:00
Tim Northover 20bd0ced30 CodeGen: soften f16 type by default instead of marking legal.
Actual support for softening f16 operations is still limited, and can be added
when it's needed.  But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.

Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.

llvm-svn: 213372
2014-07-18 12:41:46 +00:00
Tim Northover 5e54fe14a4 NVPTX: support direct f16 <-> f64 conversions via intrinsics.
Clang may well start emitting these soon, and while it may not be
directly relevant for OpenCL or GLSL, the instructions were just
sitting there waiting to be used.

llvm-svn: 213356
2014-07-18 08:30:10 +00:00
Justin Holewinski 428cf0e49a [NVPTX] Improve handling of FP fusion
We now consider the FPOpFusion flag when determining whether
to fuse ops.  We also explicitly emit add.rn when fusion is
disabled to prevent ptxas from fusing the operations on its
own.

llvm-svn: 213287
2014-07-17 18:10:09 +00:00
Justin Holewinski e5a1173f67 [NVPTX] Add missing .v4 qualifier on vector store instruction
llvm-svn: 213276
2014-07-17 16:58:56 +00:00
Justin Holewinski 18cfe7d634 [NVPTX] Flag surface/texture query instructions with IsTexSurfQuery
Also, add some tests to make sure we can handle surface/texture
queries on both Fermi and Kepler+.

llvm-svn: 213268
2014-07-17 14:51:33 +00:00
Justin Holewinski 9a2350e459 [NVPTX] Add more surface/texture intrinsics, including CUDA unified texture fetch
This also uses TSFlags to mark machine instructions that are surface/texture
accesses, as well as the vector width for surface operations.  This is used
to simplify some of the switch statements that need to detect surface/texture
instructions

llvm-svn: 213256
2014-07-17 11:59:04 +00:00
Justin Holewinski ac451066f4 [NVPTX] Honor alignment on vector loads/stores
We were not considering the stated alignment on vector loads/stores,
leading us to generate vector instructions even when we do not have
sufficient alignment.

Now, for IR like:

  %1 = load <4 x float>, <4 x float>* %ptr, align 4

we will generate correct, conservative PTX like:

  ld.f32 ... [%ptr]
  ld.f32 ... [%ptr+4]
  ld.f32 ... [%ptr+8]
  ld.f32 ... [%ptr+12]

Or if we have an alignment of 8 (for example), we can
generate code like:

  ld.v2.f32 ... [%ptr]
  ld.v2.f32 ... [%ptr+8]

llvm-svn: 213186
2014-07-16 19:45:35 +00:00
Justin Holewinski 3e037d98e6 [NVPTX] Rename registers %fl -> %fd and %rl -> %rd
This matches the internal behavior of NVIDIA tools like libnvvm.

llvm-svn: 213168
2014-07-16 16:26:58 +00:00
Justin Holewinski a0d531f031 [NVPTX] Add reflect intrinsic (better than matching by function name)
Also clean up some of the logic in NVVMReflect.cpp while we're messing around in there.

llvm-svn: 211948
2014-06-27 18:36:11 +00:00
Justin Holewinski 2739c0175c [NVPTX] Add 'b' asm constraint
llvm-svn: 211946
2014-06-27 18:36:06 +00:00
Justin Holewinski 549c773619 [NVPTX] Error out if initializer is given for variable in an address space that does not support initialization
llvm-svn: 211943
2014-06-27 18:36:01 +00:00
Justin Holewinski 773ca40f5d [NVPTX] Add support for .managed variables for UVM
llvm-svn: 211942
2014-06-27 18:35:58 +00:00
Justin Holewinski d73767a80a [NVPTX] Emit .weak linkage for link_once, weak, available_externally, and common linkage
llvm-svn: 211941
2014-06-27 18:35:56 +00:00
Justin Holewinski b926d9d446 [NVPTX] Fix handling of ldg/ldu intrinsics.
The address space of the pointer must be global (1) for these intrinsics.  There must also be alignment metadata attached to the intrinsic calls, e.g.

%val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32 addrspace(1)* %ptr), !align !0

!0 = metadata !{i32 4}

llvm-svn: 211939
2014-06-27 18:35:51 +00:00
Justin Holewinski 6e40f63e41 [NVPTX] Clean up argument lowering code and properly handle alignment for structs and vectors
llvm-svn: 211938
2014-06-27 18:35:44 +00:00
Justin Holewinski 360a5cfcd3 [NVPTX] Add support for [SHL,SRA,SRL]_PARTS
llvm-svn: 211936
2014-06-27 18:35:40 +00:00
Justin Holewinski eafe26d082 [NVPTX] Implement fma and imad contraction as target DAGCombiner patterns
This also introduces DAGCombiner patterns for mul.wide to multiply two smaller integers and produce a larger integer

llvm-svn: 211935
2014-06-27 18:35:37 +00:00
Justin Holewinski 832e09b4d9 [NVPTX] Add support for efficient rotate instructions on SM 3.2+
llvm-svn: 211934
2014-06-27 18:35:33 +00:00
Justin Holewinski 7be57de6b8 [NVPTX] Add missing isel patterns for 64-bit atomics
llvm-svn: 211933
2014-06-27 18:35:30 +00:00
Justin Holewinski ca7a4f136d [NVPTX] Add isel patterns for bit-field extract (bfe)
llvm-svn: 211932
2014-06-27 18:35:27 +00:00
Justin Holewinski 10c25968d8 [NVPTX] Add support for isspacep instruction
llvm-svn: 211931
2014-06-27 18:35:24 +00:00
Justin Holewinski 124fc1951f [NVPTX] Add support for envreg reads
llvm-svn: 211930
2014-06-27 18:35:21 +00:00
Justin Holewinski 7d5bf66f61 [NVPTX] Emit .weak when linkage is not external, internal, or private
llvm-svn: 211926
2014-06-27 18:35:10 +00:00
Jingyue Wu baabe5091c Canonicalize addrspacecast ConstExpr between different pointer types
As a follow-up to r210375 which canonicalizes addrspacecast
instructions, this patch canonicalizes addrspacecast constant
expressions.

Given clang uses ConstantExpr::getAddrSpaceCast to emit addrspacecast
cosntant expressions, this patch is also a step towards having the
frontend emit canonicalized addrspacecasts.

Piggyback a minor refactor in InstCombineCasts.cpp

Update three affected tests in addrspacecast-alias.ll,
access-non-generic.ll and constant-fold-gep.ll and added one new test in
constant-fold-address-space-pointer.ll

llvm-svn: 211004
2014-06-15 21:40:57 +00:00
Alp Toker d3d017cf00 Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

llvm-svn: 210496
2014-06-09 22:42:55 +00:00
Rafael Espindola 42a4c9f9e0 Allow aliases to be unnamed_addr.
Alias with unnamed_addr were in a strange state. It is stored in GlobalValue,
the language reference talks about "unnamed_addr aliases" but the verifier
was rejecting them.

It seems natural to allow unnamed_addr in aliases:

* It is a property of how it is accessed, not of the data itself.
* It is perfectly possible to write code that depends on the address
of an alias.

This patch then makes unname_addr legal for aliases. One side effect is that
the syntax changes for a corner case: In globals, unnamed_addr is now printed
before the address space.

llvm-svn: 210302
2014-06-06 01:20:28 +00:00
Eli Bendersky 7cd70df708 Fix the test: DCE optimized away everything.
Use volatile store to protect the generated PTX from DCE.

Patch by Jingyue Wu.

llvm-svn: 206763
2014-04-21 17:23:12 +00:00
Justin Holewinski 30d56a7b86 [NVPTX] Add preliminary intrinsics and codegen support for textures/surfaces
This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later).

llvm-svn: 205907
2014-04-09 15:39:15 +00:00
Justin Holewinski 9d852a8e08 [NVPTX] Add support for addrspacecast in global variable initializers, including emitting generic() when casting to address space 0.
llvm-svn: 205906
2014-04-09 15:39:11 +00:00
Eli Bendersky bbef172f19 Optimize away unnecessary address casts.
Removes unnecessary casts from non-generic address spaces to the generic address
space for certain code patterns.

Patch by Jingyue Wu.

llvm-svn: 205571
2014-04-03 21:18:25 +00:00
Eli Bendersky 264cd4672d Fix for PR19099 - NVPTX produces invalid symbol names.
This is a more thorough fix for the issue than r203483. An IR pass will run
before NVPTX codegen to make sure there are no invalid symbol names that can't
be consumed by the ptxas assembler.

llvm-svn: 205212
2014-03-31 15:56:26 +00:00
Eli Bendersky a7b6679774 Add test to test/CodeGen/NVPTX for "alloca buffer" arguments.
Make sure such IR gets properly lowered to PTX.

llvm-svn: 204624
2014-03-24 16:52:30 +00:00
Justin Holewinski ba2fa6de4f [NVPTX] Add isel patterns for addrspacecast
llvm-svn: 204600
2014-03-24 11:17:53 +00:00
Eli Bendersky 2281ef91e6 Expose "noduplicate" attribute as a property for intrinsics.
The "noduplicate" function attribute exists to prevent certain optimizations
from duplicating calls to the function. This is important on platforms where
certain function call duplications are unsafe (for example execution barriers
for CUDA and OpenCL).

This patch makes it possible to specify intrinsics as "noduplicate" and
translates that to the appropriate function attribute.

llvm-svn: 204200
2014-03-18 23:51:07 +00:00
Rafael Espindola 2fb5bc33a3 Remove the linker_private and linker_private_weak linkages.
These linkages were introduced some time ago, but it was never very
clear what exactly their semantics were or what they should be used
for. Some investigation found these uses:

* utf-16 strings in clang.
* non-unnamed_addr strings produced by the sanitizers.

It turns out they were just working around a more fundamental problem.
For some sections a MachO linker needs a symbol in order to split the
section into atoms, and llvm had no idea that was the case. I fixed
that in r201700 and it is now safe to use the private linkage. When
the object ends up in a section that requires symbols, llvm will use a
'l' prefix instead of a 'L' prefix and things just work.

With that, these linkages were already dead, but there was a potential
future user in the objc metadata information. I am still looking at
CGObjcMac.cpp, but at this point I am convinced that linker_private
and linker_private_weak are not what they need.

The objc uses are currently split in

* Regular symbols (no '\01' prefix). LLVM already directly provides
whatever semantics they need.
* Uses of a private name (start with "\01L" or "\01l") and private
linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
agrees with clang on L being ok or not for a given section. I have two
patches in code review for this.
* Uses of private name and weak linkage.

The last case is the one that one could think would fit one of these
linkages. That is not the case. The semantics are

* the linker will merge these symbol by *name*.
* the linker will hide them in the final DSO.

Given that the merging is done by name, any of the private (or
internal) linkages would be a bad match. They allow llvm to rename the
symbols, and that is really not what we want. From the llvm point of
view, these objects should really be (linkonce|weak)(_odr)?.

For now, just keeping the "\01l" prefix is probably the best for these
symbols. If we one day want to have a more direct support in llvm,
IMHO what we should add is not a linkage, it is just a hidden_symbol
attribute. It would be applicable to multiple linkages. For example,
on weak it would produce the current behavior we have for objc
metadata. On internal, it would be equivalent to private (and we
should then remove private).

llvm-svn: 203866
2014-03-13 23:18:37 +00:00
Eli Bendersky d47a5c2d3f Followup to r203483 - add test.
[forgot to 'svn add' before committing r203483]

llvm-svn: 203485
2014-03-10 20:36:04 +00:00
Gautam Chakrabarti 2c283400f9 [NVPTX] Fix emitting aggregate parameters
The code was missing the case for aggregate parameters and
hence was emitting them as .b0 type. Also fixed a couple
of comments.

llvm-svn: 200325
2014-01-28 18:35:29 +00:00
Justin Holewinski 7706107e6f [NVPTX] Add missing patterns for div.approx with immediate denominator
llvm-svn: 199746
2014-01-21 14:40:05 +00:00
Nico Rieck b5262d6d8f Fix non-deterministic SDNodeOrder-dependent codegen
Reset SelectionDAGBuilder's SDNodeOrder to ensure deterministic code
generation.

llvm-svn: 199050
2014-01-12 14:09:17 +00:00
Justin Holewinski 4459717bab [NVPTX] Fix off-by-one error when creating the VT list for an SDNode
llvm-svn: 196503
2013-12-05 12:58:00 +00:00
Justin Holewinski 3d49e5c655 [NVPTX] Fix handling of indirect calls
Using a special machine node is cleaner than an InlineAsm node, and fixes an assertion failure in InstrEmitter

llvm-svn: 194810
2013-11-15 12:30:04 +00:00
Justin Holewinski 124e93de93 [NVPTX] Properly handle bitcast ConstantExpr when checking for the alignment of function parameters
llvm-svn: 194410
2013-11-11 19:28:19 +00:00
Justin Holewinski 4f5bc9b33a [NVPTX] Fix logic error in loading vector parameters of more than 4 components
llvm-svn: 194409
2013-11-11 19:28:16 +00:00
Justin Holewinski a51418c1ca [NVPTX] Switch from StrongPHIElimination to PHIElimination in NVPTXTargetMachine, and add some missing optimization passes to addOptimizedRegAlloc
Fixes PR17529

llvm-svn: 192445
2013-10-11 12:39:39 +00:00
Justin Holewinski 660597d190 Make AsmPrinter::emitImplicitDef a virtual method so targets can emit custom comments for implicit defs
For NVPTX, this fixes a crash where the emitImplicitDef implementation was expecting physical registers,
while NVPTX uses virtual registers (with a couple of exceptions).  Now, the implicit def comment will be
emitted as a true PTX register name. Other targets can use this to customize the output of implicit def
comments.

Fixes PR17519

llvm-svn: 192444
2013-10-11 12:39:36 +00:00
Justin Holewinski a54daa4640 [NVPTX] Make constant vector test case endian-independent
llvm-svn: 190998
2013-09-19 13:14:44 +00:00
Justin Holewinski 95564bdf5e [NVPTX] Support constant vector globals
llvm-svn: 190997
2013-09-19 12:51:46 +00:00
Justin Holewinski aaa8b6e355 [NVPTX] Re-enable assembly printing support for inline assembly
This support was removed by accident during the MC conversion

llvm-svn: 189160
2013-08-24 01:17:23 +00:00
Daniel Dunbar 9efbedfd35 [tests] Cleanup initialization of test suffixes.
- Instead of setting the suffixes in a bunch of places, just set one master
   list in the top-level config. We now only modify the suffix list in a few
   suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).

 - Aside from removing the need for a bunch of lit.local.cfg files, this enables
   4 tests that were inadvertently being skipped (one in
   Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
   CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
   XFAILED).

 - This commit also fixes a bunch of config files to use config.root instead of
   older copy-pasted code.

llvm-svn: 188513
2013-08-16 00:37:11 +00:00
Justin Holewinski debe686f05 [NVPTX] Add missing patterns for i1 [s,u]int_to_fp
llvm-svn: 187800
2013-08-06 14:13:34 +00:00
Justin Holewinski 871ec93909 [NVPTX] Fix bug in stack code generation causes by MC conversion
We do use a very small set of physical registers, so account for
them in the virtual register encoding between MachineInstr and MC

llvm-svn: 187799
2013-08-06 14:13:31 +00:00
Justin Holewinski a2a63d28df [NVPTX] Start conversion to MC infrastructure
This change converts the NVPTX target to use the MC infrastructure
instead of directly emitting MachineInstr instances. This brings
the target more up-to-date with LLVM TOT, and should fix PR15175
and PR15958 (libNVPTXInstPrinter is empty) as a side-effect.

llvm-svn: 187798
2013-08-06 14:13:27 +00:00
Justin Holewinski d3f2035a3c Add a target legalize hook for SplitVectorOperand (again)
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.

This also adds a test case for NVPTX that depends on this custom
legalization.

Differential Revision: http://llvm-reviews.chandlerc.com/D1195

Attempt to fix the buildbots by making the X86 test I just added platform independent

llvm-svn: 187202
2013-07-26 13:28:29 +00:00
Rafael Espindola 1d812728cc Revert "Add a target legalize hook for SplitVectorOperand"
This reverts commit 187198. It broke the bots.

The soft float test probably needs a -triple because of name differences.
On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of
"vroundss $1, %xmm0, %xmm0, %xmm0".

llvm-svn: 187201
2013-07-26 13:18:16 +00:00
Justin Holewinski f848a24e50 Add a target legalize hook for SplitVectorOperand
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.

This also adds a test case for NVPTX that depends on this custom
legalization.

Differential Revision: http://llvm-reviews.chandlerc.com/D1195

llvm-svn: 187198
2013-07-26 12:46:39 +00:00
Justin Holewinski cd069e6dec [NVPTX] Use approximate FP ops when unsafe-fp-math is used, and append
.ftz to instructions if the nvptx-f32ftz attribute is set to "true"

llvm-svn: 186820
2013-07-22 12:18:04 +00:00
Stephen Lin f799e3f944 Convert CodeGen/*/*.ll tests to use the new CHECK-LABEL for easier debugging. No functionality change and all tests pass after conversion.
This was done with the following sed invocation to catch label lines demarking function boundaries:
    sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.

llvm-svn: 186258
2013-07-13 20:38:47 +00:00
Justin Holewinski d2bbdf05e0 [NVPTX] Add support for module-scope inline asm
Since we were explicitly not calling AsmPrinter::doInitialization,
any module-scope inline asm was not being printed.

llvm-svn: 185336
2013-07-01 13:00:14 +00:00
Justin Holewinski 51cb1349dc [NVPTX] 64-bit ADDC/ADDE are not legal
llvm-svn: 185333
2013-07-01 12:59:04 +00:00
Justin Holewinski dff28d215f [NVPTX] Fix vector loads from parameters that span multiple loads, and fix some typos
llvm-svn: 185332
2013-07-01 12:59:01 +00:00
Justin Holewinski a2911283e4 [NVPTX] Handle signext/zeroext attributes properly
Fix a case where we were incorrectly sign-extending a value when we should have been zero-extending the value.

Also change some SIGN_EXTEND to ANY_EXTEND because we really dont care and may have more opportunity to fold subexpressions

llvm-svn: 185331
2013-07-01 12:58:58 +00:00
Justin Holewinski 318c625ff4 [NVPTX] Add support for native SIGN_EXTEND_INREG where available
llvm-svn: 185330
2013-07-01 12:58:56 +00:00
Justin Holewinski e40e929eb1 [NVPTX] Add isel patterns for [reg+offset] form of ldg/ldu.
llvm-svn: 185329
2013-07-01 12:58:52 +00:00
Justin Holewinski e8c93e3378 [NVPTX] Make sure we zero out high-order 24 bits for 8-bit load into 32-bit value
llvm-svn: 185328
2013-07-01 12:58:48 +00:00
Justin Holewinski af258be134 [NVPTX] Add (1.0 / sqrt(x)) => rsqrt(x) generation when allowable by FP flags
llvm-svn: 185178
2013-06-28 17:58:13 +00:00
Justin Holewinski e04e4bdf71 [NVPTX] Calling conventions fix
Fix ABI handling for function
returning bool -- use st.param.b32 to return the value
and use ld.param.b32 in caller to load the return value.

llvm-svn: 185177
2013-06-28 17:58:10 +00:00
Justin Holewinski dc372df63b [NVPTX] Add support for cttz/ctlz/ctpop
llvm-svn: 185176
2013-06-28 17:58:07 +00:00
Justin Holewinski dc5e3b68f5 [NVPTX] Clean up comparison/select/convert patterns and factor out PTX instructions from their patterns
Test case is no breakage

llvm-svn: 185175
2013-06-28 17:58:04 +00:00
Justin Holewinski f8f7091722 [NVPTX] Remove i8 register class. PTX support for i8 (.b8, .u8, .s8) is rather poor and we're better off just ignoring it and letting LLVM expand all i8 ops out to i16.
llvm-svn: 185174
2013-06-28 17:57:59 +00:00
Justin Holewinski 120baee819 [NVPTX] Add support for vectorized function return values
llvm-svn: 185173
2013-06-28 17:57:55 +00:00
Justin Holewinski 44f5c60e58 [NVPTX] Clean up handling of formal arguments and enable generation of vector parameter loads
llvm-svn: 185172
2013-06-28 17:57:53 +00:00
Justin Holewinski b6e6cd356e [NVPTX] Add support for selecting CUDA vs OCL mode based on triple
IR for CUDA should use "nvptx[64]-nvidia-cuda", and IR for NV OpenCL should use "nvptx[64]-nvidia-nvcl"

llvm-svn: 184579
2013-06-21 18:51:49 +00:00
Justin Holewinski b96d1395f6 [NVPTX] Remove old CONST_NOT_GEN address space that is not being used anymore and causes constants to be emitted in the global address space
llvm-svn: 183652
2013-06-10 13:29:47 +00:00
Justin Holewinski dbb3b2f4b6 [NVPTX] Re-enable support for virtual registers in the final output
Now that 3.3 is branched, we are re-enabling virtual registers to help
iron out bugs before the next release. Some of the post-RA passes do
not play well with virtual registers, so we disable them for now. The
needed functionality of the PrologEpilogInserter pass is copied to a
new backend-specific NVPTXPrologEpilog pass.

The test for this commit is not breaking the existing tests.

llvm-svn: 182998
2013-05-31 12:14:49 +00:00
Justin Holewinski 994d66a345 [NVPTX] Fix case where a sext load of an i1 type may produce an
ld.u1 instead of an ld.u8.

llvm-svn: 182924
2013-05-30 12:22:39 +00:00
Justin Holewinski 48f4ad3fc0 [NVPTX] Add @llvm.nvvm.sqrt.f() intrinsic
llvm-svn: 182394
2013-05-21 16:51:30 +00:00
Justin Holewinski 4c47d87ba6 [NVPTX] Fix mis-use of CurrentFnSym in NVPTXAsmPrinter. This was causing a symbol name error in the output PTX.
llvm-svn: 182298
2013-05-20 16:42:18 +00:00
Justin Holewinski 01f89f0428 [NVPTX] Add GenericToNVVM IR converter to better handle idiomatic LLVM IR inputs
This converter currently only handles global variables in address space 0. For
these variables, they are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.

The motivation for this is address space 0 global variables are illegal since we
cannot declare variables in the generic address space.  Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.

llvm-svn: 182254
2013-05-20 12:13:32 +00:00
Justin Holewinski 700b6fa934 [NVPTX] Fix i1 kernel parameters and global variables. ABI rules say we need to use .u8 for i1 parameters for kernels.
llvm-svn: 182253
2013-05-20 12:13:28 +00:00
Justin Holewinski 59fd8ba5f5 [NVPTX] Remove support for SM < 2.0. This was never fully supported anyway.
llvm-svn: 178417
2013-03-30 14:29:30 +00:00
Justin Holewinski b94bd05b95 [NVPTX] Add NVVMReflect pass to allow compile-time selection of
specific code paths.

This allows us to write code like:

  if (__nvvm_reflect("FOO"))
    // Do something
  else
    // Do something else

and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.

llvm-svn: 178416
2013-03-30 14:29:25 +00:00
Justin Holewinski e988409423 [NVPTX] Fix handling of vector arguments
llvm-svn: 177847
2013-03-24 21:17:47 +00:00
Justin Holewinski d068943809 Propagate DAG node ordering during type legalization and instruction selection
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.

llvm-svn: 177465
2013-03-20 00:10:32 +00:00
Justin Holewinski be8dc6499a [NVPTX] Disable vector registers
Vectors were being manually scalarized by the backend.  Instead,
let the target-independent code do all of the work.  The manual
scalarization was from a time before good target-independent support
for scalarization in LLVM. However, this forces us to specially-handle
vector loads and stores, which we can turn into PTX instructions that
produce/consume multiple operands.

llvm-svn: 174968
2013-02-12 14:18:49 +00:00
Justin Holewinski 4f12f53353 [NVPTX] Remove NoCapture from address space conversion intrinsics. NoCapture is not valid in this case, and was causing incorrect optimizations.
llvm-svn: 174896
2013-02-11 18:56:35 +00:00
Justin Holewinski fb711156ae [NVPTX] Fix crash with unnamed struct arguments
Patch by Eric Holk

llvm-svn: 169418
2012-12-05 20:50:28 +00:00
Justin Holewinski 0ac49bf846 Teach the legalizer how to handle operands for VSELECT nodes
If we need to split the operand of a VSELECT, it must be the mask operand. We
split the entire VSELECT operand with EXTRACT_SUBVECTOR.

llvm-svn: 168883
2012-11-29 14:26:28 +00:00
Justin Holewinski bc45119b44 Allow targets to prefer TypeSplitVector over TypePromoteInteger when computing the legalization method for vectors
For some targets, it is desirable to prefer scalarizing <N x i1> instead of promoting to a larger legal type, such as <N x i32>.

llvm-svn: 168882
2012-11-29 14:26:24 +00:00
Justin Holewinski 2c5ac70dd9 [NVPTX] Order global variables in def-use order before emiting them in the final assembly
llvm-svn: 168198
2012-11-16 21:03:51 +00:00
Justin Holewinski c6462aacd5 [NVPTX] Implement custom lowering of loads/stores for i1
Loads from i1 become loads from i8 followed by trunc
Stores to i1 become zext to i8 followed by store to i8

Fixes PR13291

llvm-svn: 167948
2012-11-14 19:19:16 +00:00
Justin Holewinski 1812ee9a5b [NVPTX] Add more precise PTX/SM target attributes
Each SM and PTX version is modeled as a subtarget feature/CPU. Additionally,
PTX 3.1 is added as the default PTX version to be out-of-the-box compatible
with CUDA 5.0.

Available CPUs for this target:

  sm_10 - Select the sm_10 processor.
  sm_11 - Select the sm_11 processor.
  sm_12 - Select the sm_12 processor.
  sm_13 - Select the sm_13 processor.
  sm_20 - Select the sm_20 processor.
  sm_21 - Select the sm_21 processor.
  sm_30 - Select the sm_30 processor.
  sm_35 - Select the sm_35 processor.

Available features for this target:

  ptx30 - Use PTX version 3.0.
  ptx31 - Use PTX version 3.1.
  sm_10 - Target SM 1.0.
  sm_11 - Target SM 1.1.
  sm_12 - Target SM 1.2.
  sm_13 - Target SM 1.3.
  sm_20 - Target SM 2.0.
  sm_21 - Target SM 2.1.
  sm_30 - Target SM 3.0.
  sm_35 - Target SM 3.5.

llvm-svn: 167699
2012-11-12 03:16:43 +00:00
Justin Holewinski 2dc9d072e5 [NVPTX] Use ABI alignment for parameters when alignment is not specified.
Affects SM 2.0+.  Fixes bug 13324.

llvm-svn: 167646
2012-11-09 23:50:24 +00:00
Peter Collingbourne 913869be45 Add llvm.fabs intrinsic.
llvm-svn: 157594
2012-05-28 21:48:37 +00:00
Justin Holewinski c98041d4d9 [NVPTX] Add a new test case for the newly-enabled call handling
NV_CONTRIB

llvm-svn: 157485
2012-05-25 17:20:38 +00:00
Justin Holewinski ae556d3ef7 This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:

nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX

The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.

NV_CONTRIB

llvm-svn: 156196
2012-05-04 20:18:50 +00:00