Commit Graph

286 Commits

Author SHA1 Message Date
Justin Holewinski 2cb5e181d1 [NVPTX] Silence a GCC warning found by the buildbots
The cast to NVPTXTargetLowering was missing a 'const', but let's
just access the right pointer through the subtarget anyway.

llvm-svn: 213793
2014-07-23 20:23:47 +00:00
Justin Holewinski ecca715b3c [NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of two
llvm-svn: 213784
2014-07-23 18:46:03 +00:00
Justin Holewinski 511664dc76 [NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled
With optimizations disabled, we disable the isel patterns for mul.wide; but we
were still generating MULWIDE ISD nodes.  Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.

llvm-svn: 213773
2014-07-23 17:40:45 +00:00
Tim Northover 9e108a0e3a NVPTX: support fpext/fptrunc to and from f16.
llvm-svn: 213377
2014-07-18 13:01:43 +00:00
Tim Northover 5e54fe14a4 NVPTX: support direct f16 <-> f64 conversions via intrinsics.
Clang may well start emitting these soon, and while it may not be
directly relevant for OpenCL or GLSL, the instructions were just
sitting there waiting to be used.

llvm-svn: 213356
2014-07-18 08:30:10 +00:00
Justin Holewinski 428cf0e49a [NVPTX] Improve handling of FP fusion
We now consider the FPOpFusion flag when determining whether
to fuse ops.  We also explicitly emit add.rn when fusion is
disabled to prevent ptxas from fusing the operations on its
own.

llvm-svn: 213287
2014-07-17 18:10:09 +00:00
Justin Holewinski e5a1173f67 [NVPTX] Add missing .v4 qualifier on vector store instruction
llvm-svn: 213276
2014-07-17 16:58:56 +00:00
Justin Holewinski 18cfe7d634 [NVPTX] Flag surface/texture query instructions with IsTexSurfQuery
Also, add some tests to make sure we can handle surface/texture
queries on both Fermi and Kepler+.

llvm-svn: 213268
2014-07-17 14:51:33 +00:00
Justin Holewinski 9a2350e459 [NVPTX] Add more surface/texture intrinsics, including CUDA unified texture fetch
This also uses TSFlags to mark machine instructions that are surface/texture
accesses, as well as the vector width for surface operations.  This is used
to simplify some of the switch statements that need to detect surface/texture
instructions

llvm-svn: 213256
2014-07-17 11:59:04 +00:00
Tim Northover fd7e424935 CodeGen: extend f16 conversions to permit types > float.
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.

During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.

Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.

llvm-svn: 213248
2014-07-17 10:51:23 +00:00
Justin Holewinski ac451066f4 [NVPTX] Honor alignment on vector loads/stores
We were not considering the stated alignment on vector loads/stores,
leading us to generate vector instructions even when we do not have
sufficient alignment.

Now, for IR like:

  %1 = load <4 x float>, <4 x float>* %ptr, align 4

we will generate correct, conservative PTX like:

  ld.f32 ... [%ptr]
  ld.f32 ... [%ptr+4]
  ld.f32 ... [%ptr+8]
  ld.f32 ... [%ptr+12]

Or if we have an alignment of 8 (for example), we can
generate code like:

  ld.v2.f32 ... [%ptr]
  ld.v2.f32 ... [%ptr+8]

llvm-svn: 213186
2014-07-16 19:45:35 +00:00
Justin Holewinski 3e037d98e6 [NVPTX] Rename registers %fl -> %fd and %rl -> %rd
This matches the internal behavior of NVIDIA tools like libnvvm.

llvm-svn: 213168
2014-07-16 16:26:58 +00:00
David Majnemer 8bce66b093 CodeGen: Stick constant pool entries in COMDAT sections for WinCOFF
COFF lacks a feature that other object file formats support: mergeable
sections.

To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in it's own section.  This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.

This fixes PR20262.

Differential Revision: http://reviews.llvm.org/D4482

llvm-svn: 213006
2014-07-14 22:57:27 +00:00
NAKAMURA Takumi c3b3897e8a NVPTX/LLVMBuild.txt: Add "Scalar" to required_libraries. It is really referenced.
llvm-svn: 212918
2014-07-14 02:52:19 +00:00
Chandler Carruth 9d010fffe1 [codegen,aarch64] Add a target hook to the code generator to control
vector type legalization strategies in a more fine grained manner, and
change the legalization of several v1iN types and v1f32 to be widening
rather than scalarization on AArch64.

This fixes an assertion failure caused by scalarizing nodes like "v1i32
trunc v1i64". As v1i64 is legal it will fail to scalarize v1i32.

This also provides a foundation for other targets to have more granular
control over how vector types are legalized.

Patch by Hao Liu, reviewed by Tim Northover. I'm committing it to allow
some work to start taking place on top of this patch as it adds some
really important hooks to the backend that I'd like to immediately start
using. =]

http://reviews.llvm.org/D4322

llvm-svn: 212242
2014-07-03 00:23:43 +00:00
Justin Holewinski 9982f06aa3 [NVPTX] Use GreatestCommonDivisor64 from MathExtras instead of using our own. Thanks Hal!
llvm-svn: 211952
2014-06-27 19:36:25 +00:00
Justin Holewinski a0d531f031 [NVPTX] Add reflect intrinsic (better than matching by function name)
Also clean up some of the logic in NVVMReflect.cpp while we're messing around in there.

llvm-svn: 211948
2014-06-27 18:36:11 +00:00
Justin Holewinski a8071856a6 [NVPTX] Handle all possible vector types in getSetCCResultType, not just the ones representable as MVTs
llvm-svn: 211947
2014-06-27 18:36:08 +00:00
Justin Holewinski 2739c0175c [NVPTX] Add 'b' asm constraint
llvm-svn: 211946
2014-06-27 18:36:06 +00:00
Justin Holewinski b5db95e465 [NVPTX] Simplify some argument lowering logic
llvm-svn: 211945
2014-06-27 18:36:04 +00:00
Justin Holewinski e519a4301b [NVPTX] Do not process samplers in GenericToNVVM
llvm-svn: 211944
2014-06-27 18:36:02 +00:00
Justin Holewinski 549c773619 [NVPTX] Error out if initializer is given for variable in an address space that does not support initialization
llvm-svn: 211943
2014-06-27 18:36:01 +00:00
Justin Holewinski 773ca40f5d [NVPTX] Add support for .managed variables for UVM
llvm-svn: 211942
2014-06-27 18:35:58 +00:00
Justin Holewinski d73767a80a [NVPTX] Emit .weak linkage for link_once, weak, available_externally, and common linkage
llvm-svn: 211941
2014-06-27 18:35:56 +00:00
Justin Holewinski 73cb5de546 [NVPTX] Variables that start with llvm. or nvvm. are reserved and should not be emitted
llvm-svn: 211940
2014-06-27 18:35:53 +00:00
Justin Holewinski b926d9d446 [NVPTX] Fix handling of ldg/ldu intrinsics.
The address space of the pointer must be global (1) for these intrinsics.  There must also be alignment metadata attached to the intrinsic calls, e.g.

%val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32 addrspace(1)* %ptr), !align !0

!0 = metadata !{i32 4}

llvm-svn: 211939
2014-06-27 18:35:51 +00:00
Justin Holewinski 6e40f63e41 [NVPTX] Clean up argument lowering code and properly handle alignment for structs and vectors
llvm-svn: 211938
2014-06-27 18:35:44 +00:00
Justin Holewinski d7d8fe0e9c [NVPTX] Add missing boolean vector contents flag
llvm-svn: 211937
2014-06-27 18:35:42 +00:00
Justin Holewinski 360a5cfcd3 [NVPTX] Add support for [SHL,SRA,SRL]_PARTS
llvm-svn: 211936
2014-06-27 18:35:40 +00:00
Justin Holewinski eafe26d082 [NVPTX] Implement fma and imad contraction as target DAGCombiner patterns
This also introduces DAGCombiner patterns for mul.wide to multiply two smaller integers and produce a larger integer

llvm-svn: 211935
2014-06-27 18:35:37 +00:00
Justin Holewinski 832e09b4d9 [NVPTX] Add support for efficient rotate instructions on SM 3.2+
llvm-svn: 211934
2014-06-27 18:35:33 +00:00
Justin Holewinski 7be57de6b8 [NVPTX] Add missing isel patterns for 64-bit atomics
llvm-svn: 211933
2014-06-27 18:35:30 +00:00
Justin Holewinski ca7a4f136d [NVPTX] Add isel patterns for bit-field extract (bfe)
llvm-svn: 211932
2014-06-27 18:35:27 +00:00
Justin Holewinski 10c25968d8 [NVPTX] Add support for isspacep instruction
llvm-svn: 211931
2014-06-27 18:35:24 +00:00
Justin Holewinski 124fc1951f [NVPTX] Add support for envreg reads
llvm-svn: 211930
2014-06-27 18:35:21 +00:00
Justin Holewinski 602fa5b5d1 [NVPTX] Add target options for PTX 3.2/4.0 and SM 5.0 (Maxwell)
Default PTX version is set to PTX 3.2

llvm-svn: 211929
2014-06-27 18:35:18 +00:00
Justin Holewinski c3f31ebe6e [NVPTX] Update sub-target feature detection
llvm-svn: 211928
2014-06-27 18:35:16 +00:00
Justin Holewinski 6dca83987c [NVPTX] Directly control the Machine SSA passes that are invoked for NVPTX.
NVPTX is a bit special in the optimizations it requires, so this gives
us better control over the backend optimization pipeline.

llvm-svn: 211927
2014-06-27 18:35:14 +00:00
Justin Holewinski 7d5bf66f61 [NVPTX] Emit .weak when linkage is not external, internal, or private
llvm-svn: 211926
2014-06-27 18:35:10 +00:00
Justin Holewinski 0da758571c [NVPTX] Just use getTypeAllocSize() when computing return value size for structures and vectors
llvm-svn: 211925
2014-06-27 18:35:08 +00:00
Eric Christopher 493f91b6de Move NVPTX subtarget dependent variables from the target machine
to the subtarget.

llvm-svn: 211860
2014-06-27 04:33:14 +00:00
Eric Christopher 2ecb77e31f Use the target lowering we can get off of the DAG rather than off
of the cached target machine.

llvm-svn: 211858
2014-06-27 03:45:49 +00:00
Eric Christopher dd440f8727 Move the constructor for NVPTXFrameLowering into the implementation
file in preparation for the subtarget move.

llvm-svn: 211847
2014-06-27 02:05:24 +00:00
Eric Christopher f0dad2670d Remove unnecessary caching of the TargetMachine on NVPTXFrameLowering.
Adjust the constructor accordingly.

llvm-svn: 211846
2014-06-27 02:05:22 +00:00
Eric Christopher d8132862a9 Rework the logic for setting the TargetName. This appears to
be shorter and identical in goal.

llvm-svn: 211845
2014-06-27 02:05:19 +00:00
Eric Christopher e032a8c0bd Remove caching of the target machine in NVPTXInstrInfo and
update constructor accordingly.

llvm-svn: 211840
2014-06-27 01:27:08 +00:00
Eric Christopher a1869461e1 Remove comment that duplicated information in the constructor
that it's after.

llvm-svn: 211839
2014-06-27 01:27:06 +00:00
Eric Christopher e8f50281a9 Remove commented out code.
llvm-svn: 211838
2014-06-27 01:27:05 +00:00
Eric Christopher c22ad16bc6 Remove extraneous parens and extraneous const cast (and fix the
prototype for the function to patch what we were returning).

llvm-svn: 211837
2014-06-27 01:27:03 +00:00
Alp Toker e69170a110 Revert "Introduce a string_ostream string builder facilty"
Temporarily back out commits r211749, r211752 and r211754.

llvm-svn: 211814
2014-06-26 22:52:05 +00:00