Summary:
".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the
weak directive in MCAsmPrinter customizable, and disables emitting ".weak"
symbols for NVPTX.
Test Plan: weak-linkage.ll
Reviewers: jholewinski
Reviewed By: jholewinski
Subscribers: majnemer, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D6455
llvm-svn: 223077
This works around the limitation that PTX does not allow .param space
loads/stores with arbitrary pointers.
If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then
add the following instructions to the first basic block :
%temp = alloca %struct.x, align 8
%tt1 = bitcast %struct.x * %d to i8 *
%tt2 = llvm.nvvm.cvt.gen.to.param %tt2
%tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) *
%tv = load %struct.x addrspace(101) * %tempd
store %struct.x %tv, %struct.x * %temp, align 8
The above code allocates some space in the stack and copies the incoming
struct from param space to local space. Then replace all occurences of %d
by %temp.
Fixes PR21465.
llvm-svn: 221377
Summary:
Fixes PR21100 which is caused by inconsistency between the declared return type
and the expected return type at the call site. The new behavior is consistent
with nvcc and the NVPTXTargetLowering::getPrototype function.
Test Plan: test/Codegen/NVPTX/vector-return.ll
Reviewers: jholewinski
Reviewed By: jholewinski
Subscribers: llvm-commits, meheff, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D5612
llvm-svn: 220607
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in
isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo.
The old heuristics caused instructions to sink unnecessarily, and might create
register pressure.
This is the second try of the fix. The first one (D4814) caused a performance
regression due to failing to sink instructions out of loops (PR21115). This
patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower
one regardless of whether the target block post-dominates the source.
Thanks Alexey Volkov for reporting PR21115!
Test Plan:
Added a NVPTX codegen test to verify that our change prevents the backend from
over-sinking. It also shows the unnecessary register pressure caused by
over-sinking.
Added an X86 test to verify we can sink instructions out of loops regardless of
the dominance relationship. This test is reduced from Alexey's test in PR21115.
Updated an affected test in X86.
Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime
performance. Results are attached separately in the review thread.
Reviewers: Jiangning, resistor, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski
Differential Revision: http://reviews.llvm.org/D5633
llvm-svn: 219773
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
in isPostDominatedBy, use the real MachinePostDominatorTree. The old
heuristics caused instructions to sink unnecessarily, and might create
register pressure.
Test Plan:
Added a NVPTX codegen test to verify that our change is in effect. It also
shows the unnecessary register pressure caused by over-sinking. Updated
affected tests in AArch64 and X86.
Reviewers: eliben, meheff, Jiangning
Reviewed By: Jiangning
Subscribers: jholewinski, aemerson, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D4814
llvm-svn: 216862
Summary:
Instead of specifying the alignment as metadata which may be destroyed by
transformation passes, make the alignment the second argument to ldu/ldg
intrinsic calls.
Test Plan:
ldu-ldg.ll
ldu-i8.ll
ldu-reg-plus-offset.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: meheff, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D5093
llvm-svn: 216731
With optimizations disabled, we disable the isel patterns for mul.wide; but we
were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE
ISD nodes in DAGCombine if the optimization level is not zero.
llvm-svn: 213773
Actual support for softening f16 operations is still limited, and can be added
when it's needed. But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.
Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.
llvm-svn: 213372
Clang may well start emitting these soon, and while it may not be
directly relevant for OpenCL or GLSL, the instructions were just
sitting there waiting to be used.
llvm-svn: 213356
We now consider the FPOpFusion flag when determining whether
to fuse ops. We also explicitly emit add.rn when fusion is
disabled to prevent ptxas from fusing the operations on its
own.
llvm-svn: 213287
This also uses TSFlags to mark machine instructions that are surface/texture
accesses, as well as the vector width for surface operations. This is used
to simplify some of the switch statements that need to detect surface/texture
instructions
llvm-svn: 213256
We were not considering the stated alignment on vector loads/stores,
leading us to generate vector instructions even when we do not have
sufficient alignment.
Now, for IR like:
%1 = load <4 x float>, <4 x float>* %ptr, align 4
we will generate correct, conservative PTX like:
ld.f32 ... [%ptr]
ld.f32 ... [%ptr+4]
ld.f32 ... [%ptr+8]
ld.f32 ... [%ptr+12]
Or if we have an alignment of 8 (for example), we can
generate code like:
ld.v2.f32 ... [%ptr]
ld.v2.f32 ... [%ptr+8]
llvm-svn: 213186
The address space of the pointer must be global (1) for these intrinsics. There must also be alignment metadata attached to the intrinsic calls, e.g.
%val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32 addrspace(1)* %ptr), !align !0!0 = metadata !{i32 4}
llvm-svn: 211939
As a follow-up to r210375 which canonicalizes addrspacecast
instructions, this patch canonicalizes addrspacecast constant
expressions.
Given clang uses ConstantExpr::getAddrSpaceCast to emit addrspacecast
cosntant expressions, this patch is also a step towards having the
frontend emit canonicalized addrspacecasts.
Piggyback a minor refactor in InstCombineCasts.cpp
Update three affected tests in addrspacecast-alias.ll,
access-non-generic.ll and constant-fold-gep.ll and added one new test in
constant-fold-address-space-pointer.ll
llvm-svn: 211004
Alias with unnamed_addr were in a strange state. It is stored in GlobalValue,
the language reference talks about "unnamed_addr aliases" but the verifier
was rejecting them.
It seems natural to allow unnamed_addr in aliases:
* It is a property of how it is accessed, not of the data itself.
* It is perfectly possible to write code that depends on the address
of an alias.
This patch then makes unname_addr legal for aliases. One side effect is that
the syntax changes for a corner case: In globals, unnamed_addr is now printed
before the address space.
llvm-svn: 210302
This commit adds intrinsics and codegen support for the surface read/write and texture read instructions that take an explicit sampler parameter. Codegen operates on image handles at the PTX level, but falls back to direct replacement of handles with kernel arguments if image handles are not enabled. Note that image handles are explicitly disabled for all target architectures in this change (to be enabled later).
llvm-svn: 205907
Removes unnecessary casts from non-generic address spaces to the generic address
space for certain code patterns.
Patch by Jingyue Wu.
llvm-svn: 205571
This is a more thorough fix for the issue than r203483. An IR pass will run
before NVPTX codegen to make sure there are no invalid symbol names that can't
be consumed by the ptxas assembler.
llvm-svn: 205212
The "noduplicate" function attribute exists to prevent certain optimizations
from duplicating calls to the function. This is important on platforms where
certain function call duplications are unsafe (for example execution barriers
for CUDA and OpenCL).
This patch makes it possible to specify intrinsics as "noduplicate" and
translates that to the appropriate function attribute.
llvm-svn: 204200
These linkages were introduced some time ago, but it was never very
clear what exactly their semantics were or what they should be used
for. Some investigation found these uses:
* utf-16 strings in clang.
* non-unnamed_addr strings produced by the sanitizers.
It turns out they were just working around a more fundamental problem.
For some sections a MachO linker needs a symbol in order to split the
section into atoms, and llvm had no idea that was the case. I fixed
that in r201700 and it is now safe to use the private linkage. When
the object ends up in a section that requires symbols, llvm will use a
'l' prefix instead of a 'L' prefix and things just work.
With that, these linkages were already dead, but there was a potential
future user in the objc metadata information. I am still looking at
CGObjcMac.cpp, but at this point I am convinced that linker_private
and linker_private_weak are not what they need.
The objc uses are currently split in
* Regular symbols (no '\01' prefix). LLVM already directly provides
whatever semantics they need.
* Uses of a private name (start with "\01L" or "\01l") and private
linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
agrees with clang on L being ok or not for a given section. I have two
patches in code review for this.
* Uses of private name and weak linkage.
The last case is the one that one could think would fit one of these
linkages. That is not the case. The semantics are
* the linker will merge these symbol by *name*.
* the linker will hide them in the final DSO.
Given that the merging is done by name, any of the private (or
internal) linkages would be a bad match. They allow llvm to rename the
symbols, and that is really not what we want. From the llvm point of
view, these objects should really be (linkonce|weak)(_odr)?.
For now, just keeping the "\01l" prefix is probably the best for these
symbols. If we one day want to have a more direct support in llvm,
IMHO what we should add is not a linkage, it is just a hidden_symbol
attribute. It would be applicable to multiple linkages. For example,
on weak it would produce the current behavior we have for objc
metadata. On internal, it would be equivalent to private (and we
should then remove private).
llvm-svn: 203866
For NVPTX, this fixes a crash where the emitImplicitDef implementation was expecting physical registers,
while NVPTX uses virtual registers (with a couple of exceptions). Now, the implicit def comment will be
emitted as a true PTX register name. Other targets can use this to customize the output of implicit def
comments.
Fixes PR17519
llvm-svn: 192444
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
llvm-svn: 188513
This change converts the NVPTX target to use the MC infrastructure
instead of directly emitting MachineInstr instances. This brings
the target more up-to-date with LLVM TOT, and should fix PR15175
and PR15958 (libNVPTXInstPrinter is empty) as a side-effect.
llvm-svn: 187798
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
Attempt to fix the buildbots by making the X86 test I just added platform independent
llvm-svn: 187202
This reverts commit 187198. It broke the bots.
The soft float test probably needs a -triple because of name differences.
On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of
"vroundss $1, %xmm0, %xmm0, %xmm0".
llvm-svn: 187201
CustomLowerNode was not being called during SplitVectorOperand,
meaning custom legalization could not be used by targets.
This also adds a test case for NVPTX that depends on this custom
legalization.
Differential Revision: http://llvm-reviews.chandlerc.com/D1195
llvm-svn: 187198
This was done with the following sed invocation to catch label lines demarking function boundaries:
sed -i '' "s/^;\( *\)\([A-Z0-9_]*\):\( *\)test\([A-Za-z0-9_-]*\):\( *\)$/;\1\2-LABEL:\3test\4:\5/g" test/CodeGen/*/*.ll
which was written conservatively to avoid false positives rather than false negatives. I scanned through all the changes and everything looks correct.
llvm-svn: 186258
Fix a case where we were incorrectly sign-extending a value when we should have been zero-extending the value.
Also change some SIGN_EXTEND to ANY_EXTEND because we really dont care and may have more opportunity to fold subexpressions
llvm-svn: 185331
Fix ABI handling for function
returning bool -- use st.param.b32 to return the value
and use ld.param.b32 in caller to load the return value.
llvm-svn: 185177
Now that 3.3 is branched, we are re-enabling virtual registers to help
iron out bugs before the next release. Some of the post-RA passes do
not play well with virtual registers, so we disable them for now. The
needed functionality of the PrologEpilogInserter pass is copied to a
new backend-specific NVPTXPrologEpilog pass.
The test for this commit is not breaking the existing tests.
llvm-svn: 182998
This converter currently only handles global variables in address space 0. For
these variables, they are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.
The motivation for this is address space 0 global variables are illegal since we
cannot declare variables in the generic address space. Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.
llvm-svn: 182254
specific code paths.
This allows us to write code like:
if (__nvvm_reflect("FOO"))
// Do something
else
// Do something else
and compile into a library, then give "FOO" a value at kernel
compile-time so the check becomes a no-op.
llvm-svn: 178416
A node's ordering is only propagated during legalization if (a) the new node does
not have an ordering (is not a CSE'd node), or (b) the new node has an ordering
that is higher than the node being legalized.
llvm-svn: 177465
Vectors were being manually scalarized by the backend. Instead,
let the target-independent code do all of the work. The manual
scalarization was from a time before good target-independent support
for scalarization in LLVM. However, this forces us to specially-handle
vector loads and stores, which we can turn into PTX instructions that
produce/consume multiple operands.
llvm-svn: 174968
If we need to split the operand of a VSELECT, it must be the mask operand. We
split the entire VSELECT operand with EXTRACT_SUBVECTOR.
llvm-svn: 168883
Each SM and PTX version is modeled as a subtarget feature/CPU. Additionally,
PTX 3.1 is added as the default PTX version to be out-of-the-box compatible
with CUDA 5.0.
Available CPUs for this target:
sm_10 - Select the sm_10 processor.
sm_11 - Select the sm_11 processor.
sm_12 - Select the sm_12 processor.
sm_13 - Select the sm_13 processor.
sm_20 - Select the sm_20 processor.
sm_21 - Select the sm_21 processor.
sm_30 - Select the sm_30 processor.
sm_35 - Select the sm_35 processor.
Available features for this target:
ptx30 - Use PTX version 3.0.
ptx31 - Use PTX version 3.1.
sm_10 - Target SM 1.0.
sm_11 - Target SM 1.1.
sm_12 - Target SM 1.2.
sm_13 - Target SM 1.3.
sm_20 - Target SM 2.0.
sm_21 - Target SM 2.1.
sm_30 - Target SM 3.0.
sm_35 - Target SM 3.5.
llvm-svn: 167699
The new target machines are:
nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX
The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.
NV_CONTRIB
llvm-svn: 156196