Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.
This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.
The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
affects all targets that use frame pseudo instructions and touched many
files although the changes are uniform.
- Access to frame properties are implemented using special instructions
rather than calls getOperand(N).getImm(). For X86 and ARM such
replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
instruction. These involve proper instruction initialization and
methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
frame parts initialized inside frame instruction pair and outside it.
The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.
Differential Revision: https://reviews.llvm.org/D32394
llvm-svn: 302527
This patch enables support for .f16x2 operations.
Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.
Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310
llvm-svn: 296032
Summary:
This lets us lower to sqrt.approx and rsqrt.approx under more
circumstances.
* Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32,
when fast-math is enabled. Previously, we only would emit it for
calls to @llvm.nvvm.sqrt.f. (With this patch we no longer emit
sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to
simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.)
* Now we emit the ftz version of rsqrt.approx when ftz is enabled.
Previously, we only emitted rsqrt.approx when ftz was disabled.
Reviewers: hfinkel
Subscribers: llvm-commits, tra, jholewinski
Differential Revision: https://reviews.llvm.org/D28508
llvm-svn: 293605
Summary:
This change also lets us use max.{s,u}16. There's a vague warning in a
test about this maybe being less efficient, but I could not come up with
a case where the resulting SASS (sm_35 or sm_60) was different with or
without max.{s,u}16. It's true that nvcc seems to emit only
max.{s,u}32, but even ptxas 7.0 seems to have no problem generating
efficient SASS from max.{s,u}16 (the casts up to i32 and back down to
i16 seem to be implicit and nops, happening via register aliasing).
In the absence of evidence, better to have fewer special cases, emit
more straightforward code, etc. In particular, if a new GPU has 16-bit
min/max instructions, we want to be able to use them.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28732
llvm-svn: 292304
Summary: Previously we lowered it literally, to shifts and xors.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28722
llvm-svn: 292303
Summary:
Avoid an unnecessary conversion operation when using the result of
ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction
we run returns an i32.
(Previously if we used the value as an i32, we'd do an unnecessary
zext+trunc.)
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28721
llvm-svn: 292302
Summary:
* Disable "ctlz speculation", which inserts a branch on every ctlz(x) which
has defined behavior on x == 0 to check whether x is, in fact zero.
* Add DAG patterns that avoid re-truncating or re-expanding the result
of the 16- and 64-bit ctz instructions.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D28719
llvm-svn: 292299
Only scalar half-precision operations are supported at the moment.
- Adds general support for 'half' type in NVPTX.
- fp16 math operations are supported on sm_53+ GPUs only
(can be disabled with --nvptx-no-f16-math).
- Type conversions to/from fp16 are supported on all GPU variants.
- On GPU variants that do not have full fp16 support (or if it's disabled),
fp16 operations are promoted to fp32 and results are converted back
to fp16 for storage.
Differential Revision: https://reviews.llvm.org/D28540
llvm-svn: 291956
Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32
instruction even when unsafe FP math was not allowed.
Clang-generated IR is not affected by this as it uses precise sin/cos
from CUDA's libdevice when unsafe math is disabled.
Differential Revision: https://reviews.llvm.org/D28619
llvm-svn: 291936
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.
Differential Revision: https://reviews.llvm.org/D26671
llvm-svn: 289647
Summary:
Previously these only worked via NVPTX-specific intrinsics.
This change will allow us to convert these target-specific intrinsics
into the general LLVM versions, allowing existing LLVM passes to reason
about their behavior.
It also gets us some minor codegen improvements as-is, from situations
where we canonicalize code into one of these llvm intrinsics.
Reviewers: majnemer
Subscribers: llvm-commits, jholewinski, tra
Differential Revision: https://reviews.llvm.org/D24300
llvm-svn: 281092
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.
Differential Revision: https://reviews.llvm.org/D23597
llvm-svn: 279129
Summary:
We don't have sign-/zero-extending ldg/ldu instructions defined,
so we need to emulate them with explicit CVTs. We were originally
handling the i8 case, but not any other cases.
Fixes PR26185
Reviewers: jingyue, jlebar
Subscribers: jholewinski
Differential Revision: http://reviews.llvm.org/D19615
llvm-svn: 268272
Summary:
Tablegen tries to infer this from the selection DAG patterns defined for
the instructions, but it can't always.
An instructive example is CLZr64. CLZr32 is correctly inferred to have
no side-effects, but the selection DAG pattern for CLZr64 is slightly
more complicated, and in particular the ctlz DAG node is not at the root
of the pattern. Thus tablegen can't infer that CLZr64 has no
side-effects.
Reviewers: jholewinski
Subscribers: jholewinski, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17472
llvm-svn: 265089
Summary:
Tablegen was unable to determine that param loads/stores were actually
reading or writing from memory. I think this isn't a problem in
practice for param stores, because those occur in a block right before
we make our call. But param loads don't have to at the very beginning
of a function, so should be annotated as mayLoad so we don't incorrectly
optimize them.
Reviewers: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D17471
llvm-svn: 262381
Summary: Looks like this was caused by a typo.
Reviewers: jholewinski
Subscribers: jholewinski, llvm-commits, tra
Differential Revision: http://reviews.llvm.org/D17357
llvm-svn: 262380
Summary:
Calls sometimes need to be convergent. This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.
Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs. But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.
Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.
Reviewers: jholewinski
Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17423
llvm-svn: 262373
Summary:
Also simplify some of the embedded C++ logic.
No functional changes.
Reviewers: jholewinski
Subscribers: llvm-commits, tra, jholewinski
Differential Revision: http://reviews.llvm.org/D17354
llvm-svn: 262371
Summary:
Otherwise we'll try to do unsafe optimizations on these MIs, such as
sinking loads below calls.
(I suspect that this is not the only bug in the NVPTX instruction
tablegen files; I need to comb through them.)
Reviewers: jholewinski, tra
Subscribers: jingyue, jhen, llvm-commits
Differential Revision: http://reviews.llvm.org/D17315
llvm-svn: 261113
[DebugInfo] Add debug locations to constant SD nodes
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235989
This adds debug location to constant nodes of Selection DAG and updates
all places that create constants to pass debug locations
(see PR13269).
Can't guarantee that all locations are correct, but in a lot of cases choice
is obvious, so most of them should be. At least all tests pass.
Tests for these changes do not cover everything, instead just check it for
SDNodes, ARM and AArch64 where it's easy to get incorrect locations on
constants.
This is not complete fix as FastISel contains workaround for wrong debug
locations, which drops locations from instructions on processing constants,
but there isn't currently a way to use debug locations from constants there
as llvm::Constant doesn't cache it (yet). Although this is a bit different
issue, not directly related to these changes.
Differential Revision: http://reviews.llvm.org/D9084
llvm-svn: 235977
Instead of creating a pattern like "(p && a) || ((!p) && b)",
just expand the i8 operands to i32 and perform the selp on them.
Fixes PR22246
llvm-svn: 227123
We now consider the FPOpFusion flag when determining whether
to fuse ops. We also explicitly emit add.rn when fusion is
disabled to prevent ptxas from fusing the operations on its
own.
llvm-svn: 213287