Commit Graph

33 Commits

Author SHA1 Message Date
Nikita Popov 532dc62b90 [OpaquePtrs][Clang] Add -no-opaque-pointers to tests (NFC)
This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.

The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Worth noting that this doesn't cover all tests,
there's a remaining ~40 tests not using %clang_cc1 that will need
a followup change.

Differential Revision: https://reviews.llvm.org/D123115
2022-04-07 12:09:47 +02:00
Jakub Chlanda a895182302 [NVPTX] Add more FMA intriniscs/builtins
This patch adds builtins/intrinsics for the following variants of FMA:

NOTE: follow-up commit with the missing clang-side changes.

- f16, f16x2
  - rn
  - rn_ftz
  - rn_sat
  - rn_ftz_sat
  - rn_relu
  - rn_ftz_relu
- bf16, bf16x2
  - rn
  - rn_relu

ptxas (Cuda compilation tools, release 11.0, V11.0.194) is happy with the generated assembly.

Differential Revision: https://reviews.llvm.org/D118977
2022-03-01 11:07:11 -08:00
Jakub Chlanda 7a6d692b3b [NVPTX] Expose float tys min, max, abs, neg as builtins
Adds support for the following builtins:

abs, neg:
- .bf16,
- .bf16x2
min, max
- {.ftz}{.NaN}{.xorsign.abs}.f16
- {.ftz}{.NaN}{.xorsign.abs}.f16x2
- {.NaN}{.xorsign.abs}.bf16
- {.NaN}{.xorsign.abs}.bf16x2
- {.ftz}{.NaN}{.xorsign.abs}.f32

Differential Revision: https://reviews.llvm.org/D117887
2022-03-01 11:07:11 -08:00
Jack Kirk bef3eb8344 [Clang][NVPTX]Add NVPTX intrinsics and builtins for CUDA PTX cvt sm80 instructions
Adds NVPTX intrinsics and builtins for CUDA PTX cvt instructions for sm80
architectures and above. Requires ptx 7.0.

PTX ISA description of cvt instructions :
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cvt

Signed-off-by: JackAKirk <jack.kirk@codeplay.com>

Differential Revision: https://reviews.llvm.org/D116673
2022-01-13 13:29:48 -08:00
Stuart Adams 02c2468864 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for
`sm_80` architecture or newer.

PTX ISA description of `cp.async`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive

Authored-by: Stuart Adams <stuart.adams@codeplay.com>
Co-Authored-by: Alexander Johnston <alexander@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100394
2021-05-17 09:46:59 -07:00
James Y Knight 8043d5a964 NFC: update clang tests to check ordering and alignment for atomicrmw/cmpxchg.
The ability to specify alignment was recently added, and it's an
important property which we should ensure is set as expected by
Clang. (Especially before making further changes to Clang's code in
this area.) But, because it's on the end of the lines, the existing
tests all ignore it.

Therefore, update all the tests to also verify the expected alignment
for atomicrmw and cmpxchg. While I was in there, I also updated uses
of 'load atomic' and 'store atomic', and added the memory ordering,
where that was missing.
2021-02-11 17:35:09 -05:00
Melanie Blower f5360d4bb3 Reapply "Add support for #pragma float_control" with buildbot fixes
Add support for #pragma float_control

Reviewers: rjmccall, erichkeane, sepavloff

Differential Revision: https://reviews.llvm.org/D72841

This reverts commit fce82c0ed3.
2020-05-04 05:51:25 -07:00
Melanie Blower fce82c0ed3 Revert "Reapply "Add support for #pragma float_control" with improvements to"
This reverts commit 69aacaf699.
2020-05-01 10:31:09 -07:00
Melanie Blower 69aacaf699 Reapply "Add support for #pragma float_control" with improvements to
test cases
Add support for #pragma float_control

Reviewers: rjmccall, erichkeane, sepavloff

Differential Revision: https://reviews.llvm.org/D72841

This reverts commit 85dc033cac, and makes
corrections to the test cases that failed on buildbots.
2020-05-01 10:03:30 -07:00
Melanie Blower 85dc033cac Revert "Add support for #pragma float_control"
This reverts commit 4f1e9a17e9.
due to fail on buildbot, sorry for the noise
2020-05-01 06:36:58 -07:00
Melanie Blower 4f1e9a17e9 Add support for #pragma float_control
Reviewers: rjmccall, erichkeane, sepavloff

Differential Revision: https://reviews.llvm.org/D72841
2020-05-01 06:14:24 -07:00
Benjamin Kramer 3b5e60b695 [CodeGen] NVPTX: Switch from atomic.load.add.f32 to atomicrmw fadd
llvm-svn: 365798
2019-07-11 17:44:11 +00:00
Artem Belevich 24e8a680e5 [NVPTX, CUDA] Improved feature constraints on NVPTX target builtins.
When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.

Differential Revision: https://reviews.llvm.org/D45061

llvm-svn: 329829
2018-04-11 17:51:19 +00:00
Artem Belevich 42960b4188 [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.
Differential Revision: https://reviews.llvm.org/D38148

llvm-svn: 313898
2017-09-21 18:44:49 +00:00
Artem Belevich 4654dc89be [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.
Differential Revision: https://reviews.llvm.org/D38090

llvm-svn: 313820
2017-09-20 21:23:07 +00:00
Artem Belevich fda9905062 [CUDA] added __nvvm_atom_{sys|cta}_* builtins.
These builtins are available on sm_60+ GPU only.

Differential Revision: https://reviews.llvm.org/D24944

llvm-svn: 282609
2016-09-28 17:47:35 +00:00
Justin Lebar 495f1a22af [CUDA] Rename the __nvvm_bar0 builtin back to __syncthreads.
The builtin was renamed in r274770.  But __syncthreads is part of our
user-facing API, so we need to keep the name as-is.

Patch by Justin Bogner.

llvm-svn: 274780
2016-07-07 18:15:03 +00:00
Justin Bogner 2d5de7e568 NVPTX: Use the nvvm builtins to read SRegs rather than the legacy ptx ones
The ptx spellings were removed from LLVM in r274769.

llvm-svn: 274770
2016-07-07 16:41:08 +00:00
Justin Lebar 2e4ecfdebe [CUDA] Implement __ldg using intrinsics.
Summary:
Previously it was implemented as inline asm in the CUDA headers.

This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions.  This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.

Reviewers: tra, rsmith

Subscribers: jhen, cfe-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D19990

llvm-svn: 270150
2016-05-19 22:49:13 +00:00
Justin Lebar 717d2b0a0d [CUDA] Implement atomicInc and atomicDec builtins
These functions cannot be implemented as atomicrmw or cmpxchg
instructions, so they are implemented as a call to the NVVM intrinsics
@llvm.nvvm.atomic.load.inc.32.p0i32 and
@llvm.nvvm.atomic.load.dec.32.p0i32.

Patch by Jason Henline.

Reviewers: jlebar

Differential Revision: http://reviews.llvm.org/D18322

llvm-svn: 264009
2016-03-22 00:09:28 +00:00
Jingyue Wu f1eca25b16 [CUDA] fix codegen for __nvvm_atom_cas_*
Summary: __nvvm_atom_cas_* returns the old value instead of whether the swap succeeds.

Reviewers: eliben, tra

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D13306

llvm-svn: 248951
2015-09-30 21:49:32 +00:00
Artem Belevich 236cfdc4be [CUDA] 32-bit NVPTX should have 32-bit long type.
Currently it's 64-bit which will lead to mismatch between host and
device code if we compile for i386.

Differential Revision: http://reviews.llvm.org/D13181

llvm-svn: 248753
2015-09-28 22:54:08 +00:00
Jingyue Wu 2d69f9608e [CUDA] fix codegen for __nvvm_atom_min/max_gen_u*
Summary: Clang should emit "atomicrmw umin/umax" instead of "atomicrmw min/max".

Reviewers: eliben, tra

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12487

llvm-svn: 246455
2015-08-31 17:25:51 +00:00
Artem Belevich d21e5c6684 [CUDA] Implemented __nvvm_atom_*_gen_* builtins.
Integer variants are implemented as atomicrmw or cmpxchg instructions.
Atomic add for floating point (__nvvm_atom_add_gen_f()) is implemented
as a call to an overloaded @llvm.nvvm.atomic.load.add.f32.* LVVM
intrinsic.

Differential Revision: http://reviews.llvm.org/D10666

llvm-svn: 240669
2015-06-25 18:29:42 +00:00
Justin Holewinski 6e9bfa344c [NVPTX] Fix type error for some builtins in BuiltinsNVPTX.def
llvm-svn: 223116
2014-12-02 12:58:24 +00:00
NAKAMURA Takumi a1d1388a2b clang/test/CodeGen/builtins-nvptx.c: Prune "REQUIRES: nvptx64-registered-target". "nvptx" should imply it.
llvm-svn: 196348
2013-12-04 03:41:02 +00:00
Justin Holewinski 9f3bfeb3b6 [NVPTX] Add entire list of supported builtins
llvm-svn: 182468
2013-05-22 12:58:29 +00:00
Justin Holewinski c0cad046b6 [NVPTX] Add __nvvm_* intrinsics as Clang builtins
Fixes bug 13354.

llvm-svn: 167647
2012-11-09 23:50:51 +00:00
NAKAMURA Takumi f1f6e99c53 Revert r166541, "clang/test: Add appropriate requirements as REQUIRES, corresponding to r166532."
According to r166543, it is not needed for now.

llvm-svn: 166544
2012-10-24 03:59:09 +00:00
NAKAMURA Takumi a22fe582d2 clang/test: Add appropriate requirements as REQUIRES, corresponding to r166532.
llvm-svn: 166541
2012-10-24 02:57:57 +00:00
Justin Holewinski c05323dd5c Un-XFAIL CodeGen/builtins-nvptx.c now that the proper changes have
landed in LLVM core

llvm-svn: 157418
2012-05-24 21:39:33 +00:00
John McCall d65dbd8e6a XFAIL this test, which does not pass on trunk since the grand
renaming in r157403.

llvm-svn: 157413
2012-05-24 20:58:21 +00:00
Justin Holewinski 83e9668133 Replace PTX back-end with NVPTX back-end in all places where Clang cares
NV_CONTRIB

llvm-svn: 157403
2012-05-24 17:43:12 +00:00