Commit Graph

6 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu acb6f80d96 [CUDA][HIP] Fix overloading resolution
This patch implements correct hostness based overloading resolution
in isBetterOverloadCandidate.

Based on hostness, if one candidate is emittable whereas the other
candidate is not emittable, the emittable candidate is better.

If both candidates are emittable, or neither is emittable based on hostness, then
other rules should be used to determine which is better. This is because
hostness based overloading resolution is mostly for determining
viability of a function. If two functions are both viable, other factors
should take precedence in preference.

If other rules cannot determine which is better, CUDA preference will be
used again to determine which is better.

However, correct hostness based overloading resolution
requires overloading resolution diagnostics to be deferred,
which is not on by default. The rationale is that deferring
overloading resolution diagnostics may hide overloading reslolutions
issues in header files.

An option -fgpu-exclude-wrong-side-overloads is added, which is off by
default.

When -fgpu-exclude-wrong-side-overloads is off, keep the original behavior,
that is, exclude wrong side overloads only if there are same side overloads.
This may result in incorrect overloading resolution when there are no
same side candates, but is sufficient for most CUDA/HIP applications.

When -fgpu-exclude-wrong-side-overloads is on, enable deferring
overloading resolution diagnostics and enable correct hostness
based overloading resolution, i.e., always exclude wrong side overloads.

Differential Revision: https://reviews.llvm.org/D80450
2020-12-02 16:33:33 -05:00
Yaxun (Sam) Liu 3f4b5893ef [AMDGPU] Add option -munsafe-fp-atomics
Add an option -munsafe-fp-atomics for AMDGPU target.

When enabled, clang adds function attribute "amdgpu-unsafe-fp-atomics"
to any functions for amdgpu target. This allows amdgpu backend to use
unsafe fp atomic instructions in these functions.

Differential Revision: https://reviews.llvm.org/D91546
2020-11-16 21:52:12 -05:00
Yaxun (Sam) Liu e372c1d762 [HIP] Fix -fgpu-allow-device-init option
The option needs to be passed to both host and device compilation.

Differential Revision: https://reviews.llvm.org/D88550
2020-10-04 22:13:05 -04:00
Yaxun (Sam) Liu 2ae25647d1 [CUDA][HIP] Add -Xarch_device and -Xarch_host options
The argument after -Xarch_device will be added to the arguments for CUDA/HIP
device compilation and will be removed for host compilation.

The argument after -Xarch_host will be added to the arguments for CUDA/HIP
host compilation and will be removed for device compilation.

Differential Revision: https://reviews.llvm.org/D76520
2020-03-24 10:13:05 -04:00
Yaxun (Sam) Liu 6f79f80e6e [HIP] Fix duplicate clang -cc1 options on MSVC toolchain
HIPToolChain::TranslateArgs call TranslateArgs of host toolchain with
the input args to get a list of derived args called DAL, then
go through the input args by itself and append them to DAL.

This assumes that the host toolchain should not append any unchanged
args to DAL, otherwise there will be duplicates since
HIPToolChain will append it again.

This works for GNU toolchain since it returns an empty list for DAL.

However, MSVC toolchain will append unchanged args to DAL, which
causes duplicate args.

This patch let MSVC toolchain not append unchanged args for HIP
offloading kind, which fixes this issue.

Differential Revision: https://reviews.llvm.org/D76032
2020-03-18 14:48:04 -04:00
Yaxun (Sam) Liu 9f2d8b5c0c [HIP] Add option --gpu-max-threads-per-block=n
Add this option to change the default launch bounds.

Differential Revision: https://reviews.llvm.org/D71221
2020-01-07 11:18:00 -05:00