llvm-project/clang/test/Driver/hip-options.hip

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

102 lines
5.9 KiB
Plaintext
Raw Normal View History

// REQUIRES: clang-driver
// REQUIRES: x86-registered-target
// REQUIRES: amdgpu-registered-target
// RUN: %clang -### -x hip --gpu-max-threads-per-block=1024 %s 2>&1 | FileCheck %s
// Check that there are commands for both host- and device-side compilations.
//
// CHECK: clang{{.*}}" "-cc1" {{.*}} "-fcuda-is-device"
// CHECK-SAME: "--gpu-max-threads-per-block=1024"
// RUN: %clang -### -nogpuinc -nogpulib -fgpu-allow-device-init \
// RUN: %s 2>&1 | FileCheck -check-prefix=DEVINIT %s
// DEVINIT: clang{{.*}}" "-cc1" {{.*}}"-fgpu-allow-device-init"
// DEVINIT: clang{{.*}}" "-cc1" {{.*}}"-fgpu-allow-device-init"
// RUN: %clang -### -x hip -target x86_64-pc-windows-msvc -fms-extensions \
// RUN: -mllvm -amdgpu-early-inline-all=true %s 2>&1 | \
// RUN: FileCheck -check-prefix=MLLVM %s
// MLLVM-NOT: "-mllvm"{{.*}}"-amdgpu-early-inline-all=true"{{.*}}"-mllvm"{{.*}}"-amdgpu-early-inline-all=true"
// RUN: %clang -### -Xarch_device -g -nogpulib --cuda-gpu-arch=gfx900 \
// RUN: -Xarch_device -fcf-protection=branch \
// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=DEV %s
// DEV: clang{{.*}} "-fcuda-is-device" {{.*}} "-debug-info-kind={{.*}}" {{.*}} "-fcf-protection=branch"
// DEV: clang{{.*}} "-fcuda-is-device" {{.*}} "-debug-info-kind={{.*}}" {{.*}} "-fcf-protection=branch"
// DEV-NOT: clang{{.*}} {{.*}} "-debug-info-kind={{.*}}"
// RUN: %clang -### -Xarch_host -g -nogpulib --cuda-gpu-arch=gfx900 \
// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=HOST %s
// HOST-NOT: clang{{.*}} "-fcuda-is-device" {{.*}} "-debug-info-kind={{.*}}"
// HOST-NOT: clang{{.*}} "-fcuda-is-device" {{.*}} "-debug-info-kind={{.*}}"
// HOST: clang{{.*}} "-debug-info-kind={{.*}}"
// RUN: %clang -### -nogpuinc -nogpulib -munsafe-fp-atomics \
// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNSAFE-FP-ATOMICS %s
// UNSAFE-FP-ATOMICS: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-munsafe-fp-atomics"
[CUDA][HIP] Fix overloading resolution This patch implements correct hostness based overloading resolution in isBetterOverloadCandidate. Based on hostness, if one candidate is emittable whereas the other candidate is not emittable, the emittable candidate is better. If both candidates are emittable, or neither is emittable based on hostness, then other rules should be used to determine which is better. This is because hostness based overloading resolution is mostly for determining viability of a function. If two functions are both viable, other factors should take precedence in preference. If other rules cannot determine which is better, CUDA preference will be used again to determine which is better. However, correct hostness based overloading resolution requires overloading resolution diagnostics to be deferred, which is not on by default. The rationale is that deferring overloading resolution diagnostics may hide overloading reslolutions issues in header files. An option -fgpu-exclude-wrong-side-overloads is added, which is off by default. When -fgpu-exclude-wrong-side-overloads is off, keep the original behavior, that is, exclude wrong side overloads only if there are same side overloads. This may result in incorrect overloading resolution when there are no same side candates, but is sufficient for most CUDA/HIP applications. When -fgpu-exclude-wrong-side-overloads is on, enable deferring overloading resolution diagnostics and enable correct hostness based overloading resolution, i.e., always exclude wrong side overloads. Differential Revision: https://reviews.llvm.org/D80450
2020-11-25 23:33:18 +08:00
// RUN: %clang -### -nogpuinc -nogpulib \
// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=DEFAULT-UNSAFE-FP-ATOMICS %s
// DEFAULT-UNSAFE-FP-ATOMICS-NOT: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-munsafe-fp-atomics"
[CUDA][HIP] Fix overloading resolution This patch implements correct hostness based overloading resolution in isBetterOverloadCandidate. Based on hostness, if one candidate is emittable whereas the other candidate is not emittable, the emittable candidate is better. If both candidates are emittable, or neither is emittable based on hostness, then other rules should be used to determine which is better. This is because hostness based overloading resolution is mostly for determining viability of a function. If two functions are both viable, other factors should take precedence in preference. If other rules cannot determine which is better, CUDA preference will be used again to determine which is better. However, correct hostness based overloading resolution requires overloading resolution diagnostics to be deferred, which is not on by default. The rationale is that deferring overloading resolution diagnostics may hide overloading reslolutions issues in header files. An option -fgpu-exclude-wrong-side-overloads is added, which is off by default. When -fgpu-exclude-wrong-side-overloads is off, keep the original behavior, that is, exclude wrong side overloads only if there are same side overloads. This may result in incorrect overloading resolution when there are no same side candates, but is sufficient for most CUDA/HIP applications. When -fgpu-exclude-wrong-side-overloads is on, enable deferring overloading resolution diagnostics and enable correct hostness based overloading resolution, i.e., always exclude wrong side overloads. Differential Revision: https://reviews.llvm.org/D80450
2020-11-25 23:33:18 +08:00
// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fgpu-exclude-wrong-side-overloads \
// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=FIX-OVERLOAD %s
// FIX-OVERLOAD: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fgpu-exclude-wrong-side-overloads" "-fgpu-defer-diag"
// FIX-OVERLOAD: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-fgpu-exclude-wrong-side-overloads" "-fgpu-defer-diag"
// Check -mconstructor-aliases is not passed to device compilation.
// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=CTA %s
// CTA: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-mconstructor-aliases"
// CTA-NOT: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-mconstructor-aliases"
// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
// RUN: --offload-arch=gfx906 -fgpu-inline-threshold=1000 %s 2>&1 | FileCheck -check-prefix=THRESH %s
// THRESH: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-mllvm" "-inline-threshold=1000"
// THRESH-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-inline-threshold=1000"
// Check -foffload-lto=thin translated correctly.
// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
[LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch A recent change (D99683) to support ThinLTO for HIP caused a regression when compiling cuda code with -flto=thin -fwhole-program-vtables. Specifically, we now get an error: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto' This error is coming from the device offload cc1 action being set up for the cuda compile, for which -flto=thin doesn't apply and gets dropped. This is a regression, but points to a potential issue that was silently occurring before the patch, details below. Before D99683, the check for fwhole-program-vtables in the driver looked like: if (WholeProgramVTables) { if (!D.isUsingLTO()) D.Diag(diag::err_drv_argument_only_allowed_with) << "-fwhole-program-vtables" << "-flto"; CmdArgs.push_back("-fwhole-program-vtables"); } And D.isUsingLTO() returned true since we have -flto=thin. However, because the cuda cc1 compile is doing device offloading, which didn't support any LTO, there was other code that suppressed -flto* options from being passed to the cc1 invocation. So the cc1 invocation silently had -fwhole-program-vtables without any -flto*. This seems potentially problematic, since if we had any virtual calls we would get type test assume sequences without the corresponding LTO pass that handles them. However, with the patch, which adds support for device offloading LTO option -foffload-lto=thin, the code has changed so that we set a bool IsUsingLTO based on either -flto* or -foffload-lto*, depending on whether this is the device offloading action. For the device offload action in our compile, since we don't have -foffload-lto, IsUsingLTO is false, and the check for LTO with -fwhole-program-vtables now fails. What we should do is only pass through -fwhole-program-vtables to the cc1 invocation that has LTO enabled (either the device offload action with -foffload-lto, or the non-device offload action with -flto), and otherwise drop the -fwhole-program-vtables for the non-LTO action. Then we should error only if we have -fwhole-program-vtables without any -f*lto* options. Differential Revision: https://reviews.llvm.org/D103579
2021-06-03 07:37:23 +08:00
// RUN: --cuda-gpu-arch=gfx906 -foffload-lto=thin -fwhole-program-vtables %s 2>&1 \
// RUN: | FileCheck -check-prefix=HIPTHINLTO %s
// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
// RUN: --cuda-gpu-arch=gfx906 -fgpu-rdc -foffload-lto=thin -fwhole-program-vtables %s 2>&1 \
// RUN: | FileCheck -check-prefix=HIPTHINLTO %s
[LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch A recent change (D99683) to support ThinLTO for HIP caused a regression when compiling cuda code with -flto=thin -fwhole-program-vtables. Specifically, we now get an error: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto' This error is coming from the device offload cc1 action being set up for the cuda compile, for which -flto=thin doesn't apply and gets dropped. This is a regression, but points to a potential issue that was silently occurring before the patch, details below. Before D99683, the check for fwhole-program-vtables in the driver looked like: if (WholeProgramVTables) { if (!D.isUsingLTO()) D.Diag(diag::err_drv_argument_only_allowed_with) << "-fwhole-program-vtables" << "-flto"; CmdArgs.push_back("-fwhole-program-vtables"); } And D.isUsingLTO() returned true since we have -flto=thin. However, because the cuda cc1 compile is doing device offloading, which didn't support any LTO, there was other code that suppressed -flto* options from being passed to the cc1 invocation. So the cc1 invocation silently had -fwhole-program-vtables without any -flto*. This seems potentially problematic, since if we had any virtual calls we would get type test assume sequences without the corresponding LTO pass that handles them. However, with the patch, which adds support for device offloading LTO option -foffload-lto=thin, the code has changed so that we set a bool IsUsingLTO based on either -flto* or -foffload-lto*, depending on whether this is the device offloading action. For the device offload action in our compile, since we don't have -foffload-lto, IsUsingLTO is false, and the check for LTO with -fwhole-program-vtables now fails. What we should do is only pass through -fwhole-program-vtables to the cc1 invocation that has LTO enabled (either the device offload action with -foffload-lto, or the non-device offload action with -flto), and otherwise drop the -fwhole-program-vtables for the non-LTO action. Then we should error only if we have -fwhole-program-vtables without any -f*lto* options. Differential Revision: https://reviews.llvm.org/D103579
2021-06-03 07:37:23 +08:00
// Ensure we don't error about -fwhole-program-vtables for the non-device offload compile.
// HIPTHINLTO-NOT: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
// HIPTHINLTO-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto-unit"
// HIPTHINLTO: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-flto=thin" "-flto-unit" {{.*}} "-fwhole-program-vtables"
// HIPTHINLTO-NOT: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto-unit"
// HIPTHINLTO: lld{{.*}}"-plugin-opt=mcpu=gfx906" "-plugin-opt=thinlto" "-plugin-opt=-force-import-all"
// Check that -flto=thin is handled correctly, particularly with -fwhole-program-vtables.
//
// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
[LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch A recent change (D99683) to support ThinLTO for HIP caused a regression when compiling cuda code with -flto=thin -fwhole-program-vtables. Specifically, we now get an error: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto' This error is coming from the device offload cc1 action being set up for the cuda compile, for which -flto=thin doesn't apply and gets dropped. This is a regression, but points to a potential issue that was silently occurring before the patch, details below. Before D99683, the check for fwhole-program-vtables in the driver looked like: if (WholeProgramVTables) { if (!D.isUsingLTO()) D.Diag(diag::err_drv_argument_only_allowed_with) << "-fwhole-program-vtables" << "-flto"; CmdArgs.push_back("-fwhole-program-vtables"); } And D.isUsingLTO() returned true since we have -flto=thin. However, because the cuda cc1 compile is doing device offloading, which didn't support any LTO, there was other code that suppressed -flto* options from being passed to the cc1 invocation. So the cc1 invocation silently had -fwhole-program-vtables without any -flto*. This seems potentially problematic, since if we had any virtual calls we would get type test assume sequences without the corresponding LTO pass that handles them. However, with the patch, which adds support for device offloading LTO option -foffload-lto=thin, the code has changed so that we set a bool IsUsingLTO based on either -flto* or -foffload-lto*, depending on whether this is the device offloading action. For the device offload action in our compile, since we don't have -foffload-lto, IsUsingLTO is false, and the check for LTO with -fwhole-program-vtables now fails. What we should do is only pass through -fwhole-program-vtables to the cc1 invocation that has LTO enabled (either the device offload action with -foffload-lto, or the non-device offload action with -flto), and otherwise drop the -fwhole-program-vtables for the non-LTO action. Then we should error only if we have -fwhole-program-vtables without any -f*lto* options. Differential Revision: https://reviews.llvm.org/D103579
2021-06-03 07:37:23 +08:00
// RUN: --cuda-gpu-arch=gfx906 -flto=thin -fwhole-program-vtables %s 2>&1 \
// RUN: | FileCheck -check-prefix=THINLTO %s
[LTO] Fix -fwhole-program-vtables handling after HIP ThinLTO patch A recent change (D99683) to support ThinLTO for HIP caused a regression when compiling cuda code with -flto=thin -fwhole-program-vtables. Specifically, we now get an error: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto' This error is coming from the device offload cc1 action being set up for the cuda compile, for which -flto=thin doesn't apply and gets dropped. This is a regression, but points to a potential issue that was silently occurring before the patch, details below. Before D99683, the check for fwhole-program-vtables in the driver looked like: if (WholeProgramVTables) { if (!D.isUsingLTO()) D.Diag(diag::err_drv_argument_only_allowed_with) << "-fwhole-program-vtables" << "-flto"; CmdArgs.push_back("-fwhole-program-vtables"); } And D.isUsingLTO() returned true since we have -flto=thin. However, because the cuda cc1 compile is doing device offloading, which didn't support any LTO, there was other code that suppressed -flto* options from being passed to the cc1 invocation. So the cc1 invocation silently had -fwhole-program-vtables without any -flto*. This seems potentially problematic, since if we had any virtual calls we would get type test assume sequences without the corresponding LTO pass that handles them. However, with the patch, which adds support for device offloading LTO option -foffload-lto=thin, the code has changed so that we set a bool IsUsingLTO based on either -flto* or -foffload-lto*, depending on whether this is the device offloading action. For the device offload action in our compile, since we don't have -foffload-lto, IsUsingLTO is false, and the check for LTO with -fwhole-program-vtables now fails. What we should do is only pass through -fwhole-program-vtables to the cc1 invocation that has LTO enabled (either the device offload action with -foffload-lto, or the non-device offload action with -flto), and otherwise drop the -fwhole-program-vtables for the non-LTO action. Then we should error only if we have -fwhole-program-vtables without any -f*lto* options. Differential Revision: https://reviews.llvm.org/D103579
2021-06-03 07:37:23 +08:00
// Ensure we don't error about -fwhole-program-vtables for the device offload compile. We should
// drop -fwhole-program-vtables for the device offload compile and pass it through for the
// non-device offload compile along with -flto=thin.
// THINLTO-NOT: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
// THINLTO-NOT: clang{{.*}}" "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fwhole-program-vtables"
// THINLTO: clang{{.*}}" "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto=thin" {{.*}} "-fwhole-program-vtables"
// THINLTO-NOT: clang{{.*}}" "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fwhole-program-vtables"
// Check -fopenmp is allowed with HIP but -fopenmp-targets= is not allowed.
// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
// RUN: --offload-arch=gfx906 -fopenmp %s 2>&1 | FileCheck -check-prefix=OMP %s
// OMP-NOT: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fopenmp"
// OMP: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-fopenmp"
// RUN: not %clang -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
// RUN: --offload-arch=gfx906 -fopenmp -fopenmp-targets=amdgcn %s 2>&1 \
// RUN: | FileCheck -check-prefix=OMPTGT %s
// OMPTGT: unsupported option '-fopenmp-targets=' for language mode 'HIP'