forked from lijiext/lammps
Updating Kokkos lib to v2.03.05
parent 286d4f2743
commit 5c1d17d1c0
@@ -1,5 +1,53 @@
# Change Log

## [2.03.05](https://github.com/kokkos/kokkos/tree/2.03.05) (2017-05-27)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.03.00...2.03.05)

**Implemented enhancements:**

- Harmonize Custom Reductions over nesting levels [\#802](https://github.com/kokkos/kokkos/issues/802)
- Prevent users directly including KokkosCore\_config.h [\#815](https://github.com/kokkos/kokkos/issues/815)
- DualView aborts on concurrent host/device modify \(in debug mode\) [\#814](https://github.com/kokkos/kokkos/issues/814)
- Abort when running on a NVIDIA CC5.0 or higher architecture with code compiled for CC \< 5.0 [\#813](https://github.com/kokkos/kokkos/issues/813)
- Add "name" function to ExecSpaces [\#806](https://github.com/kokkos/kokkos/issues/806)
- Allow null Future in task spawn dependences [\#795](https://github.com/kokkos/kokkos/issues/795)
- Add Unit Tests for Kokkos::complex [\#785](https://github.com/kokkos/kokkos/issues/785)
- Add pow function for Kokkos::complex [\#784](https://github.com/kokkos/kokkos/issues/784)
- Square root of a complex [\#729](https://github.com/kokkos/kokkos/issues/729)
- Command line processing of --threads argument prevents users from having any commandline arguments starting with --threads [\#760](https://github.com/kokkos/kokkos/issues/760)
- Protected deprecated API with appropriate macro [\#756](https://github.com/kokkos/kokkos/issues/756)
- Allow task scheduler memory pool to be used by tasks [\#747](https://github.com/kokkos/kokkos/issues/747)
- View bounds checking on host-side performance: constructing a std::string [\#723](https://github.com/kokkos/kokkos/issues/723)
- Add check for AppleClang as compiler distinct from check for Clang. [\#705](https://github.com/kokkos/kokkos/issues/705)
- Uninclude source files for specific configurations to prevent link warning. [\#701](https://github.com/kokkos/kokkos/issues/701)
- Add --small option to snapshot script [\#697](https://github.com/kokkos/kokkos/issues/697)
- CMake Standalone Support [\#674](https://github.com/kokkos/kokkos/issues/674)
- CMake build unit test and install [\#808](https://github.com/kokkos/kokkos/issues/808)
- CMake: Fix having kokkos as a subdirectory in a pure cmake project [\#629](https://github.com/kokkos/kokkos/issues/629)
- Tribits macro assumes build directory is in top level source directory [\#654](https://github.com/kokkos/kokkos/issues/654)
- Use bin/nvcc\_wrapper, not config/nvcc\_wrapper [\#562](https://github.com/kokkos/kokkos/issues/562)
- Allow MemoryPool::allocate\(\) to be called from multiple threads per warp. [\#487](https://github.com/kokkos/kokkos/issues/487)
- Allow MemoryPool::allocate\\(\\) to be called from multiple threads per warp. [\#487](https://github.com/kokkos/kokkos/issues/487)
- Move OpenMP 4.5 OpenMPTarget backend into Develop [\#456](https://github.com/kokkos/kokkos/issues/456)
- Testing on ARM testbed [\#288](https://github.com/kokkos/kokkos/issues/288)

**Fixed bugs:**

- Fix label in OpenMP parallel\_reduce verify\_initialized [\#834](https://github.com/kokkos/kokkos/issues/834)
- TeamScratch Level 1 on Cuda hangs [\#820](https://github.com/kokkos/kokkos/issues/820)
- \[bug\] memory pool. [\#786](https://github.com/kokkos/kokkos/issues/786)
- Some Reduction Tests fail on Intel 18 with aggressive vectorization on [\#774](https://github.com/kokkos/kokkos/issues/774)
- Error copying dynamic view on copy of memory pool [\#773](https://github.com/kokkos/kokkos/issues/773)
- CUDA stack overflow with TaskDAG test [\#758](https://github.com/kokkos/kokkos/issues/758)
- ThreadVectorRange Customized Reduction Bug [\#739](https://github.com/kokkos/kokkos/issues/739)
- set\_scratch\_size overflows [\#726](https://github.com/kokkos/kokkos/issues/726)
- Get wrong results for compiler checks in Makefile on OS X. [\#706](https://github.com/kokkos/kokkos/issues/706)
- Fix check if multiple host architectures enabled. [\#702](https://github.com/kokkos/kokkos/issues/702)
- Threads Backend Does not Pass on Cray Compilers [\#609](https://github.com/kokkos/kokkos/issues/609)
- Rare bug in memory pool where allocation can finish on superblock in empty state [\#452](https://github.com/kokkos/kokkos/issues/452)
- LDFLAGS in core/unit\_test/Makefile: potential "undefined reference" to pthread lib [\#148](https://github.com/kokkos/kokkos/issues/148)

## [2.03.00](https://github.com/kokkos/kokkos/tree/2.03.00) (2017-04-25)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.15...2.03.00)

@@ -5,11 +5,12 @@ ELSE()
ENDIF()
|
||||
|
||||
IF(NOT KOKKOS_HAS_TRILINOS)
|
||||
CMAKE_MINIMUM_REQUIRED(VERSION 2.8.11 FATAL_ERROR)
|
||||
INCLUDE(cmake/tribits.cmake)
|
||||
SET(CMAKE_CXX_STANDARD 11)
|
||||
ENDIF()
|
||||
cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
|
||||
project(Kokkos CXX)
|
||||
|
||||
INCLUDE(cmake/kokkos.cmake)
|
||||
ELSE()
|
||||
#------------------------------------------------------------------------------
|
||||
#
|
||||
# A) Forward declare the package so that certain options are also defined for
|
||||
# subpackages
|
||||
@@ -17,14 +18,13 @@ ENDIF()
|
||||
TRIBITS_PACKAGE_DECL(Kokkos) # ENABLE_SHADOWING_WARNINGS)
|
||||
|
||||
|
||||
#------------------------------------------------------------------------------
|
||||
#
|
||||
# B) Define the common options for Kokkos first so they can be used by
|
||||
# subpackages as well.
|
||||
#
|
||||
|
||||
|
||||
|
||||
# mfh 01 Aug 2016: See Issue #61:
|
||||
#
|
||||
# https://github.com/kokkos/kokkos/issues/61
|
||||
@@ -83,10 +83,10 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
)
|
||||
|
||||
ASSERT_DEFINED(TPL_ENABLE_Pthread)
|
||||
IF (Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
|
||||
IF(Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
|
||||
MESSAGE(FATAL_ERROR "You set Kokkos_ENABLE_Pthread=ON, but Trilinos' support for Pthread(s) is not enabled (TPL_ENABLE_Pthread=OFF). This is not allowed. Please enable Pthreads in Trilinos before attempting to enable Kokkos' support for Pthreads.")
|
||||
ENDIF ()
|
||||
IF (NOT TPL_ENABLE_Pthread)
|
||||
ENDIF()
|
||||
IF(NOT TPL_ENABLE_Pthread)
|
||||
ADD_DEFINITIONS(-DGTEST_HAS_PTHREAD=0)
|
||||
ENDIF()
|
||||
|
||||
@@ -98,12 +98,13 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
)
|
||||
|
||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||
Kokkos_ENABLE_Qthreads
|
||||
Kokkos_ENABLE_QTHREAD
|
||||
KOKKOS_HAVE_QTHREADS
|
||||
"Enable Qthreads support in Kokkos."
|
||||
"${TPL_ENABLE_QTHREADS}"
|
||||
"${TPL_ENABLE_QTHREAD}"
|
||||
)
|
||||
|
||||
# TODO: No longer an option in Kokkos. Needs to be removed.
|
||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||
Kokkos_ENABLE_CXX11
|
||||
KOKKOS_HAVE_CXX11
|
||||
@@ -118,6 +119,7 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
"${TPL_ENABLE_HWLOC}"
|
||||
)
|
||||
|
||||
# TODO: This is currently not used in Kokkos. Should it be removed?
|
||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||
Kokkos_ENABLE_MPI
|
||||
KOKKOS_HAVE_MPI
|
||||
@@ -154,13 +156,27 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
"${Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT}"
|
||||
)
|
||||
|
||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||
Kokkos_ENABLE_Debug_DualView_Modify_Check
|
||||
KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
|
||||
"Enable abort when Kokkos::DualView modified on host and device without sync."
|
||||
"${Kokkos_ENABLE_DEBUG}"
|
||||
)
|
||||
|
||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||
Kokkos_ENABLE_Profiling
|
||||
KOKKOS_ENABLE_PROFILING_INTERNAL
|
||||
KOKKOS_ENABLE_PROFILING
|
||||
"Enable KokkosP profiling support for kernel data collections."
|
||||
"${TPL_ENABLE_DLlib}"
|
||||
)
|
||||
|
||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||
Kokkos_ENABLE_Profiling_Load_Print
|
||||
KOKKOS_ENABLE_PROFILING_LOAD_PRINT
|
||||
"Print to standard output which profiling library was loaded."
|
||||
OFF
|
||||
)
|
||||
|
||||
# placeholder for future device...
|
||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||
Kokkos_ENABLE_Winthread
|
||||
@@ -169,6 +185,7 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
"${TPL_ENABLE_Winthread}"
|
||||
)
|
||||
|
||||
# TODO: No longer an option in Kokkos. Needs to be removed.
|
||||
# use new/old View
|
||||
TRIBITS_ADD_OPTION_AND_DEFINE(
|
||||
Kokkos_USING_DEPRECATED_VIEW
|
||||
@@ -177,12 +194,12 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
OFF
|
||||
)
|
||||
|
||||
|
||||
#------------------------------------------------------------------------------
|
||||
#
|
||||
# C) Install Kokkos' executable scripts
|
||||
#
|
||||
|
||||
|
||||
# nvcc_wrapper is Kokkos' wrapper for NVIDIA's NVCC CUDA compiler.
|
||||
# Kokkos needs nvcc_wrapper in order to build. Other libraries and
|
||||
# executables also need nvcc_wrapper. Thus, we need to install it.
|
||||
@@ -199,6 +216,8 @@ INSTALL(PROGRAMS ${CMAKE_CURRENT_SOURCE_DIR}/bin/nvcc_wrapper DESTINATION bin)
|
||||
TRIBITS_PROCESS_SUBPACKAGES()
|
||||
|
||||
|
||||
#------------------------------------------------------------------------------
|
||||
#
|
||||
# E) If Kokkos itself is enabled, process the Kokkos package
|
||||
#
|
||||
@@ -213,3 +232,4 @@ TRIBITS_EXCLUDE_FILES(
)
|
||||
|
||||
TRIBITS_PACKAGE_POSTPROCESS()
|
||||
ENDIF()
|
||||
@@ -35,23 +35,26 @@ KOKKOS_INTERNAL_USE_MEMKIND := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "e
# Check for advanced settings.
|
||||
KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "aggressive_vectorization" | wc -l))
|
||||
KOKKOS_INTERNAL_DISABLE_PROFILING := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "disable_profiling" | wc -l))
|
||||
KOKKOS_INTERNAL_DISABLE_DUALVIEW_MODIFY_CHECK := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "disable_dualview_modify_check" | wc -l))
|
||||
KOKKOS_INTERNAL_ENABLE_PROFILING_LOAD_PRINT := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "enable_profile_load_print" | wc -l))
|
||||
KOKKOS_INTERNAL_CUDA_USE_LDG := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "use_ldg" | wc -l))
|
||||
KOKKOS_INTERNAL_CUDA_USE_UVM := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "force_uvm" | wc -l))
|
||||
KOKKOS_INTERNAL_CUDA_USE_RELOC := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "rdc" | wc -l))
|
||||
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "enable_lambda" | wc -l))
|
||||
|
||||
# Check for Kokkos Host Execution Spaces one of which must be on.
|
||||
KOKKOS_INTERNAL_USE_OPENMP := $(strip $(shell echo $(KOKKOS_DEVICES) | grep OpenMP | wc -l))
|
||||
KOKKOS_INTERNAL_USE_OPENMPTARGET := $(strip $(shell echo $(KOKKOS_DEVICES) | grep OpenMPTarget | wc -l))
|
||||
KOKKOS_INTERNAL_USE_OPENMP := $(strip $(shell echo $(subst OpenMPTarget,,$(KOKKOS_DEVICES)) | grep OpenMP | wc -l))
|
||||
KOKKOS_INTERNAL_USE_PTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Pthread | wc -l))
|
||||
KOKKOS_INTERNAL_USE_QTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Qthreads | wc -l))
|
||||
KOKKOS_INTERNAL_USE_SERIAL := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Serial | wc -l))
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 0)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 0)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 0)
|
||||
KOKKOS_INTERNAL_USE_SERIAL := 1
|
||||
endif
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 0)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 0)
|
||||
KOKKOS_INTERNAL_USE_SERIAL := 1
|
||||
endif
|
||||
endif
|
||||
endif
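The device checks in this hunk rely on `grep` substring matching, so a device list that names only `OpenMPTarget` would also count as `OpenMP`. The updated `KOKKOS_INTERNAL_USE_OPENMP` line removes `OpenMPTarget` with `$(subst ...)` before searching. A minimal shell sketch of that difference is shown here; the function names are hypothetical, `sed` stands in for make's `$(subst ...)`, and `tr -d ' '` trims the padding that some `wc` implementations emit.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; these names are not part of Kokkos.

# Naive check: count lines of the device list that contain "OpenMP".
# "OpenMPTarget" contains "OpenMP" as a substring, so it matches too.
uses_openmp_naive() {
  echo "$1" | grep OpenMP | wc -l | tr -d ' '
}

# Check in the spirit of the updated Makefile line: delete every
# "OpenMPTarget" first, then look for a remaining "OpenMP".
uses_openmp() {
  echo "$1" | sed 's/OpenMPTarget//g' | grep OpenMP | wc -l | tr -d ' '
}

uses_openmp_naive "OpenMPTarget"     # prints 1 (false positive)
uses_openmp "OpenMPTarget"           # prints 0
uses_openmp "OpenMPTarget,OpenMP"    # prints 1
```

With the false positive removed, the Serial fallback in the nested `ifeq` block can still trigger when `OpenMPTarget` is the only device requested.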
|
||||
|
||||
# Check for other Execution Spaces.
|
||||
@@ -64,24 +67,25 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
endif
|
||||
|
||||
# Check OS.
|
||||
KOKKOS_OS := $(shell uname -s)
|
||||
KOKKOS_INTERNAL_OS_CYGWIN := $(shell uname -s | grep CYGWIN | wc -l)
|
||||
KOKKOS_INTERNAL_OS_LINUX := $(shell uname -s | grep Linux | wc -l)
|
||||
KOKKOS_INTERNAL_OS_DARWIN := $(shell uname -s | grep Darwin | wc -l)
|
||||
KOKKOS_OS := $(strip $(shell uname -s))
|
||||
KOKKOS_INTERNAL_OS_CYGWIN := $(strip $(shell uname -s | grep CYGWIN | wc -l))
|
||||
KOKKOS_INTERNAL_OS_LINUX := $(strip $(shell uname -s | grep Linux | wc -l))
|
||||
KOKKOS_INTERNAL_OS_DARWIN := $(strip $(shell uname -s | grep Darwin | wc -l))
|
||||
|
||||
# Check compiler.
|
||||
KOKKOS_INTERNAL_COMPILER_INTEL := $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l)
|
||||
KOKKOS_INTERNAL_COMPILER_PGI := $(shell $(CXX) --version 2>&1 | grep PGI | wc -l)
|
||||
KOKKOS_INTERNAL_COMPILER_XL := $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l)
|
||||
KOKKOS_INTERNAL_COMPILER_CRAY := $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l)
|
||||
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(CXX) --version 2>&1 | grep "nvcc" | wc -l)
|
||||
KOKKOS_INTERNAL_COMPILER_INTEL := $(strip $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l))
|
||||
KOKKOS_INTERNAL_COMPILER_PGI := $(strip $(shell $(CXX) --version 2>&1 | grep PGI | wc -l))
|
||||
KOKKOS_INTERNAL_COMPILER_XL := $(strip $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l))
|
||||
KOKKOS_INTERNAL_COMPILER_CRAY := $(strip $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l))
|
||||
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell $(CXX) --version 2>&1 | grep nvcc | wc -l))
|
||||
KOKKOS_INTERNAL_COMPILER_CLANG := $(strip $(shell $(CXX) --version 2>&1 | grep clang | wc -l))
|
||||
KOKKOS_INTERNAL_COMPILER_APPLE_CLANG := $(strip $(shell $(CXX) --version 2>&1 | grep "apple-darwin" | wc -l))
|
||||
ifneq ($(OMPI_CXX),)
|
||||
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(OMPI_CXX) --version 2>&1 | grep "nvcc" | wc -l)
|
||||
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell $(OMPI_CXX) --version 2>&1 | grep "nvcc" | wc -l))
|
||||
endif
|
||||
ifneq ($(MPICH_CXX),)
|
||||
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(MPICH_CXX) --version 2>&1 | grep "nvcc" | wc -l)
|
||||
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell $(MPICH_CXX) --version 2>&1 | grep "nvcc" | wc -l))
|
||||
endif
|
||||
KOKKOS_INTERNAL_COMPILER_CLANG := $(shell $(CXX) --version 2>&1 | grep "clang" | wc -l)
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 2)
|
||||
KOKKOS_INTERNAL_COMPILER_CLANG = 1
|
||||
@@ -90,6 +94,11 @@ ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 2)
KOKKOS_INTERNAL_COMPILER_XL = 1
|
||||
endif
|
||||
|
||||
# Apple Clang passes both clang and apple clang tests, so turn off clang.
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_APPLE_CLANG), 1)
|
||||
KOKKOS_INTERNAL_COMPILER_CLANG = 0
|
||||
endif
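The Apple Clang handling above can be sketched in shell: both the `clang` and the `apple-darwin` patterns are searched in the compiler's version banner, and a positive Apple Clang result switches the plain Clang result off. The banners below are hypothetical samples written for illustration, and the function names are not part of Kokkos.

```shell
#!/bin/sh
# Count the lines of a banner ($1) that contain a pattern ($2).
count_lines() {
  printf '%s\n' "$1" | grep "$2" | wc -l | tr -d ' '
}

# Mimic the Makefile logic: detect both, then let Apple Clang win.
classify() {
  is_clang=$(count_lines "$1" clang)
  is_apple_clang=$(count_lines "$1" apple-darwin)
  if [ "$is_apple_clang" -eq 1 ]; then
    is_clang=0
  fi
  echo "clang=$is_clang apple_clang=$is_apple_clang"
}

# Hypothetical sample banners, not captured from a real machine.
apple_banner='Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.5.0'
linux_banner='clang version 4.0.0 (tags/RELEASE_400/final)
Target: x86_64-unknown-linux-gnu'

classify "$apple_banner"   # prints clang=0 apple_clang=1
classify "$linux_banner"   # prints clang=1 apple_clang=0
```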
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
KOKKOS_INTERNAL_COMPILER_CLANG_VERSION := $(shell clang --version | grep version | cut -d ' ' -f3 | tr -d '.')
|
||||
|
||||
@@ -97,29 +106,43 @@ ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_CLANG_VERSION) -lt 400; echo $$?),0)
|
||||
$(error Compiling Cuda code directly with Clang requires version 4.0.0 or higher)
|
||||
endif
|
||||
|
||||
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := 1
|
||||
endif
|
||||
endif
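The Clang version check in this block takes the third space-separated field of the `version` line, deletes the dots, and compares the result with 400. A shell sketch of that parsing, assuming a banner of the form `clang version X.Y.Z ...`; the function names and sample banners are hypothetical:

```shell
#!/bin/sh
# Turn "clang version 3.9.1 (...)" into "391", using the same pipeline
# stages as the Makefile (grep, cut, tr).
clang_version_number() {
  printf '%s\n' "$1" | grep version | cut -d ' ' -f3 | tr -d '.'
}

# Succeeds (exit status 0) when the parsed number is below 400.
too_old_for_cuda() {
  test "$(clang_version_number "$1")" -lt 400
}

if too_old_for_cuda 'clang version 3.9.1 (tags/RELEASE_391/final)'; then
  echo "3.9.1: too old"
fi
if ! too_old_for_cuda 'clang version 4.0.0 (tags/RELEASE_400/final)'; then
  echo "4.0.0: accepted"
fi
```

Deleting the dots orders versions correctly only while every component has one digit; a hypothetical `3.10.0` would parse as 3100 and pass the check.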
|
||||
|
||||
# Set OpenMP flags.
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||
KOKKOS_INTERNAL_OPENMP_FLAG := -mp
|
||||
else
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp=libomp
|
||||
else
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
|
||||
KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_APPLE_CLANG), 1)
|
||||
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp=libomp
|
||||
else
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||
# OpenMP is turned on by default in Cray compiler environment.
|
||||
KOKKOS_INTERNAL_OPENMP_FLAG :=
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
|
||||
KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
|
||||
else
|
||||
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||
# OpenMP is turned on by default in Cray compiler environment.
|
||||
KOKKOS_INTERNAL_OPENMP_FLAG :=
|
||||
else
|
||||
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
|
||||
KOKKOS_INTERNAL_OPENMPTARGET_FLAG := -DKOKKOS_IBM_XL_OMP45_WORKAROUND -qsmp=omp -qoffload -qnoeh
|
||||
else
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
KOKKOS_INTERNAL_OPENMPTARGET_FLAG := -DKOKKOS_BUG_WORKAROUND_IBM_CLANG_OMP45_VIEW_INIT -fopenmp-implicit-declare-target -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp -fopenmp=libomp
|
||||
endif
|
||||
endif
|
||||
|
||||
# Set C++11 flags.
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||
KOKKOS_INTERNAL_CXX11_FLAG := --c++11
|
||||
else
|
||||
@@ -146,7 +169,7 @@ KOKKOS_INTERNAL_USE_ARCH_SKX := $(strip $(shell echo $(KOKKOS_ARCH) | grep SKX |
KOKKOS_INTERNAL_USE_ARCH_KNL := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNL | wc -l))
|
||||
|
||||
# NVIDIA based.
|
||||
NVCC_WRAPPER := $(KOKKOS_PATH)/config/nvcc_wrapper
|
||||
NVCC_WRAPPER := $(KOKKOS_PATH)/bin/nvcc_wrapper
|
||||
KOKKOS_INTERNAL_USE_ARCH_KEPLER30 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler30 | wc -l))
|
||||
KOKKOS_INTERNAL_USE_ARCH_KEPLER32 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler32 | wc -l))
|
||||
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler35 | wc -l))
|
||||
@@ -180,10 +203,20 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 1)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
KOKKOS_INTERNAL_NVCC_PATH := $(shell which nvcc)
|
||||
CUDA_PATH ?= $(KOKKOS_INTERNAL_NVCC_PATH:/bin/nvcc=)
|
||||
KOKKOS_INTERNAL_OPENMPTARGET_FLAG := $(KOKKOS_INTERNAL_OPENMPTARGET_FLAG) --cuda-path=$(CUDA_PATH)
|
||||
endif
|
||||
endif
|
||||
endif
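The `CUDA_PATH ?= $(KOKKOS_INTERNAL_NVCC_PATH:/bin/nvcc=)` line above uses a make substitution reference to strip a trailing `/bin/nvcc` from the path returned by `which nvcc`. The shell equivalent is suffix removal with `${var%suffix}`. A small sketch follows; the function name and the example paths are hypothetical.

```shell
#!/bin/sh
# Strip a trailing "/bin/nvcc" from a compiler path, like the make
# substitution reference $(VAR:/bin/nvcc=) does for each word.
cuda_root_from_nvcc() {
  printf '%s\n' "${1%/bin/nvcc}"
}

cuda_root_from_nvcc /usr/local/cuda/bin/nvcc   # prints /usr/local/cuda
cuda_root_from_nvcc /opt/tools/nvcc            # prints /opt/tools/nvcc (suffix absent, unchanged)
```

Because the Makefile assigns with `?=`, a `CUDA_PATH` already defined by the user takes precedence over the derived value.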
|
||||
# ARM based.
|
||||
KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv80 | wc -l))
|
||||
KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv81 | wc -l))
|
||||
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv8-ThunderX | wc -l))
|
||||
KOKKOS_INTERNAL_USE_ARCH_ARM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX) | bc))
|
||||
|
||||
# IBM based.
|
||||
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(strip $(shell echo $(KOKKOS_ARCH) | grep BGQ | wc -l))
|
||||
@@ -206,8 +239,11 @@ KOKKOS_INTERNAL_USE_ISA_X86_64   := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_
KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
|
||||
KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc ))
|
||||
|
||||
# Decide whether we can support transactional memory
|
||||
KOKKOS_INTERNAL_USE_TM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
|
||||
|
||||
# Incompatible flags?
|
||||
KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)>1" | bc ))
|
||||
KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC)+$(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_ARM)>1" | bc ))
|
||||
KOKKOS_INTERNAL_USE_ARCH_MULTIGPU := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_NVIDIA)>1" | bc))
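The `MULTIHOST` and `MULTIGPU` variables above become 1 when the summed 0/1 architecture flags exceed 1, with `bc` evaluating the comparison. The same test can be sketched with POSIX shell arithmetic in place of `bc` (a deliberate substitution for illustration); the function name is hypothetical.

```shell
#!/bin/sh
# Print 1 if the 0/1 flags passed as arguments sum to more than 1, else 0.
more_than_one() {
  total=0
  for flag in "$@"; do
    total=$((total + flag))
  done
  echo $(( total > 1 ))
}

more_than_one 0 1 0 0   # prints 0: a single host architecture is fine
more_than_one 0 1 0 1   # prints 1: two host architectures conflict
```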
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIHOST), 1)
|
||||
@@ -240,12 +276,22 @@ tmp := $(shell echo "Makefile constructed configuration:" >> KokkosCore_config.t
tmp := $(shell date >> KokkosCore_config.tmp)
|
||||
tmp := $(shell echo "----------------------------------------------*/" >> KokkosCore_config.tmp)
|
||||
|
||||
tmp := $(shell echo '\#if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)' >> KokkosCore_config.tmp)
|
||||
tmp := $(shell echo '\#error "Do not include KokkosCore_config.h directly; include Kokkos_Macros.hpp instead."' >> KokkosCore_config.tmp)
|
||||
tmp := $(shell echo '\#else' >> KokkosCore_config.tmp)
|
||||
tmp := $(shell echo '\#define KOKKOS_CORE_CONFIG_H' >> KokkosCore_config.tmp)
|
||||
tmp := $(shell echo '\#endif' >> KokkosCore_config.tmp)
|
||||
|
||||
tmp := $(shell echo "/* Execution Spaces */" >> KokkosCore_config.tmp)
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_HAVE_CUDA 1" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
|
||||
tmp := $(shell echo '\#define KOKKOS_ENABLE_OPENMPTARGET 1' >> KokkosCore_config.tmp)
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
|
||||
tmp := $(shell echo '\#define KOKKOS_HAVE_OPENMP 1' >> KokkosCore_config.tmp)
|
||||
endif
|
||||
@@ -262,6 +308,12 @@ ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_SERIAL 1" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_TM), 1)
|
||||
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ENABLE_TM" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ISA_X86_64), 1)
|
||||
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
|
||||
@@ -293,13 +345,21 @@ ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX1Z), 1)
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_ENABLE_DEBUG), 1)
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||
KOKKOS_CXXFLAGS += -lineinfo
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||
KOKKOS_CXXFLAGS += -lineinfo
|
||||
endif
|
||||
|
||||
KOKKOS_CXXFLAGS += -g
|
||||
KOKKOS_LDFLAGS += -g -ldl
|
||||
tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_HAVE_DEBUG 1" >> KokkosCore_config.tmp )
|
||||
ifeq ($(KOKKOS_INTERNAL_DISABLE_DUALVIEW_MODIFY_CHECK), 0)
|
||||
tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK 1" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_ENABLE_PROFILING_LOAD_PRINT), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING_LOAD_PRINT 1" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_HWLOC), 1)
|
||||
@@ -311,8 +371,6 @@ endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_LIBRT), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_USE_LIBRT 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define PREC_TIMER 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOSP_ENABLE_RTLIB 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_LIBS += -lrt
|
||||
endif
|
||||
|
||||
@@ -323,8 +381,8 @@ ifeq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_HBWSPACE 1" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING 0" >> KokkosCore_config.tmp )
|
||||
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 0)
|
||||
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
|
||||
tmp := $(shell echo "/* Optimization Settings */" >> KokkosCore_config.tmp)
|
||||
@@ -336,39 +394,44 @@ endif
tmp := $(shell echo "/* Cuda Settings */" >> KokkosCore_config.tmp)
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
|
||||
else
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += --relocatable-device-code=true
|
||||
KOKKOS_LDFLAGS += --relocatable-device-code=true
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += --relocatable-device-code=true
|
||||
KOKKOS_LDFLAGS += --relocatable-device-code=true
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION) -gt 70; echo $$?),0)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += -expt-extended-lambda
|
||||
else
|
||||
$(warning Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.)
|
||||
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION) -gt 70; echo $$?),0)
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += -expt-extended-lambda
|
||||
else
|
||||
$(warning Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.)
|
||||
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_CUDA_CLANG_WORKAROUND" >> KokkosCore_config.tmp )
|
||||
endif
|
||||
endif
|
||||
|
||||
endif
|
||||
|
||||
# Add Architecture flags.
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV80), 1)
|
||||
@@ -469,7 +532,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER9), 1)
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_HSW), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
|
||||
@@ -491,6 +554,28 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_BDW), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
|
||||
KOKKOS_CXXFLAGS += -xCORE-AVX2
|
||||
KOKKOS_LDFLAGS += -xCORE-AVX2
|
||||
else
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||
|
||||
else
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||
KOKKOS_CXXFLAGS += -tp=haswell
|
||||
KOKKOS_LDFLAGS += -tp=haswell
|
||||
else
|
||||
# Assume that this is really a GNU compiler.
|
||||
KOKKOS_CXXFLAGS += -march=core-avx2 -mtune=core-avx2 -mrtm
|
||||
KOKKOS_LDFLAGS += -march=core-avx2 -mtune=core-avx2 -mrtm
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512MIC 1" >> KokkosCore_config.tmp )
|
||||
|
||||
@@ -501,12 +586,12 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
|
||||
|
||||
else
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
|
||||
|
||||
else
|
||||
# Assume that this is really a GNU compiler.
|
||||
KOKKOS_CXXFLAGS += -march=knl
|
||||
KOKKOS_LDFLAGS += -march=knl
|
||||
KOKKOS_CXXFLAGS += -march=knl -mtune=knl
|
||||
KOKKOS_LDFLAGS += -march=knl -mtune=knl
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
@@ -526,8 +611,8 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON), 1)
|
||||
else
|
||||
# Nothing here yet.
|
||||
KOKKOS_CXXFLAGS += -march=skylake-avx512
|
||||
KOKKOS_LDFLAGS += -march=skylake-avx512
|
||||
KOKKOS_CXXFLAGS += -march=skylake-avx512 -mtune=skylake-avx512 -mrtm
|
||||
KOKKOS_LDFLAGS += -march=skylake-avx512 -mtune=skylake-avx512 -mrtm
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
@@ -541,70 +626,67 @@ endif
|
||||
# Figure out the architecture flag for Cuda.
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG=-arch
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG=--cuda-gpu-arch
|
||||
KOKKOS_CXXFLAGS += -x cuda
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||
KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-arch
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=--cuda-gpu-arch
|
||||
KOKKOS_CXXFLAGS += -x cuda
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_30
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_32
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_35
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_37
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_50
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_52
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_53
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL60), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_60
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_61
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
|
||||
endif
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL60), 1)
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
|
||||
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
|
||||
endif
|
||||
ifneq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
|
||||
KOKKOS_INTERNAL_LS_CONFIG := $(shell ls KokkosCore_config.h)
|
||||
|
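Review note: the architecture blocks above all follow one pattern. The compiler picks a flag prefix, the selected architecture appends `=sm_XX`, and the result goes into `KOKKOS_CXXFLAGS` (and into `KOKKOS_LDFLAGS` only for nvcc). A minimal Python paraphrase of that pattern follows. It is a sketch, not part of the commit: the function and table names are hypothetical, and it assumes the nvcc prefix `-arch` shown in the older lines carries over to the new variable.

```python
# Illustrative paraphrase of the Makefile logic; all Python names are hypothetical.
ARCH_TO_SM = {
    "Kepler30": "sm_30", "Kepler32": "sm_32",
    "Kepler35": "sm_35", "Kepler37": "sm_37",
    "Maxwell50": "sm_50", "Maxwell52": "sm_52", "Maxwell53": "sm_53",
    "Pascal60": "sm_60", "Pascal61": "sm_61",
}

# Flag prefix per CUDA compiler, as it appears in the Makefile lines.
ARCH_FLAG_PREFIX = {"nvcc": "-arch", "clang": "--cuda-gpu-arch"}


def cuda_flags(arch, compiler):
    """Return (cxxflags, ldflags) contributed by the CUDA architecture choice."""
    arch_flag = f"{ARCH_FLAG_PREFIX[compiler]}={ARCH_TO_SM[arch]}"
    cxxflags, ldflags = [], []
    if compiler == "clang":
        # Clang needs to be told that the translation unit is CUDA code.
        cxxflags += ["-x", "cuda"]
    cxxflags.append(arch_flag)
    if compiler == "nvcc":
        # Only nvcc also receives the architecture flag at link time.
        ldflags.append(arch_flag)
    return cxxflags, ldflags
```

Building the flag once and appending it in a single place is what lets the new Makefile drop the per-architecture `KOKKOS_CXXFLAGS`/`KOKKOS_LDFLAGS` pairs of the old version.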
@ -630,9 +712,24 @@ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.cpp)
|
|||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.cpp)
|
||||
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
|
||||
KOKKOS_CXXFLAGS += -I$(CUDA_PATH)/include
|
||||
KOKKOS_CPPFLAGS += -I$(CUDA_PATH)/include
|
||||
KOKKOS_LDFLAGS += -L$(CUDA_PATH)/lib64
|
||||
KOKKOS_LIBS += -lcudart -lcuda
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
KOKKOS_CXXFLAGS += --cuda-path=$(CUDA_PATH)
|
||||
endif
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
|
||||
KOKKOS_SRC += $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Exec.cpp $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTargetSpace.cpp
|
||||
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/OpenMPTarget/*.hpp)
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||
KOKKOS_CXXFLAGS += -Xcompiler $(KOKKOS_INTERNAL_OPENMPTARGET_FLAG)
|
||||
else
|
||||
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_OPENMPTARGET_FLAG)
|
||||
endif
|
||||
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_OPENMPTARGET_FLAG)
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
|
||||
|
@ -666,10 +763,27 @@ endif
|
|||
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
|
||||
KOKKOS_INTERNAL_GCC_PATH = $(shell which g++)
|
||||
KOKKOS_INTERNAL_GCC_TOOLCHAIN = $(KOKKOS_INTERNAL_GCC_PATH:/bin/g++=)
|
||||
KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN) -DKOKKOS_CUDA_CLANG_WORKAROUND -DKOKKOS_CUDA_USE_LDG_INTRINSIC
|
||||
KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
|
||||
KOKKOS_LDFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
|
||||
endif
|
||||
|
||||
# Don't include Kokkos_HBWSpace.cpp if not using MEMKIND to avoid a link warning.
|
||||
ifneq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
|
||||
KOKKOS_SRC := $(filter-out $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp,$(KOKKOS_SRC))
|
||||
endif
|
||||
|
||||
# Don't include Kokkos_Profiling_Interface.cpp if not using profiling to avoid a link warning.
|
||||
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 1)
|
||||
KOKKOS_SRC := $(filter-out $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp,$(KOKKOS_SRC))
|
||||
endif
|
||||
|
||||
# Don't include Kokkos_Serial.cpp or Kokkos_Serial_Task.cpp if not using Serial
|
||||
# device to avoid a link warning.
|
||||
ifneq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
|
||||
KOKKOS_SRC := $(filter-out $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp,$(KOKKOS_SRC))
|
||||
KOKKOS_SRC := $(filter-out $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp,$(KOKKOS_SRC))
|
||||
endif
|
||||
|
||||
# With Cygwin functions such as fdopen and fileno are not defined
|
||||
# when strict ansi is enabled. strict ansi gets enabled with --std=c++11
|
||||
# though. So we hard undefine it here. Not sure if that has any bad side effects
|
||||
|
|
|
@ -53,11 +53,20 @@ Kokkos_Qthreads_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthreads/K
|
|||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
|
||||
Kokkos_OpenMPexec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
|
||||
Kokkos_OpenMP_Exec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Exec.cpp
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Exec.cpp
|
||||
Kokkos_OpenMP_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
|
||||
endif
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
|
||||
Kokkos_OpenMPTarget_Exec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Exec.cpp
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Exec.cpp
|
||||
Kokkos_OpenMPTargetSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTargetSpace.cpp
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTargetSpace.cpp
|
||||
#Kokkos_OpenMPTarget_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Task.cpp
|
||||
# $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Task.cpp
|
||||
endif
|
||||
|
||||
Kokkos_HBWSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
|
||||
|
|
|
@ -0,0 +1 @@
|
|||
void KOKKOS_ALGORITHMS_SRC_DUMMY_PREVENT_LINK_ERROR() {}
|
|
@ -674,7 +674,7 @@ namespace Kokkos {
|
|||
const double V = 2.0*drand() - 1.0;
|
||||
S = U*U+V*V;
|
||||
}
|
||||
return U*sqrt(-2.0*log(S)/S);
|
||||
return U*std::sqrt(-2.0*log(S)/S);
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
@ -917,7 +917,7 @@ namespace Kokkos {
|
|||
const double V = 2.0*drand() - 1.0;
|
||||
S = U*U+V*V;
|
||||
}
|
||||
return U*sqrt(-2.0*log(S)/S);
|
||||
return U*std::sqrt(-2.0*log(S)/S);
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
@ -1171,7 +1171,7 @@ namespace Kokkos {
|
|||
const double V = 2.0*drand() - 1.0;
|
||||
S = U*U+V*V;
|
||||
}
|
||||
return U*sqrt(-2.0*log(S)/S);
|
||||
return U*std::sqrt(-2.0*log(S)/S);
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
|
|
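Review note: the three hunks above only qualify `sqrt` as `std::sqrt`; the sampling logic is unchanged. The visible lines match the Marsaglia polar method for normal variates. A minimal Python sketch of that method follows, assuming the elided loop rejects points with `S >= 1.0`. The `drand` parameter stands in for the generator's uniform draw on [0, 1), and the extra `s == 0.0` guard is an addition of this sketch, not something shown in the diff.

```python
import math
import random


def normal(drand):
    """Draw one standard-normal sample with the Marsaglia polar method.

    `drand` is any callable returning a uniform float in [0, 1).
    """
    s = 2.0
    u = 0.0
    # Reject points outside the unit circle. The s == 0.0 check is an
    # assumption added here to avoid log(0); the diff does not show it.
    while s >= 1.0 or s == 0.0:
        u = 2.0 * drand() - 1.0
        v = 2.0 * drand() - 1.0
        s = u * u + v * v
    return u * math.sqrt(-2.0 * math.log(s) / s)


def sample_normals(count, seed=0):
    """Convenience helper (hypothetical): draw `count` samples from a seeded RNG."""
    rng = random.Random(seed)
    return [normal(rng.random) for _ in range(count)]
```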
@ -8,7 +8,7 @@ default: build_all
|
|||
echo "End Build"
|
||||
|
||||
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
|
||||
CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
|
||||
else
|
||||
CXX = g++
|
||||
endif
|
||||
|
@ -21,8 +21,8 @@ include $(KOKKOS_PATH)/Makefile.kokkos
|
|||
|
||||
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/algorithms/unit_tests
|
||||
|
||||
TEST_TARGETS =
|
||||
TARGETS =
|
||||
TEST_TARGETS =
|
||||
TARGETS =
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||
OBJ_CUDA = TestCuda.o UnitTestMain.o gtest-all.o
|
||||
|
@ -49,16 +49,16 @@ ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
|
|||
endif
|
||||
|
||||
KokkosAlgorithms_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
|
||||
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_Cuda
|
||||
$(LINK) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Cuda
|
||||
|
||||
KokkosAlgorithms_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
|
||||
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_Threads
|
||||
$(LINK) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Threads
|
||||
|
||||
KokkosAlgorithms_UnitTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
|
||||
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_OpenMP
|
||||
$(LINK) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_OpenMP
|
||||
|
||||
KokkosAlgorithms_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
|
||||
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_Serial
|
||||
$(LINK) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Serial
|
||||
|
||||
test-cuda: KokkosAlgorithms_UnitTest_Cuda
|
||||
./KokkosAlgorithms_UnitTest_Cuda
|
||||
|
@ -76,7 +76,7 @@ build_all: $(TARGETS)
|
|||
|
||||
test: $(TEST_TARGETS)
|
||||
|
||||
clean: kokkos-clean
|
||||
clean: kokkos-clean
|
||||
rm -f *.o $(TARGETS)
|
||||
|
||||
# Compilation rules
|
||||
|
@ -84,6 +84,5 @@ clean: kokkos-clean
|
|||
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
||||
|
||||
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
|
||||
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,12 +36,15 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <stdint.h>
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
#include <cstdint>
|
||||
#include <iostream>
|
||||
#include <iomanip>
|
||||
|
||||
|
@ -49,8 +52,6 @@
|
|||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
#include <TestRandom.hpp>
|
||||
#include <TestSort.hpp>
|
||||
|
||||
|
@ -105,6 +106,7 @@ CUDA_SORT_UNSIGNED(171)
|
|||
#undef CUDA_RANDOM_XORSHIFT1024
|
||||
#undef CUDA_SORT_UNSIGNED
|
||||
}
|
||||
|
||||
#else
|
||||
void KOKKOS_ALGORITHMS_UNITTESTS_TESTCUDA_PREVENT_LINK_ERROR() {}
|
||||
#endif /* #ifdef KOKKOS_ENABLE_CUDA */
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,13 +36,16 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <gtest/gtest.h>
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#ifdef KOKKOS_ENABLE_OPENMP
|
||||
|
||||
#include <gtest/gtest.h>
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
@ -52,7 +55,6 @@
|
|||
|
||||
namespace Test {
|
||||
|
||||
#ifdef KOKKOS_ENABLE_OPENMP
|
||||
class openmp : public ::testing::Test {
|
||||
protected:
|
||||
static void SetUpTestCase()
|
||||
|
@ -97,6 +99,8 @@ OPENMP_SORT_UNSIGNED(171)
|
|||
#undef OPENMP_RANDOM_XORSHIFT64
|
||||
#undef OPENMP_RANDOM_XORSHIFT1024
|
||||
#undef OPENMP_SORT_UNSIGNED
|
||||
#endif
|
||||
} // namespace test
|
||||
#else
|
||||
void KOKKOS_ALGORITHMS_UNITTESTS_TESTOPENMP_PREVENT_LINK_ERROR() {}
|
||||
#endif
|
||||
|
||||
|
|
|
@ -295,7 +295,7 @@ struct test_random_scalar {
|
|||
parallel_reduce (num_draws/1024, functor_type (pool, density_1d, density_3d), result);
|
||||
|
||||
//printf("Result: %lf %lf %lf\n",result.mean/num_draws/3,result.variance/num_draws/3,result.covariance/num_draws/2);
|
||||
double tolerance = 1.6*sqrt(1.0/num_draws);
|
||||
double tolerance = 1.6*std::sqrt(1.0/num_draws);
|
||||
double mean_expect = 0.5*Kokkos::rand<rnd_type,Scalar>::max();
|
||||
double variance_expect = 1.0/3.0*mean_expect*mean_expect;
|
||||
double mean_eps = mean_expect/(result.mean/num_draws/3)-1.0;
|
||||
|
@ -321,7 +321,7 @@ struct test_random_scalar {
|
|||
typedef test_histogram1d_functor<typename RandomGenerator::device_type> functor_type;
|
||||
parallel_reduce (HIST_DIM1D, functor_type (density_1d, num_draws), result);
|
||||
|
||||
double tolerance = 6*sqrt(1.0/HIST_DIM1D);
|
||||
double tolerance = 6*std::sqrt(1.0/HIST_DIM1D);
|
||||
double mean_expect = 1.0*num_draws*3/HIST_DIM1D;
|
||||
double variance_expect = 1.0*num_draws*3/HIST_DIM1D*(1.0-1.0/HIST_DIM1D);
|
||||
double covariance_expect = -1.0*num_draws*3/HIST_DIM1D/HIST_DIM1D;
|
||||
|
@ -354,7 +354,7 @@ struct test_random_scalar {
|
|||
typedef test_histogram3d_functor<typename RandomGenerator::device_type> functor_type;
|
||||
parallel_reduce (HIST_DIM1D, functor_type (density_3d, num_draws), result);
|
||||
|
||||
double tolerance = 6*sqrt(1.0/HIST_DIM1D);
|
||||
double tolerance = 6*std::sqrt(1.0/HIST_DIM1D);
|
||||
double mean_expect = 1.0*num_draws/HIST_DIM1D;
|
||||
double variance_expect = 1.0*num_draws/HIST_DIM1D*(1.0-1.0/HIST_DIM1D);
|
||||
double covariance_expect = -1.0*num_draws/HIST_DIM1D/HIST_DIM1D;
|
||||
|
|
|
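Review note: the test code above compares a relative error against a tolerance that shrinks as `1/sqrt(num_draws)`. A small Python sketch of the mean check follows. The formulas for `tolerance`, `mean_expect` and `mean_eps` are taken from the lines above; the function names and the symmetric pass criterion are assumptions, since the comparison itself is not visible in this diff.

```python
import math


def mean_tolerance(num_draws):
    """Tolerance used for the mean check: 1.6 * sqrt(1 / num_draws)."""
    return 1.6 * math.sqrt(1.0 / num_draws)


def mean_eps(samples, max_value):
    """Relative deviation of the expected mean from the observed mean."""
    mean_expect = 0.5 * max_value
    observed = sum(samples) / len(samples)
    return mean_expect / observed - 1.0


def mean_passes(samples, max_value):
    # Assumed criterion: the relative error lies within +/- tolerance.
    return abs(mean_eps(samples, max_value)) < mean_tolerance(len(samples))
```

For uniform draws on `[0, max]` the relative standard deviation of the sample mean is about `0.58/sqrt(n)`, so a factor of 1.6 corresponds to a bound of roughly 2.8 standard deviations.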
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,11 +36,14 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#ifdef KOKKOS_ENABLE_SERIAL
|
||||
|
||||
#include <gtest/gtest.h>
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
@ -55,7 +58,6 @@
|
|||
|
||||
namespace Test {
|
||||
|
||||
#ifdef KOKKOS_ENABLE_SERIAL
|
||||
class serial : public ::testing::Test {
|
||||
protected:
|
||||
static void SetUpTestCase()
|
||||
|
@ -93,7 +95,9 @@ SERIAL_SORT_UNSIGNED(171)
|
|||
#undef SERIAL_RANDOM_XORSHIFT1024
|
||||
#undef SERIAL_SORT_UNSIGNED
|
||||
|
||||
#endif // KOKKOS_ENABLE_SERIAL
|
||||
} // namespace Test
|
||||
#else
|
||||
void KOKKOS_ALGORITHMS_UNITTESTS_TESTSERIAL_PREVENT_LINK_ERROR() {}
|
||||
#endif // KOKKOS_ENABLE_SERIAL
|
||||
|
||||
|
||||
|
|
|
@ -1,12 +1,12 @@
|
|||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -35,12 +35,12 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
|
||||
#ifndef TESTSORT_HPP_
|
||||
#define TESTSORT_HPP_
|
||||
#ifndef KOKKOS_ALGORITHMS_UNITTESTS_TESTSORT_HPP
|
||||
#define KOKKOS_ALGORITHMS_UNITTESTS_TESTSORT_HPP
|
||||
|
||||
#include <gtest/gtest.h>
|
||||
#include<Kokkos_Core.hpp>
|
||||
|
@ -212,7 +212,12 @@ void test_dynamic_view_sort(unsigned int n )
|
|||
const size_t upper_bound = 2 * n ;
|
||||
|
||||
typename KeyDynamicViewType::memory_pool
|
||||
pool( memory_space() , 2 * n * sizeof(KeyType) );
|
||||
pool( memory_space()
|
||||
, n * sizeof(KeyType) * 1.2
|
||||
, 500 /* min block size in bytes */
|
||||
, 30000 /* max block size in bytes */
|
||||
, 1000000 /* min superblock size in bytes */
|
||||
);
|
||||
|
||||
KeyDynamicViewType keys("Keys",pool,upper_bound);
|
||||
|
||||
|
@ -272,4 +277,4 @@ void test_sort(unsigned int N)
|
|||
|
||||
}
|
||||
}
|
||||
#endif /* TESTSORT_HPP_ */
|
||||
#endif /* KOKKOS_ALGORITHMS_UNITTESTS_TESTSORT_HPP */
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,11 +36,14 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#ifdef KOKKOS_ENABLE_THREADS
|
||||
|
||||
#include <gtest/gtest.h>
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
@ -55,7 +58,6 @@
|
|||
|
||||
namespace Test {
|
||||
|
||||
#ifdef KOKKOS_ENABLE_PTHREAD
|
||||
class threads : public ::testing::Test {
|
||||
protected:
|
||||
static void SetUpTestCase()
|
||||
|
@ -107,7 +109,9 @@ THREADS_SORT_UNSIGNED(171)
|
|||
#undef THREADS_RANDOM_XORSHIFT1024
|
||||
#undef THREADS_SORT_UNSIGNED
|
||||
|
||||
#endif
|
||||
} // namespace Test
|
||||
#else
|
||||
void KOKKOS_ALGORITHMS_UNITTESTS_TESTTHREADS_PREVENT_LINK_ERROR() {}
|
||||
#endif
|
||||
|
||||
|
||||
|
|
|
@ -7,7 +7,7 @@ default: build
|
|||
echo "Start Build"
|
||||
|
||||
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||
CXX = ${KOKKOS_PATH}/config/nvcc_wrapper
|
||||
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
||||
EXE = bytes_and_flops.cuda
|
||||
KOKKOS_DEVICES = "Cuda,OpenMP"
|
||||
KOKKOS_ARCH = "SNB,Kepler35"
|
||||
|
@ -22,7 +22,7 @@ CXXFLAGS = -O3 -g
|
|||
|
||||
DEPFLAGS = -M
|
||||
LINK = ${CXX}
|
||||
LINKFLAGS =
|
||||
LINKFLAGS =
|
||||
|
||||
OBJ = $(SRC:.cpp=.o)
|
||||
LIB =
|
||||
|
@ -34,7 +34,7 @@ build: $(EXE)
|
|||
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
||||
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
||||
|
||||
clean: kokkos-clean
|
||||
clean: kokkos-clean
|
||||
rm -f *.o *.cuda *.host
|
||||
|
||||
# Compilation rules
|
||||
|
|
|
@ -7,7 +7,7 @@ default: build
|
|||
echo "Start Build"
|
||||
|
||||
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||
CXX = ${KOKKOS_PATH}/config/nvcc_wrapper
|
||||
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
|
||||
EXE = gather.cuda
|
||||
KOKKOS_DEVICES = "Cuda,OpenMP"
|
||||
KOKKOS_ARCH = "SNB,Kepler35"
|
||||
|
@ -22,7 +22,7 @@ CXXFLAGS = -O3 -g
|
|||
|
||||
DEPFLAGS = -M
|
||||
LINK = ${CXX}
|
||||
LINKFLAGS =
|
||||
LINKFLAGS =
|
||||
|
||||
OBJ = $(SRC:.cpp=.o)
|
||||
LIB =
|
||||
|
@ -35,10 +35,10 @@ build: $(EXE)
|
|||
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
|
||||
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
|
||||
|
||||
clean: kokkos-clean
|
||||
clean: kokkos-clean
|
||||
rm -f *.o *.cuda *.host
|
||||
|
||||
# Compilation rules
|
||||
|
||||
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) gather_unroll.hpp gather.hpp
|
||||
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) gather_unroll.hpp gather.hpp
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
||||
|
|
|
@ -0,0 +1,18 @@
|
|||
# - Config file for the Kokkos package
|
||||
# It defines the following variables
|
||||
# Kokkos_INCLUDE_DIRS - include directories for Kokkos
|
||||
# Kokkos_LIBRARIES - libraries to link against
|
||||
|
||||
# Compute paths
|
||||
GET_FILENAME_COMPONENT(Kokkos_CMAKE_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH)
|
||||
SET(Kokkos_INCLUDE_DIRS "@CONF_INCLUDE_DIRS@")
|
||||
|
||||
# Our library dependencies (contains definitions for IMPORTED targets)
|
||||
IF(NOT TARGET kokkos AND NOT Kokkos_BINARY_DIR)
|
||||
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosTargets.cmake")
|
||||
ENDIF()
|
||||
|
||||
# These are IMPORTED targets created by KokkosTargets.cmake
|
||||
SET(Kokkos_LIBRARY_DIRS @INSTALL_LIB_DIR@)
|
||||
SET(Kokkos_LIBRARIES @Kokkos_LIBRARIES_NAMES@)
|
||||
SET(Kokkos_TPL_LIBRARIES @KOKKOS_LIBS@)
|
|
@ -0,0 +1,20 @@
|
|||
#.rst:
|
||||
# FindHWLOC
|
||||
# ----------
|
||||
#
|
||||
# Try to find HWLOC.
|
||||
#
|
||||
# The following variables are defined:
|
||||
#
|
||||
# HWLOC_FOUND - System has HWLOC
|
||||
# HWLOC_INCLUDE_DIR - HWLOC include directory
|
||||
# HWLOC_LIBRARIES - Libraries needed to use HWLOC
|
||||
|
||||
find_path(HWLOC_INCLUDE_DIR hwloc.h)
|
||||
find_library(HWLOC_LIBRARIES hwloc)
|
||||
|
||||
include(FindPackageHandleStandardArgs)
|
||||
find_package_handle_standard_args(HWLOC DEFAULT_MSG
|
||||
HWLOC_INCLUDE_DIR HWLOC_LIBRARIES)
|
||||
|
||||
mark_as_advanced(HWLOC_INCLUDE_DIR HWLOC_LIBRARIES)
|
|
@ -0,0 +1,20 @@
|
|||
#.rst:
|
||||
# FindMemkind
|
||||
# ----------
|
||||
#
|
||||
# Try to find Memkind.
|
||||
#
|
||||
# The following variables are defined:
|
||||
#
|
||||
# MEMKIND_FOUND - System has Memkind
|
||||
# MEMKIND_INCLUDE_DIR - Memkind include directory
|
||||
# MEMKIND_LIBRARIES - Libraries needed to use Memkind
|
||||
|
||||
find_path(MEMKIND_INCLUDE_DIR memkind.h)
|
||||
find_library(MEMKIND_LIBRARIES memkind)
|
||||
|
||||
include(FindPackageHandleStandardArgs)
|
||||
find_package_handle_standard_args(Memkind DEFAULT_MSG
|
||||
MEMKIND_INCLUDE_DIR MEMKIND_LIBRARIES)
|
||||
|
||||
mark_as_advanced(MEMKIND_INCLUDE_DIR MEMKIND_LIBRARIES)
|
|
@ -0,0 +1,20 @@
|
|||
#.rst:
|
||||
# FindQthreads
|
||||
# ----------
|
||||
#
|
||||
# Try to find Qthreads.
|
||||
#
|
||||
# The following variables are defined:
|
||||
#
|
||||
# QTHREADS_FOUND - System has Qthreads
|
||||
# QTHREADS_INCLUDE_DIR - Qthreads include directory
|
||||
# QTHREADS_LIBRARIES - Libraries needed to use Qthreads
|
||||
|
||||
find_path(QTHREADS_INCLUDE_DIR qthread.h)
|
||||
find_library(QTHREADS_LIBRARIES qthread)
|
||||
|
||||
include(FindPackageHandleStandardArgs)
|
||||
find_package_handle_standard_args(Qthreads DEFAULT_MSG
|
||||
QTHREADS_INCLUDE_DIR QTHREADS_LIBRARIES)
|
||||
|
||||
mark_as_advanced(QTHREADS_INCLUDE_DIR QTHREADS_LIBRARIES)
|
File diff suppressed because it is too large
|
@ -60,34 +60,12 @@ Step 2:
|
|||
// -------------------------------------------------------------------------------- //
|
||||
|
||||
Step 3:
|
||||
3.1. Build and test Trilinos with 3 different configurations; a configure-all script is provided in Trilinos and should be modified to test each of the following 3 configurations with appropriate environment variable(s):
|
||||
3.1. Build and test Trilinos with 4 different configurations; Run scripts for white and shepard are provided in kokkos/config/trilinos-integration
|
||||
|
||||
- GCC/4.7.2-OpenMP/Complex
|
||||
Run tests with the following environment variable:
|
||||
Usually it's a good idea to run those scripts via nohup.
|
||||
You can run all four at the same time, use separate directories for each.
|
||||
|
||||
export OMP_NUM_THREADS=2
|
||||
|
||||
|
||||
- Intel/15.0.2-Serial/NoComplex
|
||||
|
||||
|
||||
- GCC/4.8.4/CUDA/7.5.18-Cuda/Serial/NoComplex
|
||||
Run tests with the following environment variables:
|
||||
|
||||
export CUDA_LAUNCH_BLOCKING=1
|
||||
export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
|
||||
|
||||
|
||||
mkdir Build
|
||||
cd Build
|
||||
cp TRILINOS_PATH/sampleScripts/Sandia-SEMS/configure-all ./
|
||||
** Set the path to Trilinos appropriately within the configure-all script **
|
||||
source $SEMS_MODULE_ROOT/utils/sems-modules-init.sh kokkos
|
||||
source configure-all
|
||||
make -k (-k means "keep going" to get past build errors; -j12 can also be specified to build with 12 threads, for example)
|
||||
ctest
|
||||
|
||||
3.2. Compare the failed test output to the test output on the dashboard ( testing.sandia.gov/cdash select Trilinos ); investigate and fix problems if new tests fail after the Kokkos snapshot
|
||||
3.2. Compare the failed test output between the pristine and the updated runs; investigate and fix problems if new tests fail after the Kokkos snapshot
|
||||
|
||||
// -------------------------------------------------------------------------------- //
|
||||
|
||||
|
@ -134,7 +112,7 @@ Step 4: Once all Trilinos tests pass promote Kokkos develop branch to master on
|
|||
master: sha1
|
||||
develop: sha1
|
||||
|
||||
git push --follow-tags origin master
|
||||
4.4. Do NOT push yet
|
||||
|
||||
// -------------------------------------------------------------------------------- //
|
||||
|
||||
|
@ -156,9 +134,15 @@ Step 5:
|
|||
|
||||
python KOKKOS_PATH/config/snapshot.py KOKKOS_PATH TRILINOS_PATH/packages
|
||||
|
||||
5.3. Push the updated develop branch of Trilinos to Github - congratulations!!!
|
||||
5.3. Run checkin-test to push to trilinos using the CI build modules (gcc/4.9.3)
|
||||
|
||||
(From Trilinos directory):
|
||||
git push
|
||||
The modules are listed in kokkos/config/trilinos-integration/checkin-test
|
||||
Run checkin-test, forward dependencies and optional dependencies must be enabled
|
||||
If push failed because someone else clearly broke something, push manually.
|
||||
If push failed for unclear reasons, investigate, fix, and potentially start over from step 2 after resetting your local kokkos/master branch
|
||||
|
||||
Step 6: Push Kokkos to master
|
||||
|
||||
git push --follow-tags origin master
|
||||
|
||||
// -------------------------------------------------------------------------------- //
|
||||
|
|
|
@ -13,7 +13,7 @@
|
|||
# module load cmake/2.8.11.2 gcc/4.8.3 cuda/6.5.14 nvcc-wrapper/gnu
|
||||
#
|
||||
# The 'nvcc-wrapper' module should load a script that matches
|
||||
# kokkos/config/nvcc_wrapper
|
||||
# kokkos/bin/nvcc_wrapper
|
||||
#
|
||||
#-----------------------------------------------------------------------------
|
||||
# Source and installation directories:
|
||||
|
|
|
@ -13,7 +13,7 @@
|
|||
# module load cmake/2.8.11.2 gcc/4.8.3 cuda/6.5.14 nvcc-wrapper/gnu
|
||||
#
|
||||
# The 'nvcc-wrapper' module should load a script that matches
|
||||
# kokkos/config/nvcc_wrapper
|
||||
# kokkos/bin/nvcc_wrapper
|
||||
#
|
||||
#-----------------------------------------------------------------------------
|
||||
# Source and installation directories:
|
||||
|
|
|
@ -13,7 +13,7 @@
|
|||
# module load cmake/2.8.11.2 gcc/4.8.3 cuda/6.5.14 nvcc-wrapper/gnu
|
||||
#
|
||||
# The 'nvcc-wrapper' module should load a script that matches
|
||||
# kokkos/config/nvcc_wrapper
|
||||
# kokkos/bin/nvcc_wrapper
|
||||
#
|
||||
#-----------------------------------------------------------------------------
|
||||
# Source and installation directories:
|
||||
|
|
|
@ -5,4 +5,5 @@ tag: 2.02.00 date: 10:30:2016 master: 6c90a581 develop: ca3dd56e
|
|||
tag: 2.02.01 date: 11:01:2016 master: 9c698c86 develop: b0072304
|
||||
tag: 2.02.07 date: 12:16:2016 master: 4b4cc4ba develop: 382c0966
|
||||
tag: 2.02.15 date: 02:10:2017 master: 8c64cd93 develop: 28dea8b6
|
||||
tag: 2.03.00 date: 04:25:2017 master: 120d9ce7 develop: 015ba641
|
||||
tag: 2.03.00 date: 04:25:2017 master: 120d9ce7 develop: 015ba641
|
||||
tag: 2.03.05 date: 05:27:2017 master: 36b92f43 develop: 79073186
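Each entry in this history file follows the pattern `tag: … date: … master: … develop: …`. A minimal Python 3 sketch for reading such a line is shown below. It assumes the date field is laid out as `MM:DD:YYYY`, which matches the entries above; `parse_history_line` is a hypothetical helper and not part of the Kokkos scripts.

```python
import re
from datetime import date

# Hypothetical helper (not part of Kokkos): split one history line into fields.
# Assumption: the date field is MM:DD:YYYY, as in "05:27:2017".
HISTORY_RE = re.compile(
    r"tag:\s*(?P<tag>\S+)\s+date:\s*(?P<date>\d{2}:\d{2}:\d{4})\s+"
    r"master:\s*(?P<master>[0-9a-f]+)\s+develop:\s*(?P<develop>[0-9a-f]+)"
)


def parse_history_line(line):
    """Return a dict with the tag, release date, and both abbreviated SHA-1s."""
    match = HISTORY_RE.search(line)
    if match is None:
        raise ValueError("unrecognized history line: %r" % line)
    month, day, year = (int(part) for part in match.group("date").split(":"))
    return {
        "tag": match.group("tag"),
        "date": date(year, month, day),
        "master": match.group("master"),
        "develop": match.group("develop"),
    }


entry = parse_history_line(
    "tag: 2.03.05 date: 05:27:2017 master: 36b92f43 develop: 79073186"
)
print(entry["tag"], entry["date"].isoformat(), entry["master"], entry["develop"])
```

Parsing the date into a `datetime.date` makes it easy to sort entries or compare them against the dates in the change log.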
|
||||
|
|
|
@ -27,7 +27,7 @@ import subprocess, argparse, re, doctest, os, datetime, traceback
|
|||
def parse_cmdline(description):
|
||||
parser = argparse.ArgumentParser(usage="snapshot.py [options] source destination", description=description)
|
||||
|
||||
parser.add_argument("-n", "--no-comit", action="store_false", dest="create_commit", default=True,
|
||||
parser.add_argument("-n", "--no-commit", action="store_false", dest="create_commit", default=True,
|
||||
help="Do not perform a commit or create a commit message.")
|
||||
parser.add_argument("-v", "--verbose", action="store_true", dest="verbose_mode", default=False,
|
||||
help="Enable verbose mode.")
|
||||
|
@ -39,6 +39,8 @@ def parse_cmdline(description):
|
|||
help="Type of repository of the source, use none to skip all repository operations.")
|
||||
parser.add_argument("--dest-repo", choices=["git","none"], default="",
|
||||
help="Type of repository of the destination, use none to skip all repository operations.")
|
||||
parser.add_argument("--small", action="store_true", dest="small_mode",
|
||||
help="Don't include tests and other extra files when copying.")
|
||||
|
||||
parser.add_argument("source", help="Source project to snapshot from.")
|
||||
parser.add_argument("destination", help="Destination to snapshot too.")
|
||||
|
@ -58,9 +60,9 @@ def validate_options(options):
|
|||
|
||||
options.source = os.path.abspath(options.source)
|
||||
options.destination = os.path.abspath(options.destination)
|
||||
|
||||
|
||||
if os.path.exists(options.source):
|
||||
apparent_source_repo_type, source_root = deterimine_repo_type(options.source)
|
||||
apparent_source_repo_type, source_root = determine_repo_type(options.source)
|
||||
else:
|
||||
raise RuntimeError("Could not find source directory of %s." % options.source)
|
||||
options.source_root = source_root
|
||||
|
@ -69,7 +71,7 @@ def validate_options(options):
|
|||
print "Could not find destination directory of %s so it will be created." % options.destination
|
||||
os.makedirs(options.destination)
|
||||
|
||||
apparent_dest_repo_type, dest_root = deterimine_repo_type(options.destination)
|
||||
apparent_dest_repo_type, dest_root = determine_repo_type(options.destination)
|
||||
options.dest_root = dest_root
|
||||
|
||||
#error on svn repo types for now
|
||||
|
@ -111,7 +113,7 @@ def run_cmd(cmd, options, working_dir="."):
|
|||
print "==== %s stderr ====" % cmd_str
|
||||
print proc_stderr
|
||||
print "==== %s stderr ====" % cmd_str
|
||||
|
||||
|
||||
if ret_val != 0:
|
||||
raise RuntimeError("Command '%s' failed with error code %d. Error message:%s%s%sstdout:%s" % \
|
||||
(cmd_str, ret_val, os.linesep, proc_stderr, os.linesep, proc_stdout))
|
||||
|
@ -119,7 +121,7 @@ def run_cmd(cmd, options, working_dir="."):
|
|||
return proc_stdout, proc_stderr
|
||||
#end run_cmd
|
||||
|
||||
def deterimine_repo_type(location):
|
||||
def determine_repo_type(location):
|
||||
apparent_repo_type = "none"
|
||||
|
||||
while location != "":
|
||||
|
@ -133,16 +135,32 @@ def deterimine_repo_type(location):
|
|||
location = location[:location.rfind(os.sep)]
|
||||
|
||||
return apparent_repo_type, location
|
||||
|
||||
#end deterimine_repo_type
|
||||
#end determine_repo_type
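The body of `determine_repo_type` is mostly elided in this hunk, so the following is only a sketch of the idea it appears to implement: walk upward from a path until a directory containing `.git` is found. It is written for Python 3 (the script itself uses Python 2 `print` statements), and returning `("none", "")` when nothing is found is an assumption based on the visible `while location != ""` loop.

```python
import os


def determine_repo_type(location):
    """Sketch: return ("git", repo_root) if `location` lies inside a git
    checkout, otherwise ("none", "")."""
    location = os.path.abspath(location)
    while True:
        # `.git` is a directory in a normal clone and a plain file in
        # worktrees and submodules, so test for existence only.
        if os.path.exists(os.path.join(location, ".git")):
            return "git", location
        parent = os.path.dirname(location)
        if parent == location:  # reached the filesystem root
            return "none", ""
        location = parent
```

Using `os.path.dirname` rather than slicing at the last `os.sep` avoids special-casing the root directory, at the cost of differing slightly from the loop shown in the diff.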
|
||||
|
||||
def rsync(source, dest, options):
|
||||
rsync_cmd = ["rsync", "-ar", "--delete"]
|
||||
if options.debug_mode:
|
||||
rsync_cmd.append("-v")
|
||||
|
||||
if options.small_mode or options.source_repo == "git":
|
||||
rsync_cmd.append("--delete-excluded")
|
||||
|
||||
if options.small_mode:
|
||||
rsync_cmd.append("--include=config/master_history.txt")
|
||||
rsync_cmd.append("--include=cmake/tpls")
|
||||
rsync_cmd.append("--exclude=benchmarks/")
|
||||
rsync_cmd.append("--exclude=config/*")
|
||||
rsync_cmd.append("--exclude=doc/")
|
||||
rsync_cmd.append("--exclude=example/")
|
||||
rsync_cmd.append("--exclude=tpls/")
|
||||
rsync_cmd.append("--exclude=HOW_TO_SNAPSHOT")
|
||||
rsync_cmd.append("--exclude=unit_test")
|
||||
rsync_cmd.append("--exclude=unit_tests")
|
||||
rsync_cmd.append("--exclude=perf_test")
|
||||
rsync_cmd.append("--exclude=performance_tests")
|
||||
|
||||
if options.source_repo == "git":
|
||||
rsync_cmd.append("--exclude=.git")
|
||||
rsync_cmd.append("--exclude=.git*")
|
||||
|
||||
rsync_cmd.append(options.source)
|
||||
rsync_cmd.append(options.destination)
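The argument list assembled by `rsync` above can be sketched as a pure function, which makes the effect of `--small` easy to inspect without running rsync. This is a Python 3 illustration, not the script's code: `build_rsync_cmd` and its keyword arguments are hypothetical names, and the `.git*` exclude is used on the assumption that it is the newer of the two `.git` lines in the hunk.

```python
# Hypothetical helper mirroring the flags visible in the hunk above.
SMALL_MODE_INCLUDES = ["config/master_history.txt", "cmake/tpls"]
SMALL_MODE_EXCLUDES = [
    "benchmarks/", "config/*", "doc/", "example/", "tpls/",
    "HOW_TO_SNAPSHOT", "unit_test", "unit_tests",
    "perf_test", "performance_tests",
]


def build_rsync_cmd(source, destination, small_mode=False,
                    source_repo="git", debug_mode=False):
    """Return the rsync argument list for a snapshot copy."""
    cmd = ["rsync", "-ar", "--delete"]
    if debug_mode:
        cmd.append("-v")
    if small_mode or source_repo == "git":
        # Also delete excluded files that already exist in the destination.
        cmd.append("--delete-excluded")
    if small_mode:
        # rsync acts on the first matching filter rule, so the includes
        # must come before the excludes they carve exceptions out of.
        cmd += ["--include=" + pattern for pattern in SMALL_MODE_INCLUDES]
        cmd += ["--exclude=" + pattern for pattern in SMALL_MODE_EXCLUDES]
    if source_repo == "git":
        cmd.append("--exclude=.git*")
    cmd += [source, destination]
    return cmd


print(" ".join(build_rsync_cmd("kokkos", "Trilinos/packages", small_mode=True)))
```

Keeping `--include=config/master_history.txt` ahead of `--exclude=config/*` is what lets the history file survive a small snapshot while the rest of `config/` is dropped.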
|
||||
|
@ -171,28 +189,27 @@ def find_git_commit_information(options):
|
|||
('sems', 'software.sandia.gov:/git/sems')
|
||||
"""
|
||||
git_log_cmd = ["git", "log", "-1"]
|
||||
|
||||
|
||||
output, error = run_cmd(git_log_cmd, options, options.source)
|
||||
|
||||
|
||||
commit_match = re.match("commit ([0-9a-fA-F]+)", output)
|
||||
commit_id = commit_match.group(1)
|
||||
commit_log = output
|
||||
|
||||
|
||||
git_remote_cmd = ["git", "remote", "-v"]
|
||||
output, error = run_cmd(git_remote_cmd, options, options.source)
|
||||
|
||||
|
||||
remote_match = re.search("origin\s([^ ]*/([^ ]+))", output, re.MULTILINE)
|
||||
if not remote_match:
|
||||
raise RuntimeError("Could not find origin of repo at %s. Consider using none for source repo type." % (options.source))
|
||||
|
||||
source_location = remote_match.group(1)
|
||||
source_name = remote_match.group(2).strip()
|
||||
|
||||
|
||||
if source_name[-1] == "/":
|
||||
source_name = source_name[:-1]
|
||||
|
||||
return commit_id, commit_log, source_name, source_location
|
||||
|
||||
#end find_git_commit_information
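To see what the `origin` regular expression captures, it can be run against sample `git remote -v` output. The sketch below (Python 3) reuses the pattern from the function above; `parse_origin` is a hypothetical wrapper, and the sample text is illustrative rather than taken from a real checkout.

```python
import re

# Same pattern as in find_git_commit_information. The original call also passes
# re.MULTILINE, which has no effect here because the pattern uses no ^ or $.
ORIGIN_RE = re.compile(r"origin\s([^ ]*/([^ ]+))")


def parse_origin(remote_output):
    """Return (repo_name, repo_location) for the `origin` remote."""
    match = ORIGIN_RE.search(remote_output)
    if match is None:
        raise RuntimeError("Could not find origin in `git remote -v` output")
    location = match.group(1)
    # Group 2 is everything after the last "/" before the first space;
    # drop a trailing slash if the URL was written with one.
    name = match.group(2).strip().rstrip("/")
    return name, location


sample = (
    "origin\thttps://github.com/kokkos/kokkos (fetch)\n"
    "origin\thttps://github.com/kokkos/kokkos (push)\n"
)
print(parse_origin(sample))  # ('kokkos', 'https://github.com/kokkos/kokkos')
```

Because `[^ ]*` is greedy, the first group runs to the end of the URL and the second group picks up only the final path component, which is why the repository name comes out as `kokkos`.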
|
||||
|
||||
def do_git_commit(message, options):
|
||||
|
@ -201,10 +218,10 @@ def do_git_commit(message, options):
|
|||
|
||||
git_add_cmd = ["git", "add", "-A"]
|
||||
run_cmd(git_add_cmd, options, options.destination)
|
||||
|
||||
|
||||
git_commit_cmd = ["git", "commit", "-m%s" % message]
|
||||
run_cmd(git_commit_cmd, options, options.destination)
|
||||
|
||||
|
||||
git_log_cmd = ["git", "log", "--format=%h", "-1"]
|
||||
commit_sha1, error = run_cmd(git_log_cmd, options, options.destination)
|
||||
|
||||
|
@ -214,7 +231,7 @@ def do_git_commit(message, options):
|
|||
def verify_git_repo_clean(location, options):
|
||||
git_status_cmd = ["git", "status", "--porcelain"]
|
||||
output, error = run_cmd(git_status_cmd, options, location)
|
||||
|
||||
|
||||
if output != "":
|
||||
if options.no_validate_repo == False:
|
||||
raise RuntimeError("%s is not clean.%sPlease commit or stash all changes before running snapshot."
|
||||
|
@ -223,7 +240,6 @@ def verify_git_repo_clean(location, options):
|
|||
print "WARNING: %s is not clean. Proceeding anyway." % location
|
||||
print "WARNING: This could lead to differences in the source and destination."
|
||||
print "WARNING: It could also lead to extra files being included in the snapshot commit."
|
||||
|
||||
#end verify_git_repo_clean
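The cleanliness check relies on `git status --porcelain` printing one line per modified or untracked path and nothing at all for a clean tree. A Python 3 sketch of the same logic follows; it calls `subprocess.run` directly instead of the script's `run_cmd`, and `porcelain_is_clean` is a hypothetical helper split out so the decision can be tested without a repository.

```python
import subprocess


def porcelain_is_clean(porcelain_output):
    """True when `git status --porcelain` reported no changed or untracked paths."""
    # Stripping is slightly more lenient than the script's `output != ""` test.
    return porcelain_output.strip() == ""


def verify_git_repo_clean(location, no_validate_repo=False):
    """Raise if the repository at `location` is dirty, unless validation is disabled."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=location, capture_output=True, text=True, check=True,
    )
    if porcelain_is_clean(result.stdout):
        return True
    if not no_validate_repo:
        raise RuntimeError(
            "%s is not clean. Please commit or stash all changes "
            "before running snapshot." % location
        )
    print("WARNING: %s is not clean. Proceeding anyway." % location)
    return False
```

Separating the string check from the subprocess call keeps the policy (abort versus warn) independent of how the git output is obtained.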
|
||||
|
||||
def main(options):
|
||||
|
@ -238,14 +254,14 @@ def main(options):
|
|||
commit_log = "Unknown commit from %s snapshotted at: %s" % (options.source, datetime.datetime.now())
|
||||
repo_name = options.source
|
||||
repo_location = options.source
|
||||
|
||||
|
||||
commit_message = create_commit_message(commit_id, commit_log, repo_name, repo_location) + os.linesep*2
|
||||
|
||||
|
||||
if options.dest_repo == "git":
|
||||
verify_git_repo_clean(options.destination, options)
|
||||
|
||||
rsync(options.source, options.destination, options)
|
||||
|
||||
|
||||
if options.dest_repo == "git":
|
||||
do_git_commit(commit_message, options)
|
||||
elif options.dest_repo == "none":
|
||||
|
@ -256,10 +272,6 @@ def main(options):
|
|||
cwd = os.getcwd()
|
||||
print "No commit done by request. Please use file at:"
|
||||
print "%s%sif you wish to commit this to a repo later." % (cwd+"/"+file_name, os.linesep)
|
||||
|
||||
|
||||
|
||||
|
||||
#end main
|
||||
|
||||
if (__name__ == "__main__"):
|
||||
|
@ -267,7 +279,7 @@ if (__name__ == "__main__"):
|
|||
doctest.testmod()
|
||||
sys.exit(0)
|
||||
|
||||
try:
|
||||
try:
|
||||
options = parse_cmdline(__doc__)
|
||||
main(options)
|
||||
except RuntimeError, e:
|
||||
|
@ -275,5 +287,5 @@ if (__name__ == "__main__"):
|
|||
if "--debug" in sys.argv:
|
||||
traceback.print_exc()
|
||||
sys.exit(1)
|
||||
else:
|
||||
else:
|
||||
sys.exit(0)
|
||||
|
|
|
@ -24,6 +24,8 @@ elif [[ "$HOSTNAME" =~ node.* ]]; then # Warning: very generic name
|
|||
fi
|
||||
elif [[ "$HOSTNAME" =~ apollo ]]; then
|
||||
MACHINE=apollo
|
||||
elif [[ "$HOSTNAME" =~ sullivan ]]; then
|
||||
MACHINE=sullivan
|
||||
elif [ ! -z "$SEMS_MODULEFILES_ROOT" ]; then
|
||||
MACHINE=sems
|
||||
else
|
||||
|
@ -152,7 +154,7 @@ if [ "$MACHINE" = "sems" ]; then
|
|||
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
|
||||
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
|
||||
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
|
||||
"cuda/8.0.44 $CUDA8_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
"cuda/8.0.44 $CUDA8_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
)
|
||||
else
|
||||
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||
|
@ -164,6 +166,7 @@ if [ "$MACHINE" = "sems" ]; then
|
|||
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||
"clang/3.7.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||
"clang/3.8.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||
"clang/3.9.0 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
"cuda/8.0.44 $CUDA8_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
|
@ -184,7 +187,7 @@ elif [ "$MACHINE" = "white" ]; then
|
|||
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||
COMPILERS=("gcc/5.4.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||
"ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
|
||||
"cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
"cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
)
|
||||
|
||||
if [ -z "$ARCH_FLAG" ]; then
|
||||
|
@ -221,7 +224,7 @@ elif [ "$MACHINE" = "sullivan" ]; then
|
|||
BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
|
||||
|
||||
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||
COMPILERS=("gcc/5.3.0 $BASE_MODULE_LIST $ARM_GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS")
|
||||
COMPILERS=("gcc/6.1.0 $BASE_MODULE_LIST $ARM_GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS")
|
||||
|
||||
if [ -z "$ARCH_FLAG" ]; then
|
||||
ARCH_FLAG="--arch=ARMv8-ThunderX"
|
||||
|
@ -278,11 +281,11 @@ elif [ "$MACHINE" = "apollo" ]; then
|
|||
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
|
||||
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
|
||||
"clang/head $CLANG_MODULE_LIST "Cuda_Pthread" clang++ $CUDA_WARNING_FLAGS"
|
||||
"cuda/8.0.44 $CUDA_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
"cuda/8.0.44 $CUDA_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
)
|
||||
else
|
||||
# Format: (compiler module-list build-list exe-name warning-flag)
|
||||
COMPILERS=("cuda/8.0.44 $CUDA8_MODULE_LIST $BUILD_LIST_CUDA_NVCC $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
COMPILERS=("cuda/8.0.44 $CUDA8_MODULE_LIST $BUILD_LIST_CUDA_NVCC $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
"clang/head $CLANG_MODULE_LIST $BUILD_LIST_CUDA_CLANG clang++ $CUDA_WARNING_FLAGS"
|
||||
"clang/3.9.0 $CLANG_MODULE_LIST $BUILD_LIST_CLANG clang++ $CLANG_WARNING_FLAGS"
|
||||
"gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
|
||||
|
@ -295,8 +298,8 @@ elif [ "$MACHINE" = "apollo" ]; then
|
|||
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
|
||||
"clang/3.5.2 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
|
||||
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
|
||||
)
|
||||
fi
|
||||
|
||||
|
|
|
@ -48,7 +48,7 @@ esac
|
|||
#nvcc wrapper and make the wrapper the compiler.
|
||||
if [ $cuda_compiler != "" ]; then
|
||||
export NVCC_WRAPPER_DEFAULT_COMPILER=$compiler
|
||||
compiler=$kokkos_path/config/nvcc_wrapper
|
||||
compiler=$kokkos_path/bin/nvcc_wrapper
|
||||
fi
|
||||
|
||||
if [ $host_compiler_brand == "intel" -a $cuda_compiler != "" ]; then
|
||||
|
|
|
@ -0,0 +1,4 @@
|
|||
module purge
|
||||
module load sems-env sems-gcc/4.9.3 sems-openmpi/1.10.1 sems-hdf5/1.8.12/parallel sems-netcdf/4.3.2/parallel sems-python/2.7.9 sems-zlib/1.2.8/base sems-cmake/3.5.2 sems-parmetis/4.0.3/64bit_parallel sems-scotch/6.0.3/nopthread_64bit_parallel sems-boost/1.59.0/base
|
||||
|
||||
#Run Trilinos CheckinTest
|
|
@ -1,5 +1,18 @@
|
|||
#!/bin/bash -le
|
||||
|
||||
TRILINOS_UPDATE_BRANCH=$1
|
||||
TRILINOS_PRISTINE_BRANCH=$2
|
||||
|
||||
if [ -z $TRILINOS_UPDATE_BRANCH ]
|
||||
then
|
||||
TRILINOS_UPDATE_BRANCH=develop
|
||||
fi
|
||||
|
||||
if [ -z $TRILINOS_PRISTINE_BRANCH ]
|
||||
then
|
||||
TRILINOS_PRISTINE_BRANCH=develop
|
||||
fi
|
||||
|
||||
export TRILINOS_UPDATED_PATH=${PWD}/trilinos-update
|
||||
export TRILINOS_PRISTINE_PATH=${PWD}/trilinos-pristine
|
||||
|
||||
|
@ -16,8 +29,8 @@ if [ ! -d "${TRILINOS_PRISTINE_PATH}" ]; then
|
|||
fi
|
||||
|
||||
cd ${TRILINOS_UPDATED_PATH}
|
||||
git checkout develop
|
||||
git reset --hard origin/develop
|
||||
git checkout $TRILINOS_UPDATE_BRANCH
|
||||
git reset --hard origin/$TRILINOS_UPDATE_BRANCH
|
||||
git pull
|
||||
cd ..
|
||||
|
||||
|
@ -28,18 +41,14 @@ echo ""
|
|||
echo ""
|
||||
echo "Trilinos State:"
|
||||
git log --pretty=oneline --since=7.days
|
||||
SHA=`git log --pretty=oneline --since=7.days | head -n 2 | tail -n 1 | awk '{print $1}'`
|
||||
cd ..
|
||||
|
||||
cd ${TRILINOS_PRISTINE_PATH}
|
||||
git status
|
||||
git log --pretty=oneline --since=7.days
|
||||
echo "Checkout develop"
|
||||
git checkout develop
|
||||
echo "Checkout $TRILINOS_PRISTINE_BRANCH"
|
||||
git checkout $TRILINOS_PRISTINE_BRANCH
|
||||
echo "Pull"
|
||||
git pull
|
||||
echo "Checkout SHA"
|
||||
git checkout ${SHA}
|
||||
cd ..
|
||||
|
||||
cd ${TRILINOS_PRISTINE_PATH}
|
||||
|
|
|
@ -0,0 +1,60 @@
|
|||
#!/bin/bash -el
|
||||
ulimit -c 0
|
||||
module load devpack/openmpi/1.10.0/intel/16.1.056/cuda/none
|
||||
|
||||
KOKKOS_BRANCH=$1
|
||||
TRILINOS_UPDATE_BRANCH=$2
|
||||
TRILINOS_PRISTINE_BRANCH=$3
|
||||
|
||||
if [ -z $KOKKOS_BRANCH ]
|
||||
then
|
||||
KOKKOS_BRANCH=develop
|
||||
fi
|
||||
|
||||
if [ -z $TRILINOS_UPDATE_BRANCH ]
|
||||
then
|
||||
TRILINOS_UPDATE_BRANCH=develop
|
||||
fi
|
||||
|
||||
if [ -z $TRILINOS_PRISTINE_BRANCH ]
|
||||
then
|
||||
TRILINOS_PRISTINE_BRANCH=develop
|
||||
fi
|
||||
|
||||
export OMP_NUM_THREADS=8
|
||||
export JENKINS_DO_CUDA=OFF
|
||||
export JENKINS_DO_OPENMP=OFF
|
||||
export JENKINS_DO_PTHREAD=ON
|
||||
export JENKINS_DO_SERIAL=OFF
|
||||
export JENKINS_DO_COMPLEX=OFF
|
||||
|
||||
export ARCH_CXX_FLAG="-xCORE-AVX2 -mkl"
|
||||
export ARCH_C_FLAG="-xCORE-AVX2 -mkl"
|
||||
export BLAS_LIBRARIES="-mkl;${MKLROOT}/lib/intel64/libmkl_intel_lp64.a;${MKLROOT}/lib/intel64/libmkl_intel_thread.a;${MKLROOT}/lib/intel64/libmkl_core.a"
|
||||
export LAPACK_LIBRARIES=${BLAS_LIBRARIES}
|
||||
|
||||
export JENKINS_DO_TESTS=ON
|
||||
export JENKINS_DO_EXAMPLES=ON
|
||||
export JENKINS_DO_SHARED=OFF
|
||||
|
||||
export QUEUE=haswell
|
||||
|
||||
|
||||
module load python
|
||||
|
||||
|
||||
export KOKKOS_PATH=${PWD}/kokkos
|
||||
|
||||
#Already done:
|
||||
if [ ! -d "${KOKKOS_PATH}" ]; then
|
||||
git clone https://github.com/kokkos/kokkos ${KOKKOS_PATH}
|
||||
fi
|
||||
|
||||
cd ${KOKKOS_PATH}
|
||||
git checkout $KOKKOS_BRANCH
|
||||
git pull
|
||||
cd ..
|
||||
|
||||
source ${KOKKOS_PATH}/config/trilinos-integration/prepare_trilinos_repos.sh $TRILINOS_UPDATE_BRANCH $TRILINOS_PRISTINE_BRANCH
|
||||
|
||||
${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/run_repo_comparison_slurm ${TRILINOS_UPDATED_PATH} ${TRILINOS_PRISTINE_PATH} ${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/configure-testbeds-jenkins-all TestCompare ${QUEUE}
|
|
@ -0,0 +1,60 @@
|
|||
#!/bin/bash -el
|
||||
ulimit -c 0
|
||||
module load devpack/openmpi/1.10.0/intel/16.1.056/cuda/none
|
||||
|
||||
KOKKOS_BRANCH=$1
|
||||
TRILINOS_UPDATE_BRANCH=$2
|
||||
TRILINOS_PRISTINE_BRANCH=$3
|
||||
|
||||
if [ -z $KOKKOS_BRANCH ]
|
||||
then
|
||||
KOKKOS_BRANCH=develop
|
||||
fi
|
||||
|
||||
if [ -z $TRILINOS_UPDATE_BRANCH ]
|
||||
then
|
||||
TRILINOS_UPDATE_BRANCH=develop
|
||||
fi
|
||||
|
||||
if [ -z $TRILINOS_PRISTINE_BRANCH ]
|
||||
then
|
||||
TRILINOS_PRISTINE_BRANCH=develop
|
||||
fi
|
||||
|
||||
export OMP_NUM_THREADS=8
|
||||
export JENKINS_DO_CUDA=OFF
|
||||
export JENKINS_DO_OPENMP=OFF
|
||||
export JENKINS_DO_PTHREAD=OFF
|
||||
export JENKINS_DO_SERIAL=ON
|
||||
export JENKINS_DO_COMPLEX=ON
|
||||
|
||||
export ARCH_CXX_FLAG="-xCORE-AVX2 -mkl"
|
||||
export ARCH_C_FLAG="-xCORE-AVX2 -mkl"
|
||||
export BLAS_LIBRARIES="-mkl;${MKLROOT}/lib/intel64/libmkl_intel_lp64.a;${MKLROOT}/lib/intel64/libmkl_intel_thread.a;${MKLROOT}/lib/intel64/libmkl_core.a"
|
||||
export LAPACK_LIBRARIES=${BLAS_LIBRARIES}
|
||||
|
||||
export JENKINS_DO_TESTS=ON
|
||||
export JENKINS_DO_EXAMPLES=ON
|
||||
export JENKINS_DO_SHARED=OFF
|
||||
|
||||
export QUEUE=haswell
|
||||
|
||||
|
||||
module load python
|
||||
|
||||
|
||||
export KOKKOS_PATH=${PWD}/kokkos
|
||||
|
||||
#Already done:
|
||||
if [ ! -d "${KOKKOS_PATH}" ]; then
|
||||
git clone https://github.com/kokkos/kokkos ${KOKKOS_PATH}
|
||||
fi
|
||||
|
||||
cd ${KOKKOS_PATH}
|
||||
git checkout $KOKKOS_BRANCH
|
||||
git pull
|
||||
cd ..
|
||||
|
||||
source ${KOKKOS_PATH}/config/trilinos-integration/prepare_trilinos_repos.sh $TRILINOS_UPDATE_BRANCH $TRILINOS_PRISTINE_BRANCH
|
||||
|
||||
${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/run_repo_comparison_slurm ${TRILINOS_UPDATED_PATH} ${TRILINOS_PRISTINE_PATH} ${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/configure-testbeds-jenkins-all TestCompare ${QUEUE}
|
|
@ -0,0 +1,63 @@
|
|||
#!/bin/bash -el
|
||||
ulimit -c 0
|
||||
|
||||
KOKKOS_BRANCH=$1
|
||||
TRILINOS_UPDATE_BRANCH=$2
|
||||
TRILINOS_PRISTINE_BRANCH=$3
|
||||
|
||||
if [ -z $KOKKOS_BRANCH ]
|
||||
then
|
||||
KOKKOS_BRANCH=develop
|
||||
fi
|
||||
|
||||
if [ -z $TRILINOS_UPDATE_BRANCH ]
|
||||
then
|
||||
TRILINOS_UPDATE_BRANCH=develop
|
||||
fi
|
||||
|
||||
if [ -z $TRILINOS_PRISTINE_BRANCH ]
|
||||
then
|
||||
TRILINOS_PRISTINE_BRANCH=develop
|
||||
fi
|
||||
|
||||
module load devpack/openmpi/1.10.4/gcc/5.4.0/cuda/8.0.44
|
||||
export OMP_NUM_THREADS=8
|
||||
export JENKINS_DO_CUDA=ON
|
||||
export JENKINS_DO_OPENMP=OFF
|
||||
export JENKINS_DO_PTHREAD=OFF
|
||||
export JENKINS_DO_SERIAL=ON
|
||||
export JENKINS_DO_COMPLEX=OFF
|
||||
|
||||
export JENKINS_ARCH_CXX_FLAG="-mcpu=power8 -arch=sm_37"
|
||||
export JENKINS_ARCH_C_FLAG="-mcpu=power8"
|
||||
export BLAS_LIBRARIES="${BLAS_ROOT}/lib/libblas.a;gfortran;gomp"
|
||||
export LAPACK_LIBRARIES="${LAPACK_ROOT}/lib/liblapack.a;gfortran;gomp"
|
||||
|
||||
export JENKINS_DO_TESTS=ON
|
||||
export JENKINS_DO_EXAMPLES=ON
|
||||
|
||||
export QUEUE=rhel7F
|
||||
|
||||
module load python
|
||||
|
||||
export KOKKOS_PATH=${PWD}/kokkos
|
||||
|
||||
#Already done:
|
||||
if [ ! -d "${KOKKOS_PATH}" ]; then
|
||||
git clone https://github.com/kokkos/kokkos ${KOKKOS_PATH}
|
||||
fi
|
||||
|
||||
export OMPI_CXX=${KOKKOS_PATH}/bin/nvcc_wrapper
|
||||
|
||||
cd ${KOKKOS_PATH}
|
||||
git checkout $KOKKOS_BRANCH
|
||||
git pull
|
||||
cd ..
|
||||
|
||||
export CUDA_LAUNCH_BLOCKING=1
|
||||
export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
|
||||
|
||||
source ${KOKKOS_PATH}/config/trilinos-integration/prepare_trilinos_repos.sh $TRILINOS_UPDATE_BRANCH $TRILINOS_PRISTINE_BRANCH
|
||||
|
||||
${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/run_repo_comparison_lsf ${TRILINOS_UPDATED_PATH} ${TRILINOS_PRISTINE_PATH} ${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/configure-testbeds-jenkins-all TestCompare ${QUEUE}
|
||||
|
|
@ -0,0 +1,58 @@
|
|||
#!/bin/bash -el
|
||||
ulimit -c 0
|
||||
|
||||
KOKKOS_BRANCH=$1
|
||||
TRILINOS_UPDATE_BRANCH=$2
|
||||
TRILINOS_PRISTINE_BRANCH=$3
|
||||
|
||||
if [ -z $KOKKOS_BRANCH ]
|
||||
then
|
||||
KOKKOS_BRANCH=develop
|
||||
fi
|
||||
|
||||
if [ -z $TRILINOS_UPDATE_BRANCH ]
|
||||
then
|
||||
TRILINOS_UPDATE_BRANCH=develop
|
||||
fi
|
||||
|
||||
if [ -z $TRILINOS_PRISTINE_BRANCH ]
|
||||
then
|
||||
TRILINOS_PRISTINE_BRANCH=develop
|
||||
fi
|
||||
|
||||
module load devpack/openmpi/1.10.4/gcc/5.4.0/cuda/8.0.44
|
||||
export OMP_NUM_THREADS=8
|
||||
export JENKINS_DO_CUDA=OFF
|
||||
export JENKINS_DO_OPENMP=ON
|
||||
export JENKINS_DO_PTHREAD=OFF
|
||||
export JENKINS_DO_SERIAL=OFF
|
||||
export JENKINS_DO_COMPLEX=OFF
|
||||
|
||||
export JENKINS_ARCH_CXX_FLAG="-mcpu=power8"
|
||||
export JENKINS_ARCH_C_FLAG="-mcpu=power8"
|
||||
export BLAS_LIBRARIES="${BLAS_ROOT}/lib/libblas.a;gfortran;gomp"
|
||||
export LAPACK_LIBRARIES="${LAPACK_ROOT}/lib/liblapack.a;gfortran;gomp"
|
||||
|
||||
export JENKINS_DO_TESTS=ON
|
||||
export JENKINS_DO_EXAMPLES=ON
|
||||
|
||||
export QUEUE=rhel7F
|
||||
|
||||
module load python
|
||||
|
||||
export KOKKOS_PATH=${PWD}/kokkos
|
||||
|
||||
#Already done:
|
||||
if [ ! -d "${KOKKOS_PATH}" ]; then
|
||||
git clone https://github.com/kokkos/kokkos ${KOKKOS_PATH}
|
||||
fi
|
||||
|
||||
cd ${KOKKOS_PATH}
|
||||
git checkout $KOKKOS_BRANCH
|
||||
git pull
|
||||
cd ..
|
||||
|
||||
source ${KOKKOS_PATH}/config/trilinos-integration/prepare_trilinos_repos.sh $TRILINOS_UPDATE_BRANCH $TRILINOS_PRISTINE_BRANCH
|
||||
|
||||
${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/run_repo_comparison_lsf ${TRILINOS_UPDATED_PATH} ${TRILINOS_PRISTINE_PATH} ${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/configure-testbeds-jenkins-all TestCompare ${QUEUE}
|
||||
|
|
@ -8,7 +8,7 @@ default: build_all
|
|||
echo "End Build"
|
||||
|
||||
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
|
||||
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
|
||||
CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
|
||||
else
|
||||
CXX = g++
|
||||
endif
|
||||
|
@ -21,8 +21,8 @@ include $(KOKKOS_PATH)/Makefile.kokkos
|
|||
|
||||
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/performance_tests
|
||||
|
||||
TEST_TARGETS =
|
||||
TARGETS =
|
||||
TEST_TARGETS =
|
||||
TARGETS =
|
||||
|
||||
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
|
||||
OBJ_CUDA = TestCuda.o TestMain.o gtest-all.o
|
||||
|
@ -65,7 +65,7 @@ build_all: $(TARGETS)
|
|||
|
||||
test: $(TEST_TARGETS)
|
||||
|
||||
clean: kokkos-clean
|
||||
clean: kokkos-clean
|
||||
rm -f *.o $(TARGETS)
|
||||
|
||||
# Compilation rules
|
||||
|
@ -73,6 +73,5 @@ clean: kokkos-clean
|
|||
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
|
||||
|
||||
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
|
||||
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
|
||||
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,12 +36,15 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <stdint.h>
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( KOKKOS_ENABLE_CUDA )
|
||||
|
||||
#include <cstdint>
|
||||
#include <string>
|
||||
#include <iostream>
|
||||
#include <iomanip>
|
||||
|
@ -52,8 +55,6 @@
|
|||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
#if defined( KOKKOS_ENABLE_CUDA )
|
||||
|
||||
#include <TestDynRankView.hpp>
|
||||
|
||||
#include <Kokkos_UnorderedMap.hpp>
|
||||
|
@ -79,7 +80,7 @@ protected:
|
|||
}
|
||||
};
|
||||
|
||||
TEST_F( cuda, dynrankview_perf )
|
||||
TEST_F( cuda, dynrankview_perf )
|
||||
{
|
||||
std::cout << "Cuda" << std::endl;
|
||||
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
||||
|
@ -105,5 +106,6 @@ TEST_F( cuda, unordered_map_performance_far)
|
|||
}
|
||||
|
||||
}
|
||||
|
||||
#else
|
||||
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTCUDA_PREVENT_EMPTY_LINK_ERROR() {}
|
||||
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
|
||||
|
@ -263,3 +263,4 @@ void test_dynrankview_op_perf( const int par_size )
|
|||
|
||||
} //end Performance
|
||||
#endif
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,11 +36,14 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( KOKKOS_ENABLE_OPENMP )
|
||||
|
||||
#include <gtest/gtest.h>
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
@ -93,7 +96,7 @@ protected:
|
|||
}
|
||||
};
|
||||
|
||||
TEST_F( openmp, dynrankview_perf )
|
||||
TEST_F( openmp, dynrankview_perf )
|
||||
{
|
||||
std::cout << "OpenMP" << std::endl;
|
||||
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
||||
|
@ -137,4 +140,7 @@ TEST_F( openmp, unordered_map_performance_far)
|
|||
}
|
||||
|
||||
} // namespace test
|
||||
#else
|
||||
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTOPENMP_PREVENT_EMPTY_LINK_ERROR() {}
|
||||
#endif
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,11 +36,14 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( KOKKOS_ENABLE_THREADS )
|
||||
|
||||
#include <gtest/gtest.h>
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
@ -87,7 +90,7 @@ protected:
|
|||
}
|
||||
};
|
||||
|
||||
TEST_F( threads, dynrankview_perf )
|
||||
TEST_F( threads, dynrankview_perf )
|
||||
{
|
||||
std::cout << "Threads" << std::endl;
|
||||
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
|
||||
|
@ -132,4 +135,7 @@ TEST_F( threads, unordered_map_performance_far)
|
|||
|
||||
} // namespace Performance
|
||||
|
||||
#else
|
||||
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTTHREADS_PREVENT_EMPTY_LINK_ERROR() {}
|
||||
#endif
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -435,3 +435,4 @@ void deep_copy( ConstBitset<DstDevice> & dst, ConstBitset<SrcDevice> const& src)
|
|||
} // namespace Kokkos
|
||||
|
||||
#endif //KOKKOS_BITSET_HPP
|
||||
|
||||
|
|
|
@ -442,6 +442,17 @@ public:
|
|||
modified_host () = (modified_device () > modified_host () ?
|
||||
modified_device () : modified_host ()) + 1;
|
||||
}
|
||||
|
||||
#ifdef KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
|
||||
if (modified_host() && modified_device()) {
|
||||
std::string msg = "Kokkos::DualView::modify ERROR: ";
|
||||
msg += "Concurrent modification of host and device views ";
|
||||
msg += "in DualView \"";
|
||||
msg += d_view.label();
|
||||
msg += "\"\n";
|
||||
Kokkos::abort(msg.c_str());
|
||||
}
|
||||
#endif
|
||||
}
|
||||
|
||||
//@}
|
||||
|
@ -624,3 +635,4 @@ deep_copy (const ExecutionSpace& exec ,
|
|||
} // namespace Kokkos
|
||||
|
||||
#endif
|
||||
|
||||
|
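Review note on the DualView hunk above: the new `KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK` block aborts when both modification counters are non-zero after a `modify` call. The following is a minimal, Kokkos-free sketch of that check. `ModifyFlags` and its members are hypothetical names used only for illustration, the sketch throws an exception where the real code calls `Kokkos::abort`, and the `sync()` reset is an assumption about how the counters are cleared rather than something shown in this diff.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the two DualView modification counters.
// This is not the Kokkos class, only an illustration of the check.
struct ModifyFlags {
  unsigned host = 0;
  unsigned device = 0;
  std::string label;

  // Same update rule as in the diff: the modified side becomes
  // max(host, device) + 1, then the debug check runs.
  void modify_host() {
    host = std::max(host, device) + 1;
    check();
  }
  void modify_device() {
    device = std::max(host, device) + 1;
    check();
  }

  // Assumption: synchronizing the two copies resets both counters.
  void sync() { host = device = 0; }

 private:
  // Both counters non-zero means host and device were modified
  // without a sync in between. The real code calls Kokkos::abort here.
  void check() const {
    if (host && device) {
      throw std::runtime_error(
          "Concurrent modification of host and device views in DualView \"" +
          label + "\"");
    }
  }
};
```

In this sketch, calling `modify_host()` and then `modify_device()` without an intervening `sync()` triggers the error, which matches the situation the debug check in the diff is meant to catch.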
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -140,21 +140,21 @@ struct DynRankDimTraits {
|
|||
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutStride>::value) , Layout>::type createLayout( const Layout& layout )
|
||||
{
|
||||
return Layout( layout.dimension[0] != unspecified ? layout.dimension[0] : 1
|
||||
, layout.stride[0]
|
||||
, layout.stride[0]
|
||||
, layout.dimension[1] != unspecified ? layout.dimension[1] : 1
|
||||
, layout.stride[1]
|
||||
, layout.stride[1]
|
||||
, layout.dimension[2] != unspecified ? layout.dimension[2] : 1
|
||||
, layout.stride[2]
|
||||
, layout.stride[2]
|
||||
, layout.dimension[3] != unspecified ? layout.dimension[3] : 1
|
||||
, layout.stride[3]
|
||||
, layout.stride[3]
|
||||
, layout.dimension[4] != unspecified ? layout.dimension[4] : 1
|
||||
, layout.stride[4]
|
||||
, layout.stride[4]
|
||||
, layout.dimension[5] != unspecified ? layout.dimension[5] : 1
|
||||
, layout.stride[5]
|
||||
, layout.stride[5]
|
||||
, layout.dimension[6] != unspecified ? layout.dimension[6] : 1
|
||||
, layout.stride[6]
|
||||
, layout.stride[6]
|
||||
, layout.dimension[7] != unspecified ? layout.dimension[7] : 1
|
||||
, layout.stride[7]
|
||||
, layout.stride[7]
|
||||
);
|
||||
}
|
||||
|
||||
|
@ -188,7 +188,7 @@ struct DynRankDimTraits {
|
|||
KOKKOS_INLINE_FUNCTION
|
||||
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutRight>::value || std::is_same<Layout , Kokkos::LayoutLeft>::value) && std::is_integral<iType>::value , Layout >::type reconstructLayout( const Layout& layout , iType dynrank )
|
||||
{
|
||||
return Layout( dynrank > 0 ? layout.dimension[0] : ~size_t(0)
|
||||
return Layout( dynrank > 0 ? layout.dimension[0] : ~size_t(0)
|
||||
, dynrank > 1 ? layout.dimension[1] : ~size_t(0)
|
||||
, dynrank > 2 ? layout.dimension[2] : ~size_t(0)
|
||||
, dynrank > 3 ? layout.dimension[3] : ~size_t(0)
|
||||
|
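Review note on the `createLayout` / `reconstructLayout` hunks: the two helpers appear to be inverses of each other. `createLayout` replaces unspecified extents with 1 so that the underlying rank-7 view has a valid extent in every slot, and `reconstructLayout` puts the `~size_t(0)` marker back for every dimension at or beyond the dynamic rank. A rough sketch with plain arrays, assuming `unspecified` equals `~size_t(0)` (its value is not shown in this diff); the function and type names are hypothetical:

```cpp
#include <array>
#include <cstddef>

// Assumption: the "unspecified" marker is the all-ones size_t value.
constexpr std::size_t kUnspecified = ~std::size_t(0);
using Dims = std::array<std::size_t, 8>;

// createLayout-style step: every unspecified extent becomes 1.
inline Dims pad_to_full_rank(const Dims& d) {
  Dims out{};
  for (std::size_t k = 0; k < out.size(); ++k) {
    out[k] = (d[k] != kUnspecified) ? d[k] : 1;
  }
  return out;
}

// reconstructLayout-style step: keep extents below the dynamic rank,
// mark everything from dynrank upward as unspecified again.
inline Dims restore_dynamic_rank(const Dims& d, unsigned dynrank) {
  Dims out{};
  for (std::size_t k = 0; k < out.size(); ++k) {
    out[k] = (k < dynrank) ? d[k] : kUnspecified;
  }
  return out;
}
```

For a rank-2 shape such as 3 x 4, padding and then restoring with `dynrank = 2` returns the original extents, which is the round trip `create_mirror` relies on when it calls `Impl::reconstructLayout(src.layout(), src.rank())`.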
@ -205,27 +205,27 @@ struct DynRankDimTraits {
|
|||
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutStride>::value) && std::is_integral<iType>::value , Layout >::type reconstructLayout( const Layout& layout , iType dynrank )
|
||||
{
|
||||
return Layout( dynrank > 0 ? layout.dimension[0] : ~size_t(0)
|
||||
, dynrank > 0 ? layout.stride[0] : (0)
|
||||
, dynrank > 0 ? layout.stride[0] : (0)
|
||||
, dynrank > 1 ? layout.dimension[1] : ~size_t(0)
|
||||
, dynrank > 1 ? layout.stride[1] : (0)
|
||||
, dynrank > 1 ? layout.stride[1] : (0)
|
||||
, dynrank > 2 ? layout.dimension[2] : ~size_t(0)
|
||||
, dynrank > 2 ? layout.stride[2] : (0)
|
||||
, dynrank > 2 ? layout.stride[2] : (0)
|
||||
, dynrank > 3 ? layout.dimension[3] : ~size_t(0)
|
||||
, dynrank > 3 ? layout.stride[3] : (0)
|
||||
, dynrank > 3 ? layout.stride[3] : (0)
|
||||
, dynrank > 4 ? layout.dimension[4] : ~size_t(0)
|
||||
, dynrank > 4 ? layout.stride[4] : (0)
|
||||
, dynrank > 4 ? layout.stride[4] : (0)
|
||||
, dynrank > 5 ? layout.dimension[5] : ~size_t(0)
|
||||
, dynrank > 5 ? layout.stride[5] : (0)
|
||||
, dynrank > 5 ? layout.stride[5] : (0)
|
||||
, dynrank > 6 ? layout.dimension[6] : ~size_t(0)
|
||||
, dynrank > 6 ? layout.stride[6] : (0)
|
||||
, dynrank > 6 ? layout.stride[6] : (0)
|
||||
, dynrank > 7 ? layout.dimension[7] : ~size_t(0)
|
||||
, dynrank > 7 ? layout.stride[7] : (0)
|
||||
, dynrank > 7 ? layout.stride[7] : (0)
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
/** \brief Debug bounds-checking routines */
|
||||
// Enhanced debug checking - most infrastructure matches that of functions in
|
||||
// Enhanced debug checking - most infrastructure matches that of functions in
|
||||
// Kokkos_ViewMapping; additional checks for extra arguments beyond rank are 0
|
||||
template< unsigned , typename iType0 , class MapType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
@ -235,20 +235,20 @@ bool dyn_rank_view_verify_operator_bounds( const iType0 & , const MapType & )
|
|||
template< unsigned R , typename iType0 , class MapType , typename iType1 , class ... Args >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
bool dyn_rank_view_verify_operator_bounds
|
||||
( const iType0 & rank
|
||||
( const iType0 & rank
|
||||
, const MapType & map
|
||||
, const iType1 & i
|
||||
, Args ... args
|
||||
)
|
||||
{
|
||||
if ( static_cast<iType0>(R) < rank ) {
|
||||
if ( static_cast<iType0>(R) < rank ) {
|
||||
return ( size_t(i) < map.extent(R) )
|
||||
&& dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
|
||||
}
|
||||
else if ( i != 0 ) {
|
||||
printf("DynRankView Debug Bounds Checking Error: at rank %u\n Extra arguments beyond the rank must be zero \n",R);
|
||||
return ( false )
|
||||
&& dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
|
||||
&& dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
|
||||
}
|
||||
else {
|
||||
return ( true )
|
||||
|
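Review note on `dyn_rank_view_verify_operator_bounds<R>`: the function recurses over the index pack at compile time. Indices below the runtime rank are checked against the extent, and any extra index beyond the rank must be zero. A simplified sketch of the same recursion, using a `std::array` of extents in place of the Kokkos map (`verify_bounds` is a hypothetical name, and the diagnostic `printf` from the original is omitted):

```cpp
#include <array>
#include <cstddef>

// Base case: no indices left to check.
template <unsigned R, std::size_t N>
bool verify_bounds(unsigned, const std::array<std::size_t, N>&) {
  return true;
}

// Recursive case: check index number R, then recurse on the rest.
template <unsigned R, std::size_t N, typename I, typename... Rest>
bool verify_bounds(unsigned rank, const std::array<std::size_t, N>& extents,
                   I i, Rest... rest) {
  static_assert(R < N, "more indices than extents");
  // Below the runtime rank: ordinary bounds check (a negative index
  // wraps to a large value and fails). At or beyond it: must be zero.
  const bool ok = (R < rank)
                      ? (static_cast<std::size_t>(i) < extents[R])
                      : (i == 0);
  return ok && verify_bounds<R + 1>(rank, extents, rest...);
}
```

With extents `{4, 5, 6}` and a runtime rank of 2, the call `verify_bounds<0>(2u, ext, 3, 4, 0)` passes, while a non-zero third index or an out-of-range second index fails.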
@ -281,20 +281,24 @@ void dyn_rank_view_error_operator_bounds
|
|||
}
|
||||
|
||||
// op_rank = rank of the operator version that was called
|
||||
template< typename iType0 , typename iType1 , class MapType , class ... Args >
|
||||
template< typename MemorySpace
|
||||
, typename iType0 , typename iType1 , class MapType , class ... Args >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void dyn_rank_view_verify_operator_bounds
|
||||
( const iType0 & op_rank , const iType1 & rank , const char* label , const MapType & map , Args ... args )
|
||||
( const iType0 & op_rank , const iType1 & rank
|
||||
, const Kokkos::Impl::SharedAllocationTracker & tracker
|
||||
, const MapType & map , Args ... args )
|
||||
{
|
||||
if ( static_cast<iType0>(rank) > op_rank ) {
|
||||
Kokkos::abort( "DynRankView Bounds Checking Error: Need at least rank arguments to the operator()" );
|
||||
Kokkos::abort( "DynRankView Bounds Checking Error: Need at least rank arguments to the operator()" );
|
||||
}
|
||||
|
||||
if ( ! dyn_rank_view_verify_operator_bounds<0>( rank , map , args ... ) ) {
|
||||
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
|
||||
enum { LEN = 1024 };
|
||||
char buffer[ LEN ];
|
||||
int n = snprintf(buffer,LEN,"DynRankView bounds error of view %s (", label);
|
||||
const std::string label = tracker.template get_label<MemorySpace>();
|
||||
int n = snprintf(buffer,LEN,"DynRankView bounds error of view %s (", label.c_str());
|
||||
dyn_rank_view_error_operator_bounds<0>( buffer + n , LEN - n , map , args ... );
|
||||
Kokkos::Impl::throw_runtime_exception(std::string(buffer));
|
||||
#else
|
||||
|
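Review note on the signature change in this hunk: the verifier now receives the `SharedAllocationTracker` instead of a `const char* label`, and the label string is built only inside the failure branch. This lines up with the changelog entry for issue #723 (host-side bounds checking constructing a `std::string`). The sketch below illustrates the eager versus lazy pattern under stated assumptions: `Tracker` is a hypothetical stand-in with a call counter, not the Kokkos tracker, and an exception replaces `Kokkos::Impl::throw_runtime_exception`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical tracker that counts how often its label is materialized.
struct Tracker {
  std::string name;
  mutable int label_calls = 0;
  std::string get_label() const {
    ++label_calls;
    return name;
  }
};

// Pre-change shape: the caller must build the label for every access,
// even when the index turns out to be in bounds.
inline void verify_eager(const char* label, std::size_t i, std::size_t extent) {
  if (i >= extent) {
    throw std::out_of_range(std::string("bounds error of view ") + label);
  }
}

// Post-change shape: the label is fetched only on the failure path.
inline void verify_lazy(const Tracker& t, std::size_t i, std::size_t extent) {
  if (i >= extent) {
    throw std::out_of_range("bounds error of view " + t.get_label());
  }
}
```

Calling `verify_eager(t.get_label().c_str(), i, n)` pays for one string construction per access, whereas `verify_lazy(t, i, n)` leaves `label_calls` at zero until an index is actually out of range.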
@ -347,7 +351,7 @@ private:
|
|||
std::is_same< typename DstTraits::array_layout
|
||||
, typename SrcTraits::array_layout >::value ||
|
||||
std::is_same< typename DstTraits::array_layout
|
||||
, Kokkos::LayoutStride >::value
|
||||
, Kokkos::LayoutStride >::value
|
||||
};
|
||||
|
||||
public:
|
||||
|
@ -381,9 +385,9 @@ public:
|
|||
} //end Impl
|
||||
|
||||
/* \class DynRankView
|
||||
* \brief Container that creates a Kokkos view with rank determined at runtime.
|
||||
* \brief Container that creates a Kokkos view with rank determined at runtime.
|
||||
* Essentially this is a rank 7 view that wraps the access operators
|
||||
* to yield the functionality of a view
|
||||
* to yield the functionality of a view
|
||||
*
|
||||
* Changes from View
|
||||
* 1. The rank of the DynRankView is returned by the method rank()
|
||||
|
@ -410,14 +414,14 @@ class DynRankView : public ViewTraits< DataType , Properties ... >
|
|||
{
|
||||
static_assert( !std::is_array<DataType>::value && !std::is_pointer<DataType>::value , "Cannot template DynRankView with array or pointer datatype - must be pod" );
|
||||
|
||||
private:
|
||||
private:
|
||||
template < class , class ... > friend class DynRankView ;
|
||||
template < class , class ... > friend class Impl::ViewMapping ;
|
||||
|
||||
public:
|
||||
public:
|
||||
typedef ViewTraits< DataType , Properties ... > drvtraits ;
|
||||
|
||||
typedef View< DataType******* , Properties...> view_type ;
|
||||
typedef View< DataType******* , Properties...> view_type ;
|
||||
|
||||
typedef ViewTraits< DataType******* , Properties ... > traits ;
|
||||
|
||||
|
@ -430,7 +434,7 @@ private:
|
|||
map_type m_map ;
|
||||
unsigned m_rank;
|
||||
|
||||
public:
|
||||
public:
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
view_type & DownCast() const { return ( view_type & ) (*this); }
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
@ -588,7 +592,7 @@ private:
|
|||
// rank of the calling operator - included as first argument in ARG
|
||||
#define KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( ARG ) \
|
||||
DynRankView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check(); \
|
||||
Kokkos::Experimental::Impl::dyn_rank_view_verify_operator_bounds ARG ;
|
||||
Kokkos::Experimental::Impl::dyn_rank_view_verify_operator_bounds< typename traits::memory_space > ARG ;
|
||||
|
||||
#else
|
||||
|
||||
|
@ -607,14 +611,10 @@ public:
|
|||
// Rank 0
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
reference_type operator()() const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (0 , this->rank() , NULL , m_map) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (0 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map) )
|
||||
#endif
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (0 , this->rank(), m_track, m_map) )
|
||||
return implementation_map().reference();
|
||||
//return m_map.reference(0,0,0,0,0,0,0);
|
||||
//return m_map.reference(0,0,0,0,0,0,0);
|
||||
}
|
||||
|
||||
// Rank 1
|
||||
|
@ -624,6 +624,8 @@ public:
|
|||
typename std::enable_if< std::is_same<typename drvtraits::value_type, typename drvtraits::scalar_array_type>::value && std::is_integral<iType>::value, reference_type>::type
|
||||
operator[](const iType & i0) const
|
||||
{
|
||||
//Phalanx is violating this, since they use the operator to access ALL elements in the allocation
|
||||
//KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank(), m_track, m_map) )
|
||||
return data()[i0];
|
||||
}
|
||||
|
||||
|
@ -647,14 +649,10 @@ public:
|
|||
template< typename iType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
|
||||
operator()(const iType & i0 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank() , NULL , m_map , i0) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
|
||||
#endif
|
||||
return m_map.reference(i0);
|
||||
operator()(const iType & i0 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank(), m_track, m_map, i0) )
|
||||
return m_map.reference(i0);
|
||||
}
|
||||
|
||||
template< typename iType >
|
||||
|
@ -662,11 +660,7 @@ public:
|
|||
typename std::enable_if< !(std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
|
||||
operator()(const iType & i0 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank() , NULL , m_map , i0) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
|
||||
#endif
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank(), m_track, m_map, i0) )
|
||||
return m_map.reference(i0,0,0,0,0,0,0);
|
||||
}
|
||||
|
||||
|
@ -674,155 +668,111 @@ public:
|
|||
template< typename iType0 , typename iType1 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank() , NULL , m_map , i0 , i1) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank(), m_track, m_map, i0, i1) )
|
||||
return m_map.reference(i0,i1);
|
||||
}
|
||||
|
||||
template< typename iType0 , typename iType1 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank() , NULL , m_map , i0 , i1) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,0,0,0,0,0);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank(), m_track, m_map, i0, i1) )
|
||||
return m_map.reference(i0,i1,0,0,0,0,0);
|
||||
}
|
||||
|
||||
// Rank 3
|
||||
template< typename iType0 , typename iType1 , typename iType2 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank() , NULL , m_map , i0 , i1 , i2) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,i2);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank(), m_track, m_map, i0, i1, i2) )
|
||||
return m_map.reference(i0,i1,i2);
|
||||
}
|
||||
|
||||
template< typename iType0 , typename iType1 , typename iType2 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank() , NULL , m_map , i0 , i1 , i2) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,i2,0,0,0,0);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank(), m_track, m_map, i0, i1, i2) )
|
||||
return m_map.reference(i0,i1,i2,0,0,0,0);
|
||||
}
|
||||
|
||||
// Rank 4
|
||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,i2,i3);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank(), m_track, m_map, i0, i1, i2, i3) )
|
||||
return m_map.reference(i0,i1,i2,i3);
|
||||
}
|
||||
|
||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,i2,i3,0,0,0);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank(), m_track, m_map, i0, i1, i2, i3) )
|
||||
return m_map.reference(i0,i1,i2,i3,0,0,0);
|
||||
}
|
||||
|
||||
// Rank 5
|
||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,i2,i3,i4);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4) )
|
||||
return m_map.reference(i0,i1,i2,i3,i4);
|
||||
}
|
||||
|
||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,i2,i3,i4,0,0);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4) )
|
||||
return m_map.reference(i0,i1,i2,i3,i4,0,0);
|
||||
}
|
||||
|
||||
// Rank 6
|
||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 , typename iType5 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,i2,i3,i4,i5);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4, i5) )
|
||||
return m_map.reference(i0,i1,i2,i3,i4,i5);
|
||||
}
|
||||
|
||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 , typename iType5 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,i2,i3,i4,i5,0);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4, i5) )
|
||||
return m_map.reference(i0,i1,i2,i3,i4,i5,0);
|
||||
}
|
||||
|
||||
// Rank 7
|
||||
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 , typename iType5 , typename iType6 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< (std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value && std::is_integral<iType6>::value), reference_type>::type
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 , const iType6 & i6 ) const
|
||||
{
|
||||
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (7 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5 , i6) )
|
||||
#else
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (7 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,i6) )
|
||||
#endif
|
||||
return m_map.reference(i0,i1,i2,i3,i4,i5,i6);
|
||||
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 , const iType6 & i6 ) const
|
||||
{
|
||||
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (7 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4, i5, i6) )
|
||||
return m_map.reference(i0,i1,i2,i3,i4,i5,i6);
|
||||
}
|
||||
|
||||
#undef KOKKOS_IMPL_VIEW_OPERATOR_VERIFY
|
||||
|
||||
//----------------------------------------
|
||||
// Standard constructor, destructor, and assignment operators...
|
||||
// Standard constructor, destructor, and assignment operators...
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
~DynRankView() {}
|
||||
|
@ -840,7 +790,7 @@ public:
|
|||
DynRankView & operator = ( const DynRankView & rhs ) { m_track = rhs.m_track; m_map = rhs.m_map; m_rank = rhs.m_rank; return *this; }
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
DynRankView & operator = ( DynRankView && rhs ) { m_track = rhs.m_track; m_map = rhs.m_map; m_rank = rhs.m_rank; return *this; }
|
||||
DynRankView & operator = ( DynRankView && rhs ) { m_track = rhs.m_track; m_map = rhs.m_map; m_rank = rhs.m_rank; return *this; }
|
||||
|
||||
//----------------------------------------
|
||||
// Compatible view copy constructor and assignment
|
||||
|
@ -1068,7 +1018,7 @@ public:
|
|||
DynRankView( const Label & arg_label
|
||||
, typename std::enable_if<
|
||||
Kokkos::Experimental::Impl::is_view_label<Label>::value ,
|
||||
const size_t >::type arg_N0 = ~size_t(0)
|
||||
const size_t >::type arg_N0 = ~size_t(0)
|
||||
, const size_t arg_N1 = ~size_t(0)
|
||||
, const size_t arg_N2 = ~size_t(0)
|
||||
, const size_t arg_N3 = ~size_t(0)
|
||||
|
@ -1104,7 +1054,7 @@ public:
|
|||
, const size_t arg_N6 = ~size_t(0)
|
||||
, const size_t arg_N7 = ~size_t(0)
|
||||
)
|
||||
: DynRankView(Impl::ViewCtorProp< std::string , Kokkos::Experimental::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::Experimental::WithoutInitializing ), arg_N0, arg_N1, arg_N2, arg_N3, arg_N4, arg_N5, arg_N6, arg_N7 )
|
||||
: DynRankView(Impl::ViewCtorProp< std::string , Kokkos::Experimental::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::Experimental::WithoutInitializing ), arg_N0, arg_N1, arg_N2, arg_N3, arg_N4, arg_N5, arg_N6, arg_N7 )
|
||||
{}
|
||||
|
||||
//----------------------------------------
|
||||
|
@ -1182,7 +1132,7 @@ public:
|
|||
, const typename traits::array_layout & arg_layout )
|
||||
: DynRankView( Impl::ViewCtorProp<pointer_type>(
|
||||
reinterpret_cast<pointer_type>(
|
||||
arg_space.get_shmem( map_type::memory_span(
|
||||
arg_space.get_shmem( map_type::memory_span(
|
||||
Impl::DynRankDimTraits<typename traits::specialize>::createLayout( arg_layout ) //is this correct?
|
||||
) ) ) )
|
||||
, arg_layout )
|
||||
|
@ -1206,7 +1156,7 @@ public:
|
|||
Impl::DynRankDimTraits<typename traits::specialize>::createLayout(
|
||||
typename traits::array_layout
|
||||
( arg_N0 , arg_N1 , arg_N2 , arg_N3
|
||||
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) ) ) ) )
|
||||
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) ) ) ) )
|
||||
)
|
||||
, typename traits::array_layout
|
||||
( arg_N0 , arg_N1 , arg_N2 , arg_N3
|
||||
|
@ -1241,7 +1191,7 @@ struct ViewMapping
|
|||
, Kokkos::LayoutRight >::value ||
|
||||
std::is_same< typename SrcTraits::array_layout
|
||||
, Kokkos::LayoutStride >::value
|
||||
)
|
||||
)
|
||||
), DynRankSubviewTag >::type
|
||||
, SrcTraits
|
||||
, Args ... >
|
||||
|
@ -1266,19 +1216,19 @@ private:
|
|||
|
||||
typedef typename SrcTraits::value_type value_type ;
|
||||
|
||||
typedef value_type******* data_type ;
|
||||
typedef value_type******* data_type ;
|
||||
|
||||
public:
|
||||
|
||||
typedef Kokkos::ViewTraits
|
||||
< data_type
|
||||
, array_layout
|
||||
, array_layout
|
||||
, typename SrcTraits::device_type
|
||||
, typename SrcTraits::memory_traits > traits_type ;
|
||||
|
||||
typedef Kokkos::View
|
||||
< data_type
|
||||
, array_layout
|
||||
, array_layout
|
||||
, typename SrcTraits::device_type
|
||||
, typename SrcTraits::memory_traits > type ;
|
||||
|
||||
|
@ -1289,17 +1239,17 @@ public:
|
|||
static_assert( Kokkos::Impl::is_memory_traits< MemoryTraits >::value , "" );
|
||||
|
||||
typedef Kokkos::ViewTraits
|
||||
< data_type
|
||||
< data_type
|
||||
, array_layout
|
||||
, typename SrcTraits::device_type
|
||||
, MemoryTraits > traits_type ;
|
||||
|
||||
typedef Kokkos::View
|
||||
< data_type
|
||||
< data_type
|
||||
, array_layout
|
||||
, typename SrcTraits::device_type
|
||||
, MemoryTraits > type ;
|
||||
};
|
||||
};
|
||||
|
||||
|
||||
typedef typename SrcTraits::dimension dimension ;
|
||||
|
@ -1318,7 +1268,7 @@ public:
|
|||
|
||||
template < typename T , class ... P >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
static ret_type subview( const unsigned src_rank , Kokkos::Experimental::DynRankView< T , P...> const & src
|
||||
static ret_type subview( const unsigned src_rank , Kokkos::Experimental::DynRankView< T , P...> const & src
|
||||
, Args ... args )
|
||||
{
|
||||
|
||||
|
@ -1339,8 +1289,8 @@ public:
|
|||
|
||||
ret_type dst ;
|
||||
|
||||
const SubviewExtents< 7 , rank > extents =
|
||||
ExtentGenerator< Args ... >::generator( src.m_map.m_offset.m_dim , args... ) ;
|
||||
const SubviewExtents< 7 , rank > extents =
|
||||
ExtentGenerator< Args ... >::generator( src.m_map.m_offset.m_dim , args... ) ;
|
||||
|
||||
dst_offset_type tempdst( src.m_map.m_offset , extents ) ;
|
||||
|
||||
|
@ -1392,12 +1342,12 @@ using Subdynrankview = typename Kokkos::Experimental::Impl::ViewMapping< Kokkos:
|
|||
|
||||
template< class D , class ... P , class ...Args >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Subdynrankview< ViewTraits<D******* , P...> , Args... >
|
||||
Subdynrankview< ViewTraits<D******* , P...> , Args... >
|
||||
subdynrankview( const Kokkos::Experimental::DynRankView< D , P... > &src , Args...args)
|
||||
{
|
||||
if ( src.rank() > sizeof...(Args) ) //allow sizeof...(Args) >= src.rank(), ignore the remaining args
|
||||
{ Kokkos::abort("subdynrankview: num of args must be >= rank of the source DynRankView"); }
|
||||
|
||||
|
||||
typedef Kokkos::Experimental::Impl::ViewMapping< Kokkos::Experimental::Impl::DynRankSubviewTag , Kokkos::ViewTraits< D*******, P... > , Args... > metafcn ;
|
||||
|
||||
return metafcn::subview( src.rank() , src , args... );
|
||||
|
@@ -1406,7 +1356,7 @@ subdynrankview( const Kokkos::Experimental::DynRankView< D , P... > &src , Args.
 //Wrapper to allow subview function name
 template< class D , class ... P , class ...Args >
 KOKKOS_INLINE_FUNCTION
-Subdynrankview< ViewTraits<D******* , P...> , Args... >
+Subdynrankview< ViewTraits<D******* , P...> , Args... >
 subview( const Kokkos::Experimental::DynRankView< D , P... > &src , Args...args)
 {
 return subdynrankview( src , args... );
@@ -1508,7 +1458,7 @@ struct DynRankViewFill {
 };

 template< class OutputView >
-struct DynRankViewFill< OutputView , typename std::enable_if< OutputView::Rank == 0 >::type > {
+struct DynRankViewFill< OutputView , typename std::enable_if< OutputView::Rank == 0 >::type > {
 DynRankViewFill( const OutputView & dst , const typename OutputView::const_value_type & src )
 {
 Kokkos::Impl::DeepCopy< typename OutputView::memory_space , Kokkos::HostSpace >
@@ -1648,9 +1598,9 @@ void deep_copy

 // If same type, equal layout, equal dimensions, equal span, and contiguous memory then can byte-wise copy
 if ( rank(src) == 0 && rank(dst) == 0 )
-{
+{
 typedef typename dst_type::value_type value_type ;
-Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , sizeof(value_type) );
+Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , sizeof(value_type) );
 }
 else if ( std::is_same< typename DstType::traits::value_type ,
 typename SrcType::traits::non_const_value_type >::value &&
@@ -1826,7 +1776,7 @@ create_mirror( const DynRankView<T,P...> & src
 typedef DynRankView<T,P...> src_type ;
 typedef typename src_type::HostMirror dst_type ;

-return dst_type( std::string( src.label() ).append("_mirror")
+return dst_type( std::string( src.label() ).append("_mirror")
 , Impl::reconstructLayout(src.layout(), src.rank()) );
 }

@@ -1870,7 +1820,7 @@ create_mirror_view( const DynRankView<T,P...> & src
 )>::type * = 0
 )
 {
-return Kokkos::Experimental::create_mirror( src );
+return Kokkos::Experimental::create_mirror( src );
 }

 // Create a mirror view in a new space (specialization for same space)
@@ -1966,3 +1916,4 @@ using Kokkos::Experimental::realloc ;

 } //end Kokkos
 #endif
+
@@ -86,7 +86,7 @@ private:

 public:

-typedef Kokkos::Experimental::MemoryPool< typename traits::device_type > memory_pool ;
+typedef Kokkos::MemoryPool< typename traits::device_type > memory_pool ;

 private:

@@ -275,6 +275,10 @@ public:
 ch[jc_try] = reinterpret_cast<value_type*>(
 m_pool.allocate( sizeof(value_type) << m_chunk_shift ));

+if ( 0 == ch[jc_try] ) {
+Kokkos::abort("DynamicView::resize_parallel exhausted memory pool");
+}
+
 Kokkos::memory_fence();
 }
 }
@@ -436,7 +440,7 @@ public:
 void operator()( unsigned i ) const
 {
 if ( m_destroy && i < m_chunk_max && 0 != m_chunks[i] ) {
-m_pool.deallocate( m_chunks[i] , m_pool.get_min_block_size() );
+m_pool.deallocate( m_chunks[i] , m_pool.min_block_size() );
 }
 m_chunks[i] = 0 ;
 }
@@ -495,7 +499,7 @@ public:
 // The memory pool chunk is guaranteed to be a power of two
 , m_chunk_shift(
 Kokkos::Impl::integral_power_of_two(
-m_pool.get_min_block_size()/sizeof(typename traits::value_type)) )
+m_pool.min_block_size()/sizeof(typename traits::value_type)) )
 , m_chunk_mask( ( 1 << m_chunk_shift ) - 1 )
 , m_chunk_max( ( arg_size_max + m_chunk_mask ) >> m_chunk_shift )
 {
@@ -1,13 +1,13 @@
 /*
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -36,7 +36,7 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER
 */
@@ -114,7 +114,7 @@ private:


 template <typename ReportType, typename DeviceType>
-inline int ErrorReporter<ReportType, DeviceType>::getNumReports()
+inline int ErrorReporter<ReportType, DeviceType>::getNumReports()
 {
 int num_reports = 0;
 Kokkos::deep_copy(num_reports,m_numReportsAttempted);
@@ -194,3 +194,4 @@ void ErrorReporter<ReportType, DeviceType>::resize(const size_t new_size)
 } // namespace kokkos

 #endif
+
@@ -1,12 +1,12 @@
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -35,7 +35,7 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER

@@ -170,4 +170,3 @@ struct less_equal

 #endif //KOKKOS_FUNCTIONAL_HPP

-
@@ -1,13 +1,13 @@
 /*
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -36,7 +36,7 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER
 */
@@ -61,7 +61,7 @@

 #include <iostream>

-#include <stdint.h>
+#include <cstdint>
 #include <stdexcept>


@@ -847,3 +847,4 @@ inline void deep_copy( UnorderedMap<DKey, DT, DDevice, Hasher, EqualTo>
 } // namespace Kokkos

 #endif //KOKKOS_UNORDERED_MAP_HPP
+
@@ -1,13 +1,13 @@
 /*
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -36,7 +36,7 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER
 */
@@ -281,3 +281,4 @@ public:

 }
 #endif
+
@@ -46,7 +46,7 @@

 #include <Kokkos_Macros.hpp>
 #include <impl/Kokkos_BitOps.hpp>
-#include <stdint.h>
+#include <cstdint>

 #include <cstdio>
 #include <climits>
@@ -1,12 +1,12 @@
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -35,7 +35,7 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER

@@ -43,7 +43,7 @@
 #define KOKKOS_FUNCTIONAL_IMPL_HPP

 #include <Kokkos_Macros.hpp>
-#include <stdint.h>
+#include <cstdint>

 namespace Kokkos { namespace Impl {

@@ -193,3 +193,4 @@ bool bitwise_equal(T const * const a_ptr, T const * const b_ptr)
 }} // namespace Kokkos::Impl

 #endif //KOKKOS_FUNCTIONAL_IMPL_HPP
+
@@ -1,13 +1,13 @@
 /*
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -36,7 +36,7 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER
 */
@@ -45,7 +45,7 @@
 #define KOKKOS_UNORDERED_MAP_IMPL_HPP

 #include <Kokkos_Core_fwd.hpp>
-#include <stdint.h>
+#include <cstdint>

 #include <cstdio>
 #include <climits>
@@ -295,3 +295,4 @@ struct UnorderedMapCanAssign<const Key,const Value,const Key,Value> : public tru
 }} //Kokkos::Impl

 #endif // KOKKOS_UNORDERED_MAP_IMPL_HPP
+
@@ -8,7 +8,7 @@ default: build_all
 echo "End Build"

 ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
-CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
+CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
 else
 CXX = g++
 endif
@@ -21,8 +21,8 @@ include $(KOKKOS_PATH)/Makefile.kokkos

 KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests

-TEST_TARGETS =
-TARGETS =
+TEST_TARGETS =
+TARGETS =

 ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
 OBJ_CUDA = TestCuda.o UnitTestMain.o gtest-all.o
@@ -49,16 +49,16 @@ ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
 endif

 KokkosContainers_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
-$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_Cuda
+$(LINK) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_Cuda

 KokkosContainers_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
-$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_Threads
+$(LINK) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_Threads

 KokkosContainers_UnitTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
-$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_OpenMP
+$(LINK) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_OpenMP

 KokkosContainers_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
-$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_Serial
+$(LINK) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_Serial

 test-cuda: KokkosContainers_UnitTest_Cuda
 ./KokkosContainers_UnitTest_Cuda
@@ -76,7 +76,7 @@ build_all: $(TARGETS)

 test: $(TEST_TARGETS)

-clean: kokkos-clean
+clean: kokkos-clean
 rm -f *.o $(TARGETS)

 # Compilation rules
@@ -84,6 +84,5 @@ clean: kokkos-clean
 %.o:%.cpp $(KOKKOS_CPP_DEPENDS)
 $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<

-gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
+gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
 $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc
-
@@ -1,263 +0,0 @@
-//@HEADER
-// ************************************************************************
-//
-// Kokkos v. 2.0
-// Copyright (2014) Sandia Corporation
-//
-// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
-// the U.S. Government retains certain rights in this software.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-//
-// 1. Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// 2. Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// 3. Neither the name of the Corporation nor the names of the
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
-// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
-// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
-// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
-// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
-// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
-// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-//
-// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
-// ************************************************************************
-//@HEADER
-
-
-#ifndef KOKKOS_TEST_COMPLEX_HPP
-#define KOKKOS_TEST_COMPLEX_HPP
-
-#include <Kokkos_Complex.hpp>
-#include <gtest/gtest.h>
-#include <iostream>
-
-namespace Test {
-
-namespace Impl {
-template <typename RealType>
-void testComplexConstructors () {
-typedef Kokkos::complex<RealType> complex_type;
-
-complex_type z1;
-complex_type z2 (0.0, 0.0);
-complex_type z3 (1.0, 0.0);
-complex_type z4 (0.0, 1.0);
-complex_type z5 (-1.0, -2.0);
-
-ASSERT_TRUE( z1 == z2 );
-ASSERT_TRUE( z1 != z3 );
-ASSERT_TRUE( z1 != z4 );
-ASSERT_TRUE( z1 != z5 );
-
-ASSERT_TRUE( z2 != z3 );
-ASSERT_TRUE( z2 != z4 );
-ASSERT_TRUE( z2 != z5 );
-
-ASSERT_TRUE( z3 != z4 );
-ASSERT_TRUE( z3 != z5 );
-
-complex_type z6 (-1.0, -2.0);
-ASSERT_TRUE( z5 == z6 );
-
-// Make sure that complex has value semantics, in particular, that
-// equality tests use values and not pointers, so that
-// reassignment actually changes the value.
-z1 = complex_type (-3.0, -4.0);
-ASSERT_TRUE( z1.real () == -3.0 );
-ASSERT_TRUE( z1.imag () == -4.0 );
-ASSERT_TRUE( z1 != z2 );
-
-complex_type z7 (1.0);
-ASSERT_TRUE( z3 == z7 );
-ASSERT_TRUE( z7 == 1.0 );
-ASSERT_TRUE( z7 != -1.0 );
-
-z7 = complex_type (5.0);
-ASSERT_TRUE( z7.real () == 5.0 );
-ASSERT_TRUE( z7.imag () == 0.0 );
-}
-
-template <typename RealType>
-void testPlus () {
-typedef Kokkos::complex<RealType> complex_type;
-
-complex_type z1 (1.0, -1.0);
-complex_type z2 (-1.0, 1.0);
-complex_type z3 = z1 + z2;
-ASSERT_TRUE( z3 == complex_type (0.0, 0.0) );
-}
-
-template <typename RealType>
-void testMinus () {
-typedef Kokkos::complex<RealType> complex_type;
-
-// Test binary minus.
-complex_type z1 (1.0, -1.0);
-complex_type z2 (-1.0, 1.0);
-complex_type z3 = z1 - z2;
-ASSERT_TRUE( z3 == complex_type (2.0, -2.0) );
-
-// Test unary minus.
-complex_type z4 (3.0, -4.0);
-ASSERT_TRUE( -z1 == complex_type (-3.0, 4.0) );
-}
-
-template <typename RealType>
-void testTimes () {
-typedef Kokkos::complex<RealType> complex_type;
-
-complex_type z1 (1.0, -1.0);
-complex_type z2 (-1.0, 1.0);
-complex_type z3 = z1 * z2;
-ASSERT_TRUE( z3 == complex_type (0.0, 2.0) );
-
-// Make sure that std::complex * Kokkos::complex works too.
-std::complex<RealType> z4 (-1.0, 1.0);
-complex_type z5 = z4 * z1;
-ASSERT_TRUE( z5 == complex_type (0.0, 2.0) );
-}
-
-template <typename RealType>
-void testDivide () {
-typedef Kokkos::complex<RealType> complex_type;
-
-// Test division of a complex number by a real number.
-complex_type z1 (1.0, -1.0);
-complex_type z2 (1.0 / 2.0, -1.0 / 2.0);
-ASSERT_TRUE( z1 / 2.0 == z2 );
-
-// (-1+2i)/(1-i) == ((-1+2i)(1+i)) / ((1-i)(1+i))
-// (-1+2i)(1+i) == -3 + i
-complex_type z3 (-1.0, 2.0);
-complex_type z4 (1.0, -1.0);
-complex_type z5 (-3.0, 1.0);
-ASSERT_TRUE(z3 * Kokkos::conj (z4) == z5 );
-
-// Test division of a complex number by a complex number.
-// This assumes that RealType is a floating-point type.
-complex_type z6 (Kokkos::real (z5) / 2.0,
-Kokkos::imag (z5) / 2.0);
-
-complex_type z7 = z3 / z4;
-ASSERT_TRUE( z7 == z6 );
-}
-
-template <typename RealType>
-void testOutsideKernel () {
-testComplexConstructors<RealType> ();
-testPlus<RealType> ();
-testTimes<RealType> ();
-testDivide<RealType> ();
-}
-
-
-template<typename RealType, typename Device>
-void testCreateView () {
-typedef Kokkos::complex<RealType> complex_type;
-Kokkos::View<complex_type*, Device> x ("x", 10);
-ASSERT_TRUE( x.dimension_0 () == 10 );
-
-// Test that View assignment works.
-Kokkos::View<complex_type*, Device> x_nonconst = x;
-Kokkos::View<const complex_type*, Device> x_const = x;
-}
-
-template<typename RealType, typename Device>
-class Fill {
-public:
-typedef typename Device::execution_space execution_space;
-
-typedef Kokkos::View<Kokkos::complex<RealType>*, Device> view_type;
-typedef typename view_type::size_type size_type;
-
-KOKKOS_INLINE_FUNCTION
-void operator () (const size_type i) const {
-x_(i) = val_;
-}
-
-Fill (const view_type& x, const Kokkos::complex<RealType>& val) :
-x_ (x), val_ (val)
-{}
-
-private:
-view_type x_;
-const Kokkos::complex<RealType> val_;
-};
-
-template<typename RealType, typename Device>
-class Sum {
-public:
-typedef typename Device::execution_space execution_space;
-
-typedef Kokkos::View<const Kokkos::complex<RealType>*, Device> view_type;
-typedef typename view_type::size_type size_type;
-typedef Kokkos::complex<RealType> value_type;
-
-KOKKOS_INLINE_FUNCTION
-void operator () (const size_type i, Kokkos::complex<RealType>& sum) const {
-sum += x_(i);
-}
-
-Sum (const view_type& x) : x_ (x) {}
-
-private:
-view_type x_;
-};
-
-template<typename RealType, typename Device>
-void testInsideKernel () {
-typedef Kokkos::complex<RealType> complex_type;
-typedef Kokkos::View<complex_type*, Device> view_type;
-typedef typename view_type::size_type size_type;
-
-const size_type N = 1000;
-view_type x ("x", N);
-ASSERT_TRUE( x.dimension_0 () == N );
-
-// Kokkos::parallel_reduce (N, [=] (const size_type i, complex_type& result) {
-// result += x[i];
-// });
-
-Kokkos::parallel_for (N, Fill<RealType, Device> (x, complex_type (1.0, -1.0)));
-
-complex_type sum;
-Kokkos::parallel_reduce (N, Sum<RealType, Device> (x), sum);
-
-ASSERT_TRUE( sum.real () == 1000.0 && sum.imag () == -1000.0 );
-}
-} // namespace Impl
-
-
-template <typename Device>
-void testComplex ()
-{
-Impl::testOutsideKernel<float> ();
-Impl::testOutsideKernel<double> ();
-
-Impl::testCreateView<float, Device> ();
-Impl::testCreateView<double, Device> ();
-
-Impl::testInsideKernel<float, Device> ();
-Impl::testInsideKernel<double, Device> ();
-}
-
-
-} // namespace Test
-
-#endif // KOKKOS_TEST_COMPLEX_HPP
@@ -1,13 +1,13 @@
 /*
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -36,14 +36,17 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER
 */

+#include <Kokkos_Macros.hpp>
+#ifdef KOKKOS_ENABLE_CUDA
+
 #include <iostream>
 #include <iomanip>
-#include <stdint.h>
+#include <cstdint>

 #include <gtest/gtest.h>

@@ -69,7 +72,6 @@
 //----------------------------------------------------------------------------


-#ifdef KOKKOS_ENABLE_CUDA

 namespace Test {

@@ -237,5 +239,7 @@ TEST_F(cuda, ErrorReporter)

 }

+#else
+void KOKKOS_CONTAINERS_UNIT_TESTS_TESTCUDA_PREVENT_EMPTY_LINK_ERROR() {}
 #endif /* #ifdef KOKKOS_ENABLE_CUDA */

@@ -1,13 +1,13 @@
 /*
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -36,7 +36,7 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER
 */
@@ -119,3 +119,4 @@ void test_dualview_combinations(unsigned int size)
 } // namespace Test

 #endif //KOKKOS_TEST_UNORDERED_MAP_HPP
+
@@ -61,7 +61,7 @@ struct TestDynamicView
 typedef typename Space::execution_space execution_space ;
 typedef typename Space::memory_space memory_space ;

-typedef Kokkos::Experimental::MemoryPool<typename Space::device_type> memory_pool_type;
+typedef Kokkos::MemoryPool<typename Space::device_type> memory_pool_type;

 typedef Kokkos::Experimental::DynamicView<Scalar*,Space> view_type;
 typedef typename view_type::const_type const_view_type ;
@@ -131,7 +131,12 @@ struct TestDynamicView

 // printf("TestDynamicView::run(%d) construct memory pool\n",arg_total_size);

-memory_pool_type pool( memory_space() , arg_total_size * sizeof(Scalar) * 1.2 );
+memory_pool_type pool( memory_space()
+, arg_total_size * sizeof(Scalar) * 1.2
+, 500 /* min block size in bytes */
+, 30000 /* max block size in bytes */
+, 1000000 /* min superblock size in bytes */
+);

 // printf("TestDynamicView::run(%d) construct dynamic view\n",arg_total_size);

@@ -1,13 +1,13 @@
 /*
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -36,7 +36,7 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER
 */
@@ -225,3 +225,4 @@ struct ErrorReporterDriverNativeOpenMP : public ErrorReporterDriverBase<Kokkos::

 } // namespace Test
 #endif // #ifndef KOKKOS_TEST_ERROR_REPORTING_HPP
+
@@ -1,13 +1,13 @@
 /*
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -36,11 +36,14 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER
 */

+#include <Kokkos_Macros.hpp>
+#ifdef KOKKOS_ENABLE_OPENMP
+
 #include <gtest/gtest.h>

 #include <Kokkos_Core.hpp>
@@ -56,7 +59,6 @@
 #include <TestVector.hpp>
 #include <TestDualView.hpp>
 #include <TestDynamicView.hpp>
-#include <TestComplex.hpp>

 #include <Kokkos_DynRankView.hpp>
 #include <TestDynViewAPI.hpp>
@@ -68,7 +70,6 @@

 namespace Test {

-#ifdef KOKKOS_ENABLE_OPENMP
 class openmp : public ::testing::Test {
 protected:
 static void SetUpTestCase()
@@ -91,11 +92,6 @@ protected:
 }
 };

-TEST_F( openmp, complex )
-{
-testComplex<Kokkos::OpenMP> ();
-}
-
 TEST_F( openmp, dyn_view_api) {
 TestDynViewAPI< double , Kokkos::OpenMP >();
 }
@@ -172,7 +168,6 @@ OPENMP_DUALVIEW_COMBINE_TEST( 10 )
 #undef OPENMP_DEEP_COPY
 #undef OPENMP_VECTOR_COMBINE_TEST
 #undef OPENMP_DUALVIEW_COMBINE_TEST
-#endif


 TEST_F( openmp , dynamic_view )
@@ -204,3 +199,7 @@ TEST_F(openmp, ErrorReporterNativeOpenMP)

 } // namespace test

+#else
+void KOKKOS_CONTAINERS_UNIT_TESTS_TESTOPENMP_PREVENT_EMPTY_LINK_ERROR() {}
+#endif
+
@@ -1,13 +1,13 @@
 /*
 //@HEADER
 // ************************************************************************
-//
+//
 // Kokkos v. 2.0
 // Copyright (2014) Sandia Corporation
-//
+//
 // Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
 // the U.S. Government retains certain rights in this software.
-//
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
@@ -36,19 +36,18 @@
 // SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 //
 // Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
-//
+//
 // ************************************************************************
 //@HEADER
 */

+#include <Kokkos_Macros.hpp>
+#ifdef KOKKOS_ENABLE_SERIAL
+
 #include <gtest/gtest.h>

 #include <Kokkos_Core.hpp>

-#if ! defined(KOKKOS_ENABLE_SERIAL)
-# error "It doesn't make sense to build this file unless the Kokkos::Serial device is enabled. If you see this message, it probably means that there is an error in Kokkos' CMake build infrastructure."
-#else
-
 #include <Kokkos_Bitset.hpp>
 #include <Kokkos_UnorderedMap.hpp>
 #include <Kokkos_Vector.hpp>
@@ -59,7 +58,6 @@
 #include <TestVector.hpp>
 #include <TestDualView.hpp>
 #include <TestDynamicView.hpp>
-#include <TestComplex.hpp>

 #include <iomanip>

@@ -105,11 +103,6 @@ TEST_F( serial , staticcrsgraph )
 TestStaticCrsGraph::run_test_graph3< Kokkos::Serial >(75, 100000);
 }

-TEST_F( serial, complex )
-{
-testComplex<Kokkos::Serial> ();
-}
-
 TEST_F( serial, bitset )
 {
 test_bitset<Kokkos::Serial> ();
@@ -190,6 +183,7 @@ TEST_F(serial, ErrorReporter)

 } // namespace Test

+#else
+void KOKKOS_CONTAINERS_UNIT_TESTS_TESTSERIAL_PREVENT_EMPTY_LINK_ERROR() {}
 #endif // KOKKOS_ENABLE_SERIAL

-
@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -184,4 +184,3 @@ void run_test_graph3(size_t B, size_t N)
}
} /* namespace TestStaticCrsGraph */

|
|
|
@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,17 +36,18 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/

#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_THREADS

#include <gtest/gtest.h>

#include <Kokkos_Core.hpp>

#if defined( KOKKOS_ENABLE_PTHREAD )

#include <Kokkos_Bitset.hpp>
#include <Kokkos_UnorderedMap.hpp>

@ -201,6 +202,7 @@ TEST_F(threads, ErrorReporter)

} // namespace Test


#endif /* #if defined( KOKKOS_ENABLE_PTHREAD ) */
#else
void KOKKOS_CONTAINERS_UNIT_TESTS_TESTTHREADS_PREVENT_EMPTY_LINK_ERROR() {}
#endif /* #if defined( KOKKOS_ENABLE_THREADS ) */

|
|
|
@ -1,12 +1,12 @@
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -35,7 +35,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER

@ -311,3 +311,4 @@ void test_deep_copy( uint32_t num_nodes )
} // namespace Test

#endif //KOKKOS_TEST_UNORDERED_MAP_HPP

|
|
|
@ -1,12 +1,12 @@
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -35,7 +35,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER

@ -129,3 +129,4 @@ void test_vector_combinations(unsigned int size)
} // namespace Test

#endif //KOKKOS_TEST_UNORDERED_MAP_HPP

|
|
|
@ -1,15 +1,38 @@
#ifndef KOKKOS_CORE_CONFIG_H
#if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)
#error "Don't include KokkosCore_config.h directly; include Kokkos_Macros.hpp instead."
#else
#define KOKKOS_CORE_CONFIG_H
#endif

/* The trivial 'src/build_common.sh' creates a config
 * that must stay in sync with this file.
 */
#cmakedefine KOKKOS_FOR_SIERRA

#if !defined( KOKKOS_FOR_SIERRA )
#ifndef KOKKOS_FOR_SIERRA

#cmakedefine KOKKOS_HAVE_MPI
#cmakedefine KOKKOS_HAVE_CUDA
#cmakedefine KOKKOS_HAVE_OPENMP
#cmakedefine KOKKOS_HAVE_PTHREAD
#cmakedefine KOKKOS_HAVE_QTHREADS
#cmakedefine KOKKOS_HAVE_SERIAL
#cmakedefine KOKKOS_HAVE_Winthread

#cmakedefine KOKKOS_HAVE_HWLOC
#cmakedefine KOKKOS_ENABLE_HBWSPACE
#cmakedefine KOKKOS_ENABLE_LIBRT

#cmakedefine KOKKOS_HAVE_DEBUG
#cmakedefine KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
#cmakedefine KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
#cmakedefine KOKKOS_ENABLE_PROFILING
#cmakedefine KOKKOS_ENABLE_PROFILING_LOAD_PRINT

#cmakedefine KOKKOS_ENABLE_AGGRESSIVE_VECTORIZATION

#ifdef KOKKOS_HAVE_CUDA

#cmakedefine KOKKOS_ENABLE_CUDA_LDG_INTRINSIC

// mfh 16 Sep 2014: If passed in on the command line, that overrides
// any value of KOKKOS_USE_CUDA_UVM here. Doing this should prevent build
@ -23,26 +46,8 @@
// hesitate to do that now, because I'm not sure if all the files are
// including KokkosCore_config.h (or a header file that includes it) like
// they should.

#if ! defined(KOKKOS_USE_CUDA_UVM)
#ifndef KOKKOS_USE_CUDA_UVM
#cmakedefine KOKKOS_USE_CUDA_UVM
#endif // ! defined(KOKKOS_USE_CUDA_UVM)

#cmakedefine KOKKOS_HAVE_PTHREAD
#cmakedefine KOKKOS_HAVE_SERIAL
#cmakedefine KOKKOS_HAVE_QTHREADS
#cmakedefine KOKKOS_HAVE_Winthread
#cmakedefine KOKKOS_HAVE_OPENMP
#cmakedefine KOKKOS_HAVE_HWLOC
#cmakedefine KOKKOS_HAVE_DEBUG
#cmakedefine KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
#cmakedefine KOKKOS_HAVE_CXX11
#cmakedefine KOKKOS_HAVE_CUSPARSE
#cmakedefine KOKKOS_ENABLE_PROFILING_INTERNAL
#ifdef KOKKOS_ENABLE_PROFILING_INTERNAL
#define KOKKOS_ENABLE_PROFILING 1
#else
#define KOKKOS_ENABLE_PROFILING 0
#endif

#cmakedefine KOKKOS_HAVE_CUDA_RDC
@ -55,13 +60,51 @@
#define KOKKOS_CUDA_USE_LAMBDA 1
#endif

#endif

#cmakedefine KOKKOS_CUDA_CLANG_WORKAROUND

#ifndef __CUDA_ARCH__
#cmakedefine KOKKOS_ENABLE_ISA_X86_64
#cmakedefine KOKKOS_ENABLE_ISA_KNC
#cmakedefine KOKKOS_ENABLE_ISA_POWERPCLE
#endif

#cmakedefine KOKKOS_ARCH_ARMV80 1
#cmakedefine KOKKOS_ARCH_ARMV81 1
#cmakedefine KOKKOS_ARCH_ARMV8_THUNDERX 1
#cmakedefine KOKKOS_ARCH_AVX 1
#cmakedefine KOKKOS_ARCH_AVX2 1
#cmakedefine KOKKOS_ARCH_AVX512MIC 1
#cmakedefine KOKKOS_ARCH_AVX512XEON 1
#cmakedefine KOKKOS_ARCH_KNC 1
#cmakedefine KOKKOS_ARCH_POWER8 1
#cmakedefine KOKKOS_ARCH_POWER9 1
#cmakedefine KOKKOS_ARCH_KEPLER 1
#cmakedefine KOKKOS_ARCH_KEPLER30 1
#cmakedefine KOKKOS_ARCH_KEPLER32 1
#cmakedefine KOKKOS_ARCH_KEPLER35 1
#cmakedefine KOKKOS_ARCH_KEPLER37 1
#cmakedefine KOKKOS_ARCH_MAXWELL 1
#cmakedefine KOKKOS_ARCH_MAXWELL50 1
#cmakedefine KOKKOS_ARCH_MAXWELL52 1
#cmakedefine KOKKOS_ARCH_MAXWELL53 1
#cmakedefine KOKKOS_ARCH_PASCAL 1
#cmakedefine KOKKOS_ARCH_PASCAL60 1
#cmakedefine KOKKOS_ARCH_PASCAL61 1

// Don't forbid users from defining this macro on the command line,
// but still make sure that CMake logic can control its definition.
#if ! defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
#ifndef KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#cmakedefine KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
#endif // KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#endif

// TODO: These are currently not used in Kokkos. Should they be removed?
#cmakedefine KOKKOS_HAVE_MPI
#cmakedefine KOKKOS_HAVE_CUSPARSE

// TODO: No longer options in Kokkos. Need to be removed.
#cmakedefine KOKKOS_USING_DEPRECATED_VIEW
#cmakedefine KOKKOS_HAVE_CXX11

#endif // KOKKOS_FOR_SIERRA
#endif // KOKKOS_CORE_CONFIG_H
|
|
|
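The guard that opens the config header in the hunk above (the `#if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)` check) can be tried out in a single translation unit. The following is only a sketch under stated assumptions: `DEMO_MACROS_HPP`, `DEMO_CORE_CONFIG_H` and `demo_config_was_processed` are hypothetical stand-ins and are not part of Kokkos.

```cpp
#include <cassert>  // only needed for the runtime check in the usage note

// Stand-in for the umbrella header (Kokkos_Macros.hpp in the diff): it defines
// its own include guard before pulling in the generated config.
#define DEMO_MACROS_HPP

// Stand-in for the top of the generated config header. The build fails if the
// umbrella guard is missing (direct inclusion) or if the config guard is
// already defined (the config was processed before).
#if !defined(DEMO_MACROS_HPP) || defined(DEMO_CORE_CONFIG_H)
#error "Don't include the config header directly; include the umbrella header instead."
#else
#define DEMO_CORE_CONFIG_H
#endif

// Hypothetical helper that makes the outcome of the preprocessor logic visible.
constexpr bool demo_config_was_processed() {
#ifdef DEMO_CORE_CONFIG_H
  return true;
#else
  return false;
#endif
}

static_assert(demo_config_was_processed(),
              "the guard should have defined DEMO_CORE_CONFIG_H");
```

Removing the `#define DEMO_MACROS_HPP` line should make the compile stop at the `#error`, which is the behaviour the changelog entry "Prevent users directly including KokkosCore_config.h" describes. A runtime check such as `assert(demo_config_was_processed());` works as well.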
@ -1,11 +1,18 @@

INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINRARY_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})

# warning: PerfTest_CustomReduction.cpp uses
# ../../algorithms/src/Kokkos_Random.hpp
# we'll just allow it to be included, but note
# that in TriBITS KokkosAlgorithms can be disabled...
INCLUDE_DIRECTORIES("${CMAKE_CURRENT_SOURCE_DIR}/../../algorithms/src")

SET(SOURCES
  PerfTestMain.cpp
  PerfTestHost.cpp
  PerfTestCuda.cpp
  PerfTestGramSchmidt.cpp
  PerfTestHexGrad.cpp
  PerfTest_CustomReduction.cpp
  )

# Per #374, we always want to build this test, but we only want to run
|
|
|
@ -8,12 +8,14 @@ default: build_all
	echo "End Build"

ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
KOKKOS_CUDA_OPTIONS=enable_lambda
else
CXX = g++
endif

CXXFLAGS = -O3
CXXFLAGS = -O3
#CXXFLAGS += -DGENERIC_REDUCER
LINK ?= $(CXX)
LDFLAGS ?= -lpthread

@ -21,23 +23,49 @@ include $(KOKKOS_PATH)/Makefile.kokkos

KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/perf_test

TEST_TARGETS =
TARGETS =
TEST_TARGETS =
TARGETS =

OBJ_PERF = PerfTestHost.o PerfTestCuda.o PerfTestMain.o gtest-all.o
#

OBJ_PERF = PerfTestMain.o gtest-all.o
OBJ_PERF += PerfTestGramSchmidt.o
OBJ_PERF += PerfTestHexGrad.o
OBJ_PERF += PerfTest_CustomReduction.o
TARGETS += KokkosCore_PerformanceTest
TEST_TARGETS += test-performance

#

OBJ_ATOMICS = test_atomic.o
TARGETS += KokkosCore_PerformanceTest_Atomics
TEST_TARGETS += test-atomic

#

OBJ_MEMPOOL = test_mempool.o
TARGETS += KokkosCore_PerformanceTest_Mempool
TEST_TARGETS += test-mempool

#

OBJ_TASKDAG = test_taskdag.o
TARGETS += KokkosCore_PerformanceTest_TaskDAG
TEST_TARGETS += test-taskdag

#

KokkosCore_PerformanceTest: $(OBJ_PERF) $(KOKKOS_LINK_DEPENDS)
	$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_PERF) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest
	$(LINK) $(EXTRA_PATH) $(OBJ_PERF) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosCore_PerformanceTest

KokkosCore_PerformanceTest_Atomics: $(OBJ_ATOMICS) $(KOKKOS_LINK_DEPENDS)
	$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ATOMICS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_Atomics
	$(LINK) $(EXTRA_PATH) $(OBJ_ATOMICS) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosCore_PerformanceTest_Atomics

KokkosCore_PerformanceTest_Mempool: $(OBJ_MEMPOOL) $(KOKKOS_LINK_DEPENDS)
	$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_MEMPOOL) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_Mempool

KokkosCore_PerformanceTest_TaskDAG: $(OBJ_TASKDAG) $(KOKKOS_LINK_DEPENDS)
	$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_TASKDAG) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_TaskDAG

test-performance: KokkosCore_PerformanceTest
	./KokkosCore_PerformanceTest
@ -45,12 +73,18 @@ test-performance: KokkosCore_PerformanceTest
test-atomic: KokkosCore_PerformanceTest_Atomics
	./KokkosCore_PerformanceTest_Atomics

test-mempool: KokkosCore_PerformanceTest_Mempool
	./KokkosCore_PerformanceTest_Mempool

test-taskdag: KokkosCore_PerformanceTest_TaskDAG
	./KokkosCore_PerformanceTest_TaskDAG


build_all: $(TARGETS)

test: $(TEST_TARGETS)

clean: kokkos-clean
clean: kokkos-clean
	rm -f *.o $(TARGETS)

# Compilation rules
@ -58,5 +92,6 @@ clean: kokkos-clean
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
	$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<

gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
	$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc

|
|
|
@ -1,199 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/

#include <iostream>
#include <iomanip>
#include <algorithm>
#include <gtest/gtest.h>

#include <Kokkos_Core.hpp>

#if defined( KOKKOS_ENABLE_CUDA )

#include <impl/Kokkos_Timer.hpp>

#include <PerfTestMDRange.hpp>

#include <PerfTestHexGrad.hpp>
#include <PerfTestBlasKernels.hpp>
#include <PerfTestGramSchmidt.hpp>
#include <PerfTestDriver.hpp>


namespace Test {

class cuda : public ::testing::Test {
  protected:
    static void SetUpTestCase() {
      Kokkos::HostSpace::execution_space::initialize();
      Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
    }
    static void TearDownTestCase() {
      Kokkos::Cuda::finalize();
      Kokkos::HostSpace::execution_space::finalize();
    }
};

//TEST_F( cuda, mdrange_lr ) {
// EXPECT_NO_THROW( (run_test_mdrange<Kokkos::Cuda , Kokkos::LayoutRight>( 5, 8, "Kokkos::Cuda" )) );
//}

//TEST_F( cuda, mdrange_ll ) {
// EXPECT_NO_THROW( (run_test_mdrange<Kokkos::Cuda , Kokkos::LayoutLeft>( 5, 8, "Kokkos::Cuda" )) );
//}

TEST_F( cuda, hexgrad )
{
  EXPECT_NO_THROW( run_test_hexgrad< Kokkos::Cuda >( 10 , 20, "Kokkos::Cuda" ) );
}

TEST_F( cuda, gramschmidt )
{
  EXPECT_NO_THROW( run_test_gramschmidt< Kokkos::Cuda >( 10 , 20, "Kokkos::Cuda" ) );
}

namespace {

template <typename T>
struct TextureFetch
{
  typedef Kokkos::View< T *, Kokkos::CudaSpace> array_type;
  typedef Kokkos::View< const T *, Kokkos::CudaSpace, Kokkos::MemoryRandomAccess> const_array_type;
  typedef Kokkos::View< int *, Kokkos::CudaSpace> index_array_type;
  typedef Kokkos::View< const int *, Kokkos::CudaSpace> const_index_array_type;

  struct FillArray
  {
    array_type m_array;
    FillArray( const array_type & array )
      : m_array(array)
    {}

    void apply() const
    {
      Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::Cuda,int>(0,m_array.dimension_0()), *this);
    }

    KOKKOS_INLINE_FUNCTION
    void operator()(int i) const { m_array(i) = i; }
  };

  struct RandomIndexes
  {
    index_array_type m_indexes;
    typename index_array_type::HostMirror m_host_indexes;
    RandomIndexes( const index_array_type & indexes)
      : m_indexes(indexes)
      , m_host_indexes(Kokkos::create_mirror(m_indexes))
    {}

    void apply() const
    {
      Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::HostSpace::execution_space,int>(0,m_host_indexes.dimension_0()), *this);
      //random shuffle
      Kokkos::HostSpace::execution_space::fence();
      std::random_shuffle(m_host_indexes.ptr_on_device(), m_host_indexes.ptr_on_device() + m_host_indexes.dimension_0());
      Kokkos::deep_copy(m_indexes,m_host_indexes);
    }

    KOKKOS_INLINE_FUNCTION
    void operator()(int i) const { m_host_indexes(i) = i; }
  };

  struct RandomReduce
  {
    const_array_type m_array;
    const_index_array_type m_indexes;
    RandomReduce( const const_array_type & array, const const_index_array_type & indexes)
      : m_array(array)
      , m_indexes(indexes)
    {}

    void apply(T & reduce) const
    {
      Kokkos::parallel_reduce( Kokkos::RangePolicy<Kokkos::Cuda,int>(0,m_array.dimension_0()), *this, reduce);
    }

    KOKKOS_INLINE_FUNCTION
    void operator()(int i, T & reduce) const
    { reduce += m_array(m_indexes(i)); }
  };

  static void run(int size, double & reduce_time, T &reduce)
  {
    array_type array("array",size);
    index_array_type indexes("indexes",size);

    { FillArray f(array); f.apply(); }
    { RandomIndexes f(indexes); f.apply(); }

    Kokkos::Cuda::fence();

    Kokkos::Timer timer;
    for (int j=0; j<10; ++j) {
      RandomReduce f(array,indexes);
      f.apply(reduce);
    }
    Kokkos::Cuda::fence();
    reduce_time = timer.seconds();
  }
};

} // unnamed namespace

TEST_F( cuda, texture_double )
{
  printf("Random reduce of double through texture fetch\n");
  for (int i=1; i<=26; ++i) {
    int size = 1<<i;
    double time = 0;
    double reduce = 0;
    TextureFetch<double>::run(size,time,reduce);
    printf(" time = %1.3e size = 2^%d\n", time, i);
  }
}

} // namespace Test

#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
|
|
@ -398,91 +398,5 @@ void run_test_mdrange( int exp_beg , int exp_end, const char deviceTypeName[], i
}


template< class DeviceType >
void run_test_hexgrad( int exp_beg , int exp_end, const char deviceTypeName[] )
{
  std::string label_hexgrad ;
  label_hexgrad.append( "\"HexGrad< double , " );
  // mfh 06 Jun 2013: This only appends "DeviceType" (literally) to
  // the string, not the actual name of the device type. Thus, I've
  // modified the function to take the name of the device type.
  //
  //label_hexgrad.append( KOKKOS_MACRO_TO_STRING( DeviceType ) );
  label_hexgrad.append( deviceTypeName );
  label_hexgrad.append( " >\"" );

  for (int i = exp_beg ; i < exp_end ; ++i) {
    double min_seconds = 0.0 ;
    double max_seconds = 0.0 ;
    double avg_seconds = 0.0 ;

    const int parallel_work_length = 1<<i;

    for ( int j = 0 ; j < NUMBER_OF_TRIALS ; ++j ) {
      const double seconds = HexGrad< DeviceType >::test(parallel_work_length) ;

      if ( 0 == j ) {
        min_seconds = seconds ;
        max_seconds = seconds ;
      }
      else {
        if ( seconds < min_seconds ) min_seconds = seconds ;
        if ( seconds > max_seconds ) max_seconds = seconds ;
      }
      avg_seconds += seconds ;
    }
    avg_seconds /= NUMBER_OF_TRIALS ;

    std::cout << label_hexgrad
              << " , " << parallel_work_length
              << " , " << min_seconds
              << " , " << ( min_seconds / parallel_work_length )
              << std::endl ;
  }
}

template< class DeviceType >
void run_test_gramschmidt( int exp_beg , int exp_end, const char deviceTypeName[] )
{
  std::string label_gramschmidt ;
  label_gramschmidt.append( "\"GramSchmidt< double , " );
  // mfh 06 Jun 2013: This only appends "DeviceType" (literally) to
  // the string, not the actual name of the device type. Thus, I've
  // modified the function to take the name of the device type.
  //
  //label_gramschmidt.append( KOKKOS_MACRO_TO_STRING( DeviceType ) );
  label_gramschmidt.append( deviceTypeName );
  label_gramschmidt.append( " >\"" );

  for (int i = exp_beg ; i < exp_end ; ++i) {
    double min_seconds = 0.0 ;
    double max_seconds = 0.0 ;
    double avg_seconds = 0.0 ;

    const int parallel_work_length = 1<<i;

    for ( int j = 0 ; j < NUMBER_OF_TRIALS ; ++j ) {
      const double seconds = ModifiedGramSchmidt< double , DeviceType >::test(parallel_work_length, 32 ) ;

      if ( 0 == j ) {
        min_seconds = seconds ;
        max_seconds = seconds ;
      }
      else {
        if ( seconds < min_seconds ) min_seconds = seconds ;
        if ( seconds > max_seconds ) max_seconds = seconds ;
      }
      avg_seconds += seconds ;
    }
    avg_seconds /= NUMBER_OF_TRIALS ;

    std::cout << label_gramschmidt
              << " , " << parallel_work_length
              << " , " << min_seconds
              << " , " << ( min_seconds / parallel_work_length )
              << std::endl ;
  }
}

}

|
|
|
@ -41,6 +41,10 @@
//@HEADER
*/

#include <Kokkos_Core.hpp>
#include <gtest/gtest.h>
#include <PerfTest_Category.hpp>

#include <cmath>
#include <PerfTestBlasKernels.hpp>

@ -70,7 +74,7 @@ struct InvNorm2 : public Kokkos::DotSingle< VectorView > {
  KOKKOS_INLINE_FUNCTION
  void final( value_type & result ) const
  {
    result = sqrt( result );
    result = std::sqrt( result );
    Rjj() = result ;
    inv() = ( 0 < result ) ? 1.0 / result : 0 ;
  }
@ -157,7 +161,7 @@ struct ModifiedGramSchmidt

    for ( size_type j = 0 ; j < count ; ++j ) {
      // Reduction : tmp = dot( Q(:,j) , Q(:,j) );
      // PostProcess : tmp = sqrt( tmp ); R(j,j) = tmp ; tmp = 1 / tmp ;
      // PostProcess : tmp = std::sqrt( tmp ); R(j,j) = tmp ; tmp = 1 / tmp ;
      const vector_type Qj = Kokkos::subview( Q_ , Kokkos::ALL() , j );
      const value_view Rjj = Kokkos::subview( R_ , j , j );

@ -222,5 +226,58 @@ struct ModifiedGramSchmidt
  }
};

template< class DeviceType >
void run_test_gramschmidt( int exp_beg , int exp_end, int num_trials, const char deviceTypeName[] )
{
  std::string label_gramschmidt ;
  label_gramschmidt.append( "\"GramSchmidt< double , " );
  label_gramschmidt.append( deviceTypeName );
  label_gramschmidt.append( " >\"" );

  for (int i = exp_beg ; i < exp_end ; ++i) {
    double min_seconds = 0.0 ;
    double max_seconds = 0.0 ;
    double avg_seconds = 0.0 ;

    const int parallel_work_length = 1<<i;

    for ( int j = 0 ; j < num_trials ; ++j ) {
      const double seconds = ModifiedGramSchmidt< double , DeviceType >::test(parallel_work_length, 32 ) ;

      if ( 0 == j ) {
        min_seconds = seconds ;
        max_seconds = seconds ;
      }
      else {
        if ( seconds < min_seconds ) min_seconds = seconds ;
        if ( seconds > max_seconds ) max_seconds = seconds ;
      }
      avg_seconds += seconds ;
    }
    avg_seconds /= num_trials ;

    std::cout << label_gramschmidt
              << " , " << parallel_work_length
              << " , " << min_seconds
              << " , " << ( min_seconds / parallel_work_length )
              << std::endl ;
  }
}

TEST_F( default_exec, gramschmidt ) {
  int exp_beg = 10;
  int exp_end = 20;
  int num_trials = 5;

  if(command_line_num_args()>1)
    exp_beg = atoi(command_line_arg(1));
  if(command_line_num_args()>2)
    exp_end = atoi(command_line_arg(2));
  if(command_line_num_args()>3)
    num_trials = atoi(command_line_arg(3));

  EXPECT_NO_THROW(run_test_gramschmidt< Kokkos::DefaultExecutionSpace>( exp_beg, exp_end, num_trials, Kokkos::DefaultExecutionSpace::name() ));
}

}
|
|
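The trial loop in `run_test_gramschmidt` above keeps a running minimum, maximum and mean of the per-trial timings. The same bookkeeping can be isolated from Kokkos and checked on its own. This is a minimal sketch, not the code from the commit: `TrialStats` and `summarize_trials` are hypothetical names, and the `assert` on `num_trials` is an addition (the original divides by `num_trials` without checking it).

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Summary of repeated timings. Hypothetical type, introduced for illustration.
struct TrialStats {
  double min_seconds;
  double max_seconds;
  double avg_seconds;
};

// Calls `trial` num_trials times; each call returns the elapsed seconds of one
// run. The first sample seeds min and max, later samples tighten them, and the
// running sum is divided by the trial count at the end.
inline TrialStats summarize_trials(int num_trials,
                                   const std::function<double()>& trial) {
  assert(num_trials > 0);  // guard added in this sketch only
  TrialStats stats{0.0, 0.0, 0.0};
  for (int j = 0; j < num_trials; ++j) {
    const double seconds = trial();
    if (j == 0) {
      stats.min_seconds = seconds;
      stats.max_seconds = seconds;
    } else {
      stats.min_seconds = std::min(stats.min_seconds, seconds);
      stats.max_seconds = std::max(stats.max_seconds, seconds);
    }
    stats.avg_seconds += seconds;
  }
  stats.avg_seconds /= num_trials;
  return stats;
}
```

In the benchmark the callable would wrap something like `ModifiedGramSchmidt< double , DeviceType >::test(parallel_work_length, 32)`; for a quick check it can return canned values instead. Note that the driver in the diff prints only the minimum, so the maximum and mean are computed but unused there.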
@ -41,6 +41,10 @@
//@HEADER
*/

#include <Kokkos_Core.hpp>
#include <gtest/gtest.h>
#include <PerfTest_Category.hpp>

namespace Test {

template< class DeviceType ,
@ -264,5 +268,58 @@ struct HexGrad
  }
};

template< class DeviceType >
void run_test_hexgrad( int exp_beg , int exp_end, int num_trials, const char deviceTypeName[] )
{
  std::string label_hexgrad ;
  label_hexgrad.append( "\"HexGrad< double , " );
  label_hexgrad.append( deviceTypeName );
  label_hexgrad.append( " >\"" );

  for (int i = exp_beg ; i < exp_end ; ++i) {
    double min_seconds = 0.0 ;
    double max_seconds = 0.0 ;
    double avg_seconds = 0.0 ;

    const int parallel_work_length = 1<<i;

    for ( int j = 0 ; j < num_trials ; ++j ) {
      const double seconds = HexGrad< DeviceType >::test(parallel_work_length) ;

      if ( 0 == j ) {
        min_seconds = seconds ;
        max_seconds = seconds ;
      }
      else {
        if ( seconds < min_seconds ) min_seconds = seconds ;
        if ( seconds > max_seconds ) max_seconds = seconds ;
      }
      avg_seconds += seconds ;
    }
    avg_seconds /= num_trials ;

    std::cout << label_hexgrad
              << " , " << parallel_work_length
              << " , " << min_seconds
              << " , " << ( min_seconds / parallel_work_length )
              << std::endl ;
  }
}

TEST_F( default_exec, hexgrad ) {
  int exp_beg = 10;
  int exp_end = 20;
  int num_trials = 5;

  if(command_line_num_args()>1)
    exp_beg = atoi(command_line_arg(1));
  if(command_line_num_args()>2)
    exp_end = atoi(command_line_arg(2));
  if(command_line_num_args()>3)
    num_trials = atoi(command_line_arg(3));

  EXPECT_NO_THROW(run_test_hexgrad< Kokkos::DefaultExecutionSpace >( exp_beg, exp_end, num_trials, Kokkos::DefaultExecutionSpace::name() ));
}

}
|
|
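Both new `default_exec` tests read `exp_beg`, `exp_end` and `num_trials` from optional positional arguments through `atoi`, which returns 0 for non-numeric input and so cannot tell a bad argument from a literal zero. One possible hardening, shown only as a sketch: `int_arg_or` is a hypothetical helper built on `std::strtol`, and it takes `argc`/`argv` directly because the `command_line_num_args()` and `command_line_arg()` helpers from `PerfTest_Category.hpp` are not shown in this diff.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>

// Returns argv[index] parsed as a base-10 int, or `fallback` when the argument
// is absent, is not entirely numeric, or does not fit in an int.
// Hypothetical helper, not part of Kokkos.
inline int int_arg_or(int argc, const char* const argv[], int index, int fallback) {
  if (index < 0 || index >= argc || argv[index] == nullptr) return fallback;
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(argv[index], &end, 10);
  const bool no_digits = (end == argv[index]);   // nothing was consumed
  const bool trailing_junk = (*end != '\0');     // e.g. "12abc"
  const bool out_of_range =
      (errno == ERANGE) || value < INT_MIN || value > INT_MAX;
  if (no_digits || trailing_junk || out_of_range) return fallback;
  return static_cast<int>(value);
}
```

With such a helper the three overrides reduce to one line each, for example `exp_beg = int_arg_or(argc, argv, 1, 10);`, and a malformed argument leaves the default in place instead of silently becoming 0.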
@ -1,125 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/

#include <gtest/gtest.h>

#include <Kokkos_Core.hpp>

#if defined( KOKKOS_ENABLE_OPENMP )

typedef Kokkos::OpenMP TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::OpenMP" ;

#elif defined( KOKKOS_ENABLE_PTHREAD )

typedef Kokkos::Threads TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Threads" ;

#elif defined( KOKKOS_ENABLE_SERIAL )

typedef Kokkos::Serial TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Serial" ;

#else
# error "You must enable at least one of the following execution spaces in order to build this test: Kokkos::Threads, Kokkos::OpenMP, or Kokkos::Serial."
#endif

#include <impl/Kokkos_Timer.hpp>

#include <PerfTestMDRange.hpp>

#include <PerfTestHexGrad.hpp>
#include <PerfTestBlasKernels.hpp>
#include <PerfTestGramSchmidt.hpp>
#include <PerfTestDriver.hpp>

//------------------------------------------------------------------------

namespace Test {

class host : public ::testing::Test {
protected:
  static void SetUpTestCase()
  {
    if(Kokkos::hwloc::available()) {
      const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
      const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
      const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();

      unsigned threads_count = 0 ;

      threads_count = std::max( 1u , numa_count )
                    * std::max( 2u , cores_per_numa * threads_per_core );

      TestHostDevice::initialize( threads_count );
    } else {
      const unsigned thread_count = 4 ;
      TestHostDevice::initialize( thread_count );
    }
  }

  static void TearDownTestCase()
  {
    TestHostDevice::finalize();
  }
};

//TEST_F( host, mdrange_lr ) {
// EXPECT_NO_THROW( (run_test_mdrange<TestHostDevice , Kokkos::LayoutRight> (5, 8, TestHostDeviceName) ) );
//}

//TEST_F( host, mdrange_ll ) {
// EXPECT_NO_THROW( (run_test_mdrange<TestHostDevice , Kokkos::LayoutLeft> (5, 8, TestHostDeviceName) ) );
//}

TEST_F( host, hexgrad ) {
  EXPECT_NO_THROW(run_test_hexgrad< TestHostDevice>( 10, 20, TestHostDeviceName ));
}

TEST_F( host, gramschmidt ) {
  EXPECT_NO_THROW(run_test_gramschmidt< TestHostDevice>( 10, 20, TestHostDeviceName ));
|
||||
}
|
||||
|
||||
} // namespace Test
|
||||
|
||||
|
|
@ -42,8 +42,37 @@
|
|||
*/
|
||||
|
||||
#include <gtest/gtest.h>
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
namespace Test {
|
||||
int command_line_num_args(int n = 0) {
|
||||
static int n_args = 0;
|
||||
if(n>0)
|
||||
n_args = n;
|
||||
return n_args;
|
||||
}
|
||||
|
||||
const char* command_line_arg(int k, char** input_args = NULL) {
|
||||
static char** args;
|
||||
if(input_args != NULL)
|
||||
args = input_args;
|
||||
if(command_line_num_args() > k)
|
||||
return args[k];
|
||||
else
|
||||
return NULL;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
int main(int argc, char *argv[]) {
|
||||
::testing::InitGoogleTest(&argc,argv);
|
||||
return RUN_ALL_TESTS();
|
||||
Kokkos::initialize(argc,argv);
|
||||
|
||||
(void) Test::command_line_num_args(argc);
|
||||
(void) Test::command_line_arg(0,argv);
|
||||
|
||||
int result = RUN_ALL_TESTS();
|
||||
|
||||
Kokkos::finalize();
|
||||
return result;
|
||||
}
|
||||
|
|
|
@ -41,16 +41,28 @@
|
|||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <cuda/TestCuda.hpp>
|
||||
#ifndef KOKKOS_TEST_THREADS_HPP
|
||||
#define KOKKOS_TEST_THREADS_HPP
|
||||
|
||||
#include <gtest/gtest.h>
|
||||
|
||||
namespace Test {
|
||||
|
||||
TEST_F( cuda, view_api_a )
|
||||
{
|
||||
typedef Kokkos::View< const int *, Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess> > view_texture_managed;
|
||||
typedef Kokkos::View< const int *, Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess | Kokkos::Unmanaged> > view_texture_unmanaged;
|
||||
extern int command_line_num_args(int n = 0);
|
||||
extern const char* command_line_arg(int k, char** input_args = NULL);
|
||||
|
||||
TestViewAPI< double, Kokkos::Cuda >();
|
||||
}
|
||||
class default_exec : public ::testing::Test {
|
||||
protected:
|
||||
static void SetUpTestCase() {
|
||||
}
|
||||
|
||||
static void TearDownTestCase() {
|
||||
}
|
||||
};
|
||||
|
||||
} // namespace Test
|
||||
|
||||
#define TEST_CATEGORY default_exec
|
||||
#define TEST_EXECSPACE Kokkos::DefaultExecutionSpace
|
||||
|
||||
#endif
|
|
@ -0,0 +1,115 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
//
|
||||
// 1. Redistributions of source code must retain the above copyright
|
||||
// notice, this list of conditions and the following disclaimer.
|
||||
//
|
||||
// 2. Redistributions in binary form must reproduce the above copyright
|
||||
// notice, this list of conditions and the following disclaimer in the
|
||||
// documentation and/or other materials provided with the distribution.
|
||||
//
|
||||
// 3. Neither the name of the Corporation nor the names of the
|
||||
// contributors may be used to endorse or promote products derived from
|
||||
// this software without specific prior written permission.
|
||||
//
|
||||
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
#include <gtest/gtest.h>
|
||||
#include <PerfTest_Category.hpp>
|
||||
#include <Kokkos_Random.hpp>
|
||||
|
||||
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
|
||||
namespace Test {
|
||||
template<class Scalar>
|
||||
void custom_reduction_test(int N, int R, int num_trials) {
|
||||
Kokkos::Random_XorShift64_Pool<> rand_pool(183291);
|
||||
Kokkos::View<Scalar*> a("A",N);
|
||||
Kokkos::fill_random(a,rand_pool,1.0);
|
||||
|
||||
Scalar max;
|
||||
|
||||
// Warm up
|
||||
Kokkos::parallel_reduce(Kokkos::TeamPolicy<>(N/1024,32), KOKKOS_LAMBDA( const Kokkos::TeamPolicy<>::member_type& team, Scalar& lmax) {
|
||||
Scalar team_max = Scalar(0);
|
||||
for(int rr = 0; rr<R; rr++) {
|
||||
int i = team.league_rank();
|
||||
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,32), [&] (const int& j, Scalar& thread_max) {
|
||||
Scalar t_max = Scalar(0);
|
||||
Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,32), [&] (const int& k, Scalar& max_) {
|
||||
const Scalar val = a((i*32 + j)*32 + k);
|
||||
if(val>max_) max_ = val;
|
||||
if((k == 11) && (j==17) && (i==2)) max_ = 11.5;
|
||||
},Kokkos::Experimental::Max<Scalar>(t_max));
|
||||
if(t_max>thread_max) thread_max = t_max;
|
||||
},Kokkos::Experimental::Max<Scalar>(team_max));
|
||||
}
|
||||
if(team_max>lmax) lmax = team_max;
|
||||
},Kokkos::Experimental::Max<Scalar>(max));
|
||||
|
||||
// Timing
|
||||
Kokkos::Timer timer;
|
||||
for(int r = 0; r<num_trials; r++) {
|
||||
Kokkos::parallel_reduce(Kokkos::TeamPolicy<>(N/1024,32), KOKKOS_LAMBDA( const Kokkos::TeamPolicy<>::member_type& team, Scalar& lmax) {
|
||||
Scalar team_max = Scalar(0);
|
||||
for(int rr = 0; rr<R; rr++) {
|
||||
int i = team.league_rank();
|
||||
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,32), [&] (const int& j, Scalar& thread_max) {
|
||||
Scalar t_max = Scalar(0);
|
||||
Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,32), [&] (const int& k, Scalar& max_) {
|
||||
const Scalar val = a((i*32 + j)*32 + k);
|
||||
if(val>max_) max_ = val;
|
||||
if((k == 11) && (j==17) && (i==2)) max_ = 11.5;
|
||||
},Kokkos::Experimental::Max<Scalar>(t_max));
|
||||
if(t_max>thread_max) thread_max = t_max;
|
||||
},Kokkos::Experimental::Max<Scalar>(team_max));
|
||||
}
|
||||
if(team_max>lmax) lmax = team_max;
|
||||
},Kokkos::Experimental::Max<Scalar>(max));
|
||||
}
|
||||
double time = timer.seconds();
|
||||
printf("%e %e %e\n",time,1.0*N*R*num_trials*sizeof(Scalar)/time/1024/1024/1024,max);
|
||||
}
|
||||
|
||||
TEST_F( default_exec, custom_reduction ) {
|
||||
int N = 100000;
|
||||
int R = 1000;
|
||||
int num_trials = 1;
|
||||
|
||||
if(command_line_num_args()>1)
|
||||
N = atoi(command_line_arg(1));
|
||||
if(command_line_num_args()>2)
|
||||
R = atoi(command_line_arg(2));
|
||||
if(command_line_num_args()>3)
|
||||
num_trials = atoi(command_line_arg(3));
|
||||
custom_reduction_test<double>(N,R,num_trials);
|
||||
}
|
||||
}
|
||||
#endif
|
|
@ -0,0 +1,25 @@
|
|||
#!/bin/bash -e
|
||||
NT=$1
|
||||
PROG="./KokkosCore_PerformanceTest_Mempool"
|
||||
COMMON_ARGS="--kokkos-threads=$NT --fill_stride=1 --fill_level=70 --chunk_span=5 --repeat_inner=100"
|
||||
|
||||
postproc() {
|
||||
cat log | head -n 1 | rev | cut -d ' ' -f 1 | rev >> xvals
|
||||
cat log | tail -n 1 | rev | cut -d ' ' -f 1 | rev >> yvals
|
||||
}
|
||||
|
||||
for yset in 1 2 3
|
||||
do
|
||||
rm -f xvals yvals
|
||||
for x in 1 2 4 8 16 32
|
||||
do
|
||||
echo "yset $yset x factor $x"
|
||||
$PROG $COMMON_ARGS --alloc_size=`expr $x \* 1000000` --super_size=`expr $x \* 100000` > log
|
||||
postproc
|
||||
done
|
||||
rm -f yvals$yset
|
||||
mv yvals yvals$yset
|
||||
done
|
||||
|
||||
rm -f datapoints
|
||||
paste -d',' xvals yvals1 yvals2 yvals3 > datapoints
|
|
@ -0,0 +1,21 @@
|
|||
#!/bin/bash -e
|
||||
NT=$1
|
||||
PROG="./KokkosCore_PerformanceTest_Mempool"
|
||||
COMMON_ARGS="--kokkos-threads=$NT --fill_stride=1 --alloc_size=10027008 --super_size=65536 --repeat_inner=100 --chunk_span=4 --repeat_outer=10"
|
||||
|
||||
postproc() {
|
||||
cat log | grep "fill ops per second" | rev | cut -d ' ' -f 2 | rev >> yvals_fill
|
||||
cat log | grep "cycle ops per second" | rev | cut -d ' ' -f 2 | rev >> yvals_cycle
|
||||
}
|
||||
|
||||
rm -f xvals yvals_fill yvals_cycle
|
||||
for x in 75 95
|
||||
do
|
||||
echo "test fill level $x"
|
||||
echo $x >> xvals
|
||||
$PROG $COMMON_ARGS --fill_level=$x 2>&1 | tee log
|
||||
postproc
|
||||
done
|
||||
|
||||
rm -f datapoints.txt
|
||||
paste xvals yvals_fill yvals_cycle > datapoints.txt
|
|
@ -0,0 +1,21 @@
|
|||
#!/bin/bash -e
|
||||
NT=$1
|
||||
PROG="./KokkosCore_PerformanceTest_TaskDAG"
|
||||
COMMON_ARGS="--kokkos-threads=$NT --alloc_size=10027008 --super_size=65536 --repeat_outer=10"
|
||||
|
||||
postproc() {
|
||||
cat log | grep "tasks per second" | rev | cut -d ' ' -f 2 | rev >> yvals
|
||||
}
|
||||
|
||||
rm -f xvals yvals
|
||||
for x in 21 23
|
||||
do
|
||||
echo "test input $x"
|
||||
echo $x >> xvals
|
||||
$PROG $COMMON_ARGS --input=$x 2>&1 | tee log
|
||||
postproc
|
||||
done
|
||||
|
||||
rm -f datapoints.txt
|
||||
paste xvals yvals > datapoints.txt
|
||||
|
|
@ -0,0 +1,357 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
//
|
||||
// 1. Redistributions of source code must retain the above copyright
|
||||
// notice, this list of conditions and the following disclaimer.
|
||||
//
|
||||
// 2. Redistributions in binary form must reproduce the above copyright
|
||||
// notice, this list of conditions and the following disclaimer in the
|
||||
// documentation and/or other materials provided with the distribution.
|
||||
//
|
||||
// 3. Neither the name of the Corporation nor the names of the
|
||||
// contributors may be used to endorse or promote products derived from
|
||||
// this software without specific prior written permission.
|
||||
//
|
||||
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <cstdio>
|
||||
#include <cstring>
|
||||
#include <cstdlib>
|
||||
#include <limits>
|
||||
#include <iostream>
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
#include <impl/Kokkos_Timer.hpp>
|
||||
|
||||
using ExecSpace = Kokkos::DefaultExecutionSpace ;
|
||||
using MemorySpace = Kokkos::DefaultExecutionSpace::memory_space ;
|
||||
|
||||
using MemoryPool = Kokkos::MemoryPool< ExecSpace > ;
|
||||
|
||||
struct TestFunctor {
|
||||
|
||||
typedef Kokkos::View< uintptr_t * , ExecSpace > ptrs_type ;
|
||||
|
||||
enum : unsigned { chunk = 32 };
|
||||
|
||||
MemoryPool pool ;
|
||||
ptrs_type ptrs ;
|
||||
unsigned chunk_span ;
|
||||
unsigned fill_stride ;
|
||||
unsigned range_iter ;
|
||||
unsigned repeat_inner ;
|
||||
|
||||
TestFunctor( size_t total_alloc_size
|
||||
, unsigned min_superblock_size
|
||||
, unsigned number_alloc
|
||||
, unsigned arg_stride_alloc
|
||||
, unsigned arg_chunk_span
|
||||
, unsigned arg_repeat )
|
||||
: pool()
|
||||
, ptrs()
|
||||
, chunk_span(0)
|
||||
, fill_stride(0)
|
||||
, repeat_inner(0)
|
||||
{
|
||||
MemorySpace m ;
|
||||
|
||||
const unsigned min_block_size = chunk ;
|
||||
const unsigned max_block_size = chunk * arg_chunk_span ;
|
||||
pool = MemoryPool( m , total_alloc_size
|
||||
, min_block_size
|
||||
, max_block_size
|
||||
, min_superblock_size );
|
||||
|
||||
ptrs = ptrs_type( Kokkos::view_alloc( m , "ptrs") , number_alloc );
|
||||
fill_stride = arg_stride_alloc ;
|
||||
chunk_span = arg_chunk_span ;
|
||||
range_iter = fill_stride * number_alloc ;
|
||||
repeat_inner = arg_repeat ;
|
||||
}
|
||||
|
||||
//----------------------------------------
|
||||
|
||||
typedef long value_type ;
|
||||
|
||||
//----------------------------------------
|
||||
|
||||
struct TagFill {};
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator()( TagFill , int i , value_type & update ) const noexcept
|
||||
{
|
||||
if ( 0 == i % fill_stride ) {
|
||||
|
||||
const int j = i / fill_stride ;
|
||||
|
||||
const unsigned size_alloc = chunk * ( 1 + ( j % chunk_span ) );
|
||||
|
||||
ptrs(j) = (uintptr_t) pool.allocate(size_alloc);
|
||||
|
||||
if ( ptrs(j) ) ++update ;
|
||||
}
|
||||
}
|
||||
|
||||
bool test_fill()
|
||||
{
|
||||
typedef Kokkos::RangePolicy< ExecSpace , TagFill > policy ;
|
||||
|
||||
long result = 0 ;
|
||||
|
||||
Kokkos::parallel_reduce( policy(0,range_iter), *this , result );
|
||||
|
||||
if ( result == long(ptrs.extent(0)) ) return true;
|
||||
pool.print_state( std::cerr );
|
||||
return false;
|
||||
}
|
||||
|
||||
//----------------------------------------
|
||||
|
||||
struct TagDel {};
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator()( TagDel , int i ) const noexcept
|
||||
{
|
||||
if ( 0 == i % fill_stride ) {
|
||||
|
||||
const int j = i / fill_stride ;
|
||||
|
||||
const unsigned size_alloc = chunk * ( 1 + ( j % chunk_span ) );
|
||||
|
||||
pool.deallocate( (void*) ptrs(j) , size_alloc );
|
||||
}
|
||||
}
|
||||
|
||||
void test_del()
|
||||
{
|
||||
typedef Kokkos::RangePolicy< ExecSpace , TagDel > policy ;
|
||||
|
||||
Kokkos::parallel_for( policy(0,range_iter), *this );
|
||||
}
|
||||
|
||||
//----------------------------------------
|
||||
|
||||
struct TagAllocDealloc {};
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator()( TagAllocDealloc , int i , long & update ) const noexcept
|
||||
{
|
||||
if ( 0 == i % fill_stride ) {
|
||||
|
||||
const int j = i / fill_stride ;
|
||||
|
||||
if ( 0 == j % 3 ) {
|
||||
|
||||
for ( unsigned k = 0 ; k < repeat_inner ; ++k ) {
|
||||
|
||||
const unsigned size_alloc = chunk * ( 1 + ( j % chunk_span ) );
|
||||
|
||||
pool.deallocate( (void*) ptrs(j) , size_alloc );
|
||||
|
||||
ptrs(j) = (uintptr_t) pool.allocate(size_alloc);
|
||||
|
||||
if ( 0 == ptrs(j) ) update++ ;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
bool test_alloc_dealloc()
|
||||
{
|
||||
typedef Kokkos::RangePolicy< ExecSpace , TagAllocDealloc > policy ;
|
||||
|
||||
long error_count = 0 ;
|
||||
|
||||
Kokkos::parallel_reduce( policy(0,range_iter), *this , error_count );
|
||||
|
||||
return 0 == error_count ;
|
||||
}
|
||||
};
|
||||
|
||||
|
||||
|
||||
int main( int argc , char* argv[] )
|
||||
{
|
||||
static const char help_flag[] = "--help" ;
|
||||
static const char alloc_size_flag[] = "--alloc_size=" ;
|
||||
static const char super_size_flag[] = "--super_size=" ;
|
||||
static const char chunk_span_flag[] = "--chunk_span=" ;
|
||||
static const char fill_stride_flag[] = "--fill_stride=" ;
|
||||
static const char fill_level_flag[] = "--fill_level=" ;
|
||||
static const char repeat_outer_flag[] = "--repeat_outer=" ;
|
||||
static const char repeat_inner_flag[] = "--repeat_inner=" ;
|
||||
|
||||
long total_alloc_size = 1000000 ;
|
||||
int min_superblock_size = 10000 ;
|
||||
int chunk_span = 5 ;
|
||||
int fill_stride = 1 ;
|
||||
int fill_level = 70 ;
|
||||
int repeat_outer = 1 ;
|
||||
int repeat_inner = 1 ;
|
||||
|
||||
int ask_help = 0 ;
|
||||
|
||||
for(int i=1;i<argc;i++)
|
||||
{
|
||||
const char * const a = argv[i];
|
||||
|
||||
if ( ! strncmp(a,help_flag,strlen(help_flag) ) ) ask_help = 1 ;
|
||||
|
||||
if ( ! strncmp(a,alloc_size_flag,strlen(alloc_size_flag) ) )
|
||||
total_alloc_size = atol( a + strlen(alloc_size_flag) );
|
||||
|
||||
if ( ! strncmp(a,super_size_flag,strlen(super_size_flag) ) )
|
||||
min_superblock_size = atoi( a + strlen(super_size_flag) );
|
||||
|
||||
if ( ! strncmp(a,fill_stride_flag,strlen(fill_stride_flag) ) )
|
||||
fill_stride = atoi( a + strlen(fill_stride_flag) );
|
||||
|
||||
if ( ! strncmp(a,fill_level_flag,strlen(fill_level_flag) ) )
|
||||
fill_level = atoi( a + strlen(fill_level_flag) );
|
||||
|
||||
if ( ! strncmp(a,chunk_span_flag,strlen(chunk_span_flag) ) )
|
||||
chunk_span = atoi( a + strlen(chunk_span_flag) );
|
||||
|
||||
if ( ! strncmp(a,repeat_outer_flag,strlen(repeat_outer_flag) ) )
|
||||
repeat_outer = atoi( a + strlen(repeat_outer_flag) );
|
||||
|
||||
if ( ! strncmp(a,repeat_inner_flag,strlen(repeat_inner_flag) ) )
|
||||
repeat_inner = atoi( a + strlen(repeat_inner_flag) );
|
||||
}
|
||||
|
||||
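// Bytes consumed by one full span of allocations: each request of
|
||||
// chunk * ( 1 + i ) bytes is raised to at least 64 bytes and then
|
||||
// counted as the power-of-two block size that contains it.
|
||||
int chunk_span_bytes = 0;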
|
||||
for (int i = 0; i < chunk_span; ++i) {
|
||||
auto chunk_bytes = TestFunctor::chunk * ( 1 + i );
|
||||
if (chunk_bytes < 64) chunk_bytes = 64;
|
||||
auto block_bytes_lg2 = Kokkos::Impl::integral_power_of_two_that_contains( chunk_bytes );
|
||||
auto block_bytes = (1 << block_bytes_lg2);
|
||||
chunk_span_bytes += block_bytes;
|
||||
}
|
||||
auto actual_superblock_bytes_lg2 = Kokkos::Impl::integral_power_of_two_that_contains( min_superblock_size );
|
||||
auto actual_superblock_bytes = (1 << actual_superblock_bytes_lg2);
|
||||
auto superblock_mask = actual_superblock_bytes - 1;
|
||||
auto nsuperblocks = (total_alloc_size + superblock_mask) >> actual_superblock_bytes_lg2;
|
||||
auto actual_total_bytes = nsuperblocks * actual_superblock_bytes;
|
||||
auto bytes_wanted = (actual_total_bytes * fill_level) / 100;
|
||||
auto chunk_spans = bytes_wanted / chunk_span_bytes;
|
||||
auto number_alloc = int( chunk_spans * chunk_span );
|
||||
|
||||
if ( ask_help ) {
|
||||
std::cout << "command line options:"
|
||||
<< " " << help_flag
|
||||
<< " " << alloc_size_flag << "##"
|
||||
<< " " << super_size_flag << "##"
|
||||
<< " " << fill_stride_flag << "##"
|
||||
<< " " << fill_level_flag << "##"
|
||||
<< " " << chunk_span_flag << "##"
|
||||
<< " " << repeat_outer_flag << "##"
|
||||
<< " " << repeat_inner_flag << "##"
|
||||
<< std::endl ;
|
||||
return 0;
|
||||
}
|
||||
|
||||
Kokkos::initialize(argc,argv);
|
||||
|
||||
double sum_fill_time = 0;
|
||||
double sum_cycle_time = 0;
|
||||
double sum_both_time = 0;
|
||||
double min_fill_time = std::numeric_limits<double>::max();
|
||||
double min_cycle_time = std::numeric_limits<double>::max();
|
||||
double min_both_time = std::numeric_limits<double>::max();
|
||||
//one alloc in fill, alloc/dealloc pair in repeat_inner
|
||||
for ( int i = 0 ; i < repeat_outer ; ++i ) {
|
||||
|
||||
TestFunctor functor( total_alloc_size
|
||||
, min_superblock_size
|
||||
, number_alloc
|
||||
, fill_stride
|
||||
, chunk_span
|
||||
, repeat_inner );
|
||||
|
||||
Kokkos::Impl::Timer timer ;
|
||||
|
||||
if ( ! functor.test_fill() ) {
|
||||
Kokkos::abort("fill ");
|
||||
}
|
||||
|
||||
auto t0 = timer.seconds();
|
||||
|
||||
if ( ! functor.test_alloc_dealloc() ) {
|
||||
Kokkos::abort("alloc/dealloc ");
|
||||
}
|
||||
|
||||
auto t1 = timer.seconds();
|
||||
auto this_fill_time = t0;
|
||||
auto this_cycle_time = t1 - t0;
|
||||
auto this_both_time = t1;
|
||||
sum_fill_time += this_fill_time;
|
||||
sum_cycle_time += this_cycle_time;
|
||||
sum_both_time += this_both_time;
|
||||
min_fill_time = std::min(min_fill_time, this_fill_time);
|
||||
min_cycle_time = std::min(min_cycle_time, this_cycle_time);
|
||||
min_both_time = std::min(min_both_time, this_both_time);
|
||||
}
|
||||
|
||||
Kokkos::finalize();
|
||||
|
||||
printf( "\"mempool: alloc super stride level span inner outer number\" %ld %d %d %d %d %d %d %d\n"
|
||||
, total_alloc_size
|
||||
, min_superblock_size
|
||||
, fill_stride
|
||||
, fill_level
|
||||
, chunk_span
|
||||
, repeat_inner
|
||||
, repeat_outer
|
||||
, number_alloc );
|
||||
|
||||
auto avg_fill_time = sum_fill_time / repeat_outer;
|
||||
auto avg_cycle_time = sum_cycle_time / repeat_outer;
|
||||
auto avg_both_time = sum_both_time / repeat_outer;
|
||||
|
||||
printf( "\"mempool: fill time (min, avg)\" %.8f %.8f\n"
|
||||
, min_fill_time
|
||||
, avg_fill_time );
|
||||
|
||||
printf( "\"mempool: cycle time (min, avg)\" %.8f %.8f\n"
|
||||
, min_cycle_time
|
||||
, avg_cycle_time );
|
||||
|
||||
printf( "\"mempool: test time (min, avg)\" %.8f %.8f\n"
|
||||
, min_both_time
|
||||
, avg_both_time );
|
||||
|
||||
printf( "\"mempool: fill ops per second (max, avg)\" %g %g\n"
|
||||
, number_alloc / min_fill_time
|
||||
, number_alloc / avg_fill_time );
|
||||
|
||||
printf( "\"mempool: cycle ops per second (max, avg)\" %g %g\n"
|
||||
, (2 * number_alloc * repeat_inner) / min_cycle_time
|
||||
, (2 * number_alloc * repeat_inner) / avg_cycle_time );
|
||||
}
|
||||
|
|
@ -0,0 +1,284 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
//
|
||||
// 1. Redistributions of source code must retain the above copyright
|
||||
// notice, this list of conditions and the following disclaimer.
|
||||
//
|
||||
// 2. Redistributions in binary form must reproduce the above copyright
|
||||
// notice, this list of conditions and the following disclaimer in the
|
||||
// documentation and/or other materials provided with the distribution.
|
||||
//
|
||||
// 3. Neither the name of the Corporation nor the names of the
|
||||
// contributors may be used to endorse or promote products derived from
|
||||
// this software without specific prior written permission.
|
||||
//
|
||||
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
#if ! defined( KOKKOS_ENABLE_TASKDAG ) || \
|
||||
defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
|
||||
|
||||
int main()
|
||||
{
|
||||
return 0 ;
|
||||
}
|
||||
|
||||
#else
|
||||
|
||||
#include <cstdio>
|
||||
#include <cstring>
|
||||
#include <cstdlib>
|
||||
#include <limits>
|
||||
|
||||
#include <impl/Kokkos_Timer.hpp>
|
||||
|
||||
using ExecSpace = Kokkos::DefaultExecutionSpace ;
|
||||
|
||||
inline
|
||||
long eval_fib( long n )
|
||||
{
|
||||
constexpr long mask = 0x03;
|
||||
|
||||
long fib[4] = { 0, 1, 0, 0 };
|
||||
|
||||
for ( long i = 2; i <= n; ++i ) {
|
||||
fib[ i & mask ] = fib[ ( i - 1 ) & mask ] + fib[ ( i - 2 ) & mask ];
|
||||
}
|
||||
|
||||
return fib[ n & mask ];
|
||||
}
|
||||
|
||||
inline
|
||||
long fib_alloc_count( long n )
|
||||
{
|
||||
constexpr long mask = 0x03;
|
||||
|
||||
long count[4] = { 1, 1, 0, 0 };
|
||||
|
||||
for ( long i = 2; i <= n; ++i ) {
|
||||
count[ i & mask ] = 2 // this task plus the 'when_all' task
|
||||
+ count[ ( i - 1 ) & mask ]
|
||||
+ count[ ( i - 2 ) & mask ];
|
||||
}
|
||||
|
||||
return count[ n & mask ];
|
||||
}
|
||||
|
||||
template< class Space >
|
||||
struct TestFib {
|
||||
|
||||
using Scheduler = Kokkos::TaskScheduler< Space > ;
|
||||
using MemorySpace = typename Scheduler::memory_space ;
|
||||
using MemberType = typename Scheduler::member_type ;
|
||||
using FutureType = Kokkos::Future< long , Space > ;
|
||||
|
||||
typedef long value_type ;
|
||||
|
||||
Scheduler sched ;
|
||||
FutureType dep[2] ;
|
||||
const value_type n ;
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
TestFib( const Scheduler & arg_sched , const value_type arg_n )
|
||||
: sched( arg_sched ), dep{} , n( arg_n ) {}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void operator()( const MemberType & , value_type & result ) noexcept
|
||||
{
|
||||
if ( n < 2 ) {
|
||||
result = n ;
|
||||
}
|
||||
else if ( ! dep[0].is_null() && ! dep[1].is_null() ) {
|
||||
result = dep[0].get() + dep[1].get();
|
||||
}
|
||||
else {
|
||||
// Spawn new children and respawn myself to sum their results.
|
||||
// Spawn lower value at higher priority as it has a shorter
|
||||
// path to completion.
|
||||
|
||||
dep[1] = Kokkos::task_spawn
|
||||
( Kokkos::TaskSingle( sched, Kokkos::TaskPriority::High )
|
||||
, TestFib( sched, n - 2 ) );
|
||||
|
||||
dep[0] = Kokkos::task_spawn
|
||||
( Kokkos::TaskSingle( sched )
|
||||
, TestFib( sched, n - 1 ) );
|
||||
|
||||
Kokkos::Future< Space > fib_all = Kokkos::when_all( dep, 2 );
|
||||
|
||||
if ( ! dep[0].is_null() && ! dep[1].is_null() && ! fib_all.is_null() ) {
|
||||
// High priority to retire this branch.
|
||||
Kokkos::respawn( this, fib_all, Kokkos::TaskPriority::High );
|
||||
}
|
||||
else {
|
||||
Kokkos::abort("Failed nested task spawn (allocation)");
|
||||
}
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
|
||||
|
||||
int main( int argc , char* argv[] )
|
||||
{
|
||||
static const char help[] = "--help" ;
|
||||
static const char alloc_size[] = "--alloc_size=" ;
|
||||
static const char super_size[] = "--super_size=" ;
|
||||
static const char repeat_outer[] = "--repeat_outer=" ;
|
||||
static const char input_value[] = "--input=" ;
|
||||
|
||||
long total_alloc_size = 1000000 ;
|
||||
int min_superblock_size = 10000 ;
|
||||
int test_repeat_outer = 1 ;
|
||||
int fib_input = 4 ;
|
||||
|
||||
int ask_help = 0 ;
|
||||
|
||||
for(int i=1;i<argc;i++)
|
||||
{
|
||||
const char * const a = argv[i];
|
||||
|
||||
if ( ! strncmp(a,help,strlen(help) ) ) ask_help = 1 ;
|
||||
|
||||
if ( ! strncmp(a,alloc_size,strlen(alloc_size) ) )
|
||||
total_alloc_size = atol( a + strlen(alloc_size) );
|
||||
|
||||
if ( ! strncmp(a,super_size,strlen(super_size) ) )
|
||||
min_superblock_size = atoi( a + strlen(super_size) );
|
||||
|
||||
if ( ! strncmp(a,repeat_outer,strlen(repeat_outer) ) )
|
||||
test_repeat_outer = atoi( a + strlen(repeat_outer) );
|
||||
|
||||
if ( ! strncmp(a,input_value,strlen(input_value) ) )
|
||||
fib_input = atoi( a + strlen(input_value) );
|
||||
}
|
||||
|
||||
const long fib_output = eval_fib( fib_input );
|
||||
const long number_alloc = fib_alloc_count( fib_input );
|
||||
|
||||
const unsigned min_block_size = 32 ;
|
||||
const unsigned max_block_size = 128 ;
|
||||
|
||||
long task_count_max = 0 ;
|
||||
long task_count_accum = 0 ;
|
||||
long test_result = 0 ;
|
||||
|
||||
if ( ask_help ) {
|
||||
std::cout << "command line options:"
|
||||
<< " " << help
|
||||
<< " " << alloc_size << "##"
|
||||
<< " " << super_size << "##"
|
||||
<< " " << input_value << "##"
|
||||
<< " " << repeat_outer << "##"
|
||||
<< std::endl ;
|
||||
return -1;
|
||||
}
|
||||
|
||||
typedef TestFib< ExecSpace > Functor ;
|
||||
|
||||
Kokkos::initialize(argc,argv);
|
||||
|
||||
Functor::Scheduler sched( Functor::MemorySpace()
|
||||
, total_alloc_size
|
||||
, min_block_size
|
||||
, max_block_size
|
||||
, min_superblock_size
|
||||
);
|
||||
|
||||
Functor::FutureType f =
|
||||
Kokkos::host_spawn( Kokkos::TaskSingle( sched )
|
||||
, Functor( sched , fib_input )
|
||||
);
|
||||
|
||||
Kokkos::wait( sched );
|
||||
|
||||
test_result = f.get();
|
||||
|
||||
task_count_max = sched.allocated_task_count_max();
|
||||
task_count_accum = sched.allocated_task_count_accum();
|
||||
|
||||
if ( number_alloc != task_count_accum ) {
|
||||
std::cout << " number_alloc( " << number_alloc << " )"
|
||||
<< " != task_count_accum( " << task_count_accum << " )"
|
||||
<< std::endl ;
|
||||
}
|
||||
|
||||
if ( fib_output != test_result ) {
|
||||
std::cout << " answer( " << fib_output << " )"
|
||||
<< " != result( " << test_result << " )"
|
||||
<< std::endl ;
|
||||
}
|
||||
|
||||
if ( fib_output != test_result || number_alloc != task_count_accum ) {
|
||||
printf(" TEST FAILED\n");
|
||||
return -1;
|
||||
}
|
||||
|
||||
double min_time = std::numeric_limits<double>::max();
|
||||
double time_sum = 0;
|
||||
|
||||
for ( int i = 0 ; i < test_repeat_outer ; ++i ) {
|
||||
Kokkos::Impl::Timer timer ;
|
||||
|
||||
Functor::FutureType ftmp =
|
||||
Kokkos::host_spawn( Kokkos::TaskSingle( sched )
|
||||
, Functor( sched , fib_input )
|
||||
);
|
||||
|
||||
Kokkos::wait( sched );
|
||||
auto this_time = timer.seconds();
|
||||
min_time = std::min(min_time, this_time);
|
||||
time_sum += this_time;
|
||||
}
|
||||
|
||||
auto avg_time = time_sum / test_repeat_outer;
|
||||
|
||||
Kokkos::finalize();
|
||||
|
||||
printf( "\"taskdag: alloc super repeat input output task-accum task-max\" %ld %d %d %d %ld %ld %ld\n"
|
||||
, total_alloc_size
|
||||
, min_superblock_size
|
||||
, test_repeat_outer
|
||||
, fib_input
|
||||
, fib_output
|
||||
, task_count_accum
|
||||
, task_count_max );
|
||||
|
||||
printf( "\"taskdag: time (min, avg)\" %g %g\n", min_time, avg_time);
|
||||
printf( "\"taskdag: tasks per second (max, avg)\" %g %g\n"
|
||||
, number_alloc / min_time
|
||||
, number_alloc / avg_time );
|
||||
|
||||
return 0 ;
|
||||
}
|
||||
|
||||
#endif
|
||||
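The timing loop at the end of the benchmark above keeps a running minimum and a running sum of the per-repeat wall times, then reports tasks per second from both. A minimal host-only sketch of that bookkeeping follows. The names `TimingSummary`, `summarize_times`, and `tasks_per_second` are hypothetical helpers introduced for illustration and are not part of Kokkos.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical helper type, not part of Kokkos: holds the two statistics
// that the benchmark prints as "time (min, avg)".
struct TimingSummary {
  double min_time;
  double avg_time;
};

// Mirrors the loop in the diff: running minimum plus running sum.
// Assumes at least one measurement (the benchmark divides by
// test_repeat_outer without checking for zero).
inline TimingSummary summarize_times(const std::vector<double>& times) {
  double min_time = std::numeric_limits<double>::max();
  double time_sum = 0.0;
  for (double t : times) {
    min_time = std::min(min_time, t);
    time_sum += t;
  }
  return TimingSummary{min_time, time_sum / static_cast<double>(times.size())};
}

// Throughput for a run that allocated `task_count` tasks in `seconds`.
inline double tasks_per_second(long task_count, double seconds) {
  return static_cast<double>(task_count) / seconds;
}
```

Dividing the task count by the minimum time gives the best observed throughput, which is why the benchmark labels that column "max".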
|
|
@ -44,14 +44,12 @@
|
|||
#ifndef KOKKOS_CUDA_EXP_ITERATE_TILE_HPP
|
||||
#define KOKKOS_CUDA_EXP_ITERATE_TILE_HPP
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
|
||||
|
||||
#include <iostream>
|
||||
#include <algorithm>
|
||||
#include <stdio.h>
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
|
||||
#include <cstdio>
|
||||
|
||||
#include <utility>
|
||||
|
||||
|
@ -1298,3 +1296,4 @@ protected:
|
|||
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -45,11 +45,10 @@
|
|||
#define KOKKOS_CUDAEXEC_HPP
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
#include <string>
|
||||
#include <cstdint>
|
||||
#include <Kokkos_Parallel.hpp>
|
||||
#include <impl/Kokkos_Error.hpp>
|
||||
#include <Cuda/Kokkos_Cuda_abort.hpp>
|
||||
|
@ -99,6 +98,8 @@ CudaSpace::size_type cuda_internal_maximum_warp_count();
|
|||
CudaSpace::size_type cuda_internal_maximum_grid_count();
|
||||
CudaSpace::size_type cuda_internal_maximum_shared_words();
|
||||
|
||||
CudaSpace::size_type cuda_internal_maximum_concurrent_block_count();
|
||||
|
||||
CudaSpace::size_type * cuda_internal_scratch_flags( const CudaSpace::size_type size );
|
||||
CudaSpace::size_type * cuda_internal_scratch_space( const CudaSpace::size_type size );
|
||||
CudaSpace::size_type * cuda_internal_scratch_unified( const CudaSpace::size_type size );
|
||||
|
@ -146,7 +147,7 @@ Kokkos::Impl::CudaLockArraysStruct kokkos_impl_cuda_lock_arrays ;
|
|||
|
||||
namespace Kokkos {
|
||||
namespace Impl {
|
||||
void* cuda_resize_scratch_space(size_t bytes, bool force_shrink = false);
|
||||
void* cuda_resize_scratch_space(std::int64_t bytes, bool force_shrink = false);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -319,3 +320,4 @@ struct CudaParallelLaunch< DriverType , false > {
|
|||
#endif /* defined( __CUDACC__ ) */
|
||||
#endif /* defined( KOKKOS_ENABLE_CUDA ) */
|
||||
#endif /* #ifndef KOKKOS_CUDAEXEC_HPP */
|
||||
|
||||
|
|
|
@ -41,16 +41,15 @@
|
|||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
#include <cstdlib>
|
||||
#include <iostream>
|
||||
#include <sstream>
|
||||
#include <stdexcept>
|
||||
#include <algorithm>
|
||||
#include <atomic>
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
#include <Kokkos_Cuda.hpp>
|
||||
|
@ -890,9 +889,9 @@ void init_lock_arrays_cuda_space() {
|
|||
}
|
||||
}
|
||||
|
||||
void* cuda_resize_scratch_space(size_t bytes, bool force_shrink) {
|
||||
void* cuda_resize_scratch_space(std::int64_t bytes, bool force_shrink) {
|
||||
static void* ptr = NULL;
|
||||
static size_t current_size = 0;
|
||||
static std::int64_t current_size = 0;
|
||||
if(current_size == 0) {
|
||||
current_size = bytes;
|
||||
ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>("CudaSpace::ScratchMemory",current_size);
|
||||
|
@ -911,5 +910,7 @@ void* cuda_resize_scratch_space(size_t bytes, bool force_shrink) {
|
|||
|
||||
}
|
||||
}
|
||||
#else
|
||||
void KOKKOS_CORE_SRC_CUDA_CUDASPACE_PREVENT_LINK_ERROR() {}
|
||||
#endif // KOKKOS_ENABLE_CUDA
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -45,8 +45,6 @@
|
|||
#define KOKKOS_CUDA_ALLOCATION_TRACKING_HPP
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
#include <impl/Kokkos_Traits.hpp>
|
||||
|
@ -75,7 +73,7 @@ shared_allocation_record( Kokkos::CudaSpace const & arg_space
|
|||
new( functor ) DestructFunctor( arg_destruct );
|
||||
|
||||
record->m_destruct_functor = & shared_allocation_destroy< DestructFunctor > ;
|
||||
|
||||
|
||||
return record ;
|
||||
}
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -45,8 +45,6 @@
|
|||
#define KOKKOS_CUDA_ERROR_HPP
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
namespace Kokkos { namespace Impl {
|
||||
|
@ -67,3 +65,4 @@ inline void cuda_internal_safe_call( cudaError e , const char * name, const char
|
|||
|
||||
#endif //KOKKOS_ENABLE_CUDA
|
||||
#endif //KOKKOS_CUDA_ERROR_HPP
|
||||
|
||||
|
|
|
@ -44,11 +44,11 @@
|
|||
/*--------------------------------------------------------------------------*/
|
||||
/* Kokkos interfaces */
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
#include <Cuda/Kokkos_Cuda_Error.hpp>
|
||||
#include <Cuda/Kokkos_Cuda_Internal.hpp>
|
||||
#include <impl/Kokkos_Error.hpp>
|
||||
|
@ -56,7 +56,7 @@
|
|||
|
||||
/*--------------------------------------------------------------------------*/
|
||||
/* Standard 'C' libraries */
|
||||
#include <stdlib.h>
|
||||
#include <cstdlib>
|
||||
|
||||
/* Standard 'C++' libraries */
|
||||
#include <vector>
|
||||
|
@ -404,9 +404,23 @@ void CudaInternal::initialize( int cuda_device_id , int stream_count )
|
|||
// Query what compute capability architecture a kernel executes:
|
||||
m_cudaArch = cuda_kernel_arch();
|
||||
|
||||
if ( m_cudaArch != cudaProp.major * 100 + cudaProp.minor * 10 ) {
|
||||
int compiled_major = m_cudaArch / 100;
|
||||
int compiled_minor = ( m_cudaArch % 100 ) / 10;
|
||||
|
||||
if ( compiled_major < 5 && cudaProp.major >= 5 ) {
|
||||
std::stringstream ss;
|
||||
ss << "Kokkos::Cuda::initialize ERROR: running kernels compiled for compute capability "
|
||||
<< compiled_major << "." << compiled_minor
|
||||
<< " (< 5.0) on device with compute capability "
|
||||
<< cudaProp.major << "." << cudaProp.minor
|
||||
<< " (>=5.0), this would give incorrect results!"
|
||||
<< std::endl ;
|
||||
std::string msg = ss.str();
|
||||
Kokkos::abort( msg.c_str() );
|
||||
}
|
||||
if ( compiled_major != cudaProp.major || compiled_minor != cudaProp.minor ) {
|
||||
std::cerr << "Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability "
|
||||
<< ( m_cudaArch / 100 ) << "." << ( ( m_cudaArch % 100 ) / 10 )
|
||||
<< compiled_major << "." << compiled_minor
|
||||
<< " on device with compute capability "
|
||||
<< cudaProp.major << "." << cudaProp.minor
|
||||
<< " , this will likely reduce potential performance."
|
||||
|
@ -661,6 +675,15 @@ void CudaInternal::finalize()
|
|||
Cuda::size_type cuda_internal_multiprocessor_count()
|
||||
{ return CudaInternal::singleton().m_multiProcCount ; }
|
||||
|
||||
CudaSpace::size_type cuda_internal_maximum_concurrent_block_count()
|
||||
{
|
||||
// Compute capability 5.0 through 6.2
|
||||
enum : int { max_resident_blocks_per_multiprocessor = 32 };
|
||||
|
||||
return CudaInternal::singleton().m_multiProcCount
|
||||
* max_resident_blocks_per_multiprocessor ;
|
||||
};
|
||||
|
||||
Cuda::size_type cuda_internal_maximum_warp_count()
|
||||
{ return CudaInternal::singleton().m_maxWarpCount ; }
|
||||
|
||||
|
@ -772,8 +795,10 @@ void Cuda::fence()
|
|||
Kokkos::Impl::cuda_device_synchronize();
|
||||
}
|
||||
|
||||
const char* Cuda::name() { return "Cuda"; }
|
||||
|
||||
} // namespace Kokkos
|
||||
|
||||
#else
|
||||
void KOKKOS_CORE_SRC_CUDA_IMPL_PREVENT_LINK_ERROR() {}
|
||||
#endif // KOKKOS_ENABLE_CUDA
|
||||
//----------------------------------------------------------------------------
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,19 +36,18 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#ifndef KOKKOS_CUDA_INTERNAL_HPP
|
||||
#define KOKKOS_CUDA_INTERNAL_HPP
|
||||
#include<iostream>
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
#include<iostream>
|
||||
#include <Cuda/Kokkos_Cuda_Error.hpp>
|
||||
|
||||
namespace Kokkos { namespace Impl {
|
||||
|
|
File diff suppressed because it is too large
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -45,8 +45,6 @@
|
|||
#define KOKKOS_CUDA_REDUCESCAN_HPP
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
|
||||
|
||||
#include <utility>
|
||||
|
@ -63,6 +61,7 @@ namespace Kokkos {
|
|||
namespace Impl {
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
// Shuffle operations require input to be a register (stack) variable
|
||||
|
||||
template< typename T >
|
||||
__device__ inline
|
||||
|
@ -140,77 +139,6 @@ void cuda_shfl_up( T & out , T const & in , int delta ,
|
|||
}
|
||||
}
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
/** \brief Reduce within a warp over blockDim.x, the "vector" dimension.
|
||||
*
|
||||
* This will be called within a nested, intra-team parallel operation.
|
||||
* Use shuffle operations to avoid conflicts with shared memory usage.
|
||||
*
|
||||
* Requires:
|
||||
* blockDim.x is power of 2
|
||||
* blockDim.x <= 32 (one warp)
|
||||
*
|
||||
* Cannot use "butterfly" pattern because floating point
|
||||
* addition is non-associative. Therefore, must broadcast
|
||||
* the final result.
|
||||
*/
|
||||
template< class Reducer >
|
||||
__device__ inline
|
||||
void cuda_intra_warp_vector_reduce( Reducer const & reducer )
|
||||
{
|
||||
static_assert(
|
||||
std::is_reference< typename Reducer::reference_type >::value , "" );
|
||||
|
||||
if ( 1 < blockDim.x ) {
|
||||
|
||||
typename Reducer::value_type tmp ;
|
||||
|
||||
for ( int i = blockDim.x ; ( i >>= 1 ) ; ) {
|
||||
|
||||
cuda_shfl_down( tmp , reducer.reference() , i , blockDim.x );
|
||||
|
||||
if ( threadIdx.x < i ) { reducer.join( reducer.data() , & tmp ); }
|
||||
}
|
||||
|
||||
// Broadcast from root "lane" to all other "lanes"
|
||||
|
||||
cuda_shfl( reducer.reference() , reducer.reference() , 0 , blockDim.x );
|
||||
}
|
||||
}
|
||||
|
||||
/** \brief Inclusive scan over blockDim.x, the "vector" dimension.
|
||||
*
|
||||
* This will be called within a nested, intra-team parallel operation.
|
||||
* Use shuffle operations to avoid conflicts with shared memory usage.
|
||||
*
|
||||
* Algorithm is concurrent bottom-up reductions in triangular pattern
|
||||
* where each CUDA thread is the root of a reduction tree from the
|
||||
* zeroth CUDA thread to itself.
|
||||
*
|
||||
* Requires:
|
||||
* blockDim.x is power of 2
|
||||
* blockDim.x <= 32 (one warp)
|
||||
*/
|
||||
template< typename ValueType >
|
||||
__device__ inline
|
||||
void cuda_intra_warp_vector_inclusive_scan( ValueType & local )
|
||||
{
|
||||
ValueType tmp ;
|
||||
|
||||
// Bottom up:
|
||||
// [t] += [t-1] if t >= 1
|
||||
// [t] += [t-2] if t >= 2
|
||||
// [t] += [t-4] if t >= 4
|
||||
// ...
|
||||
|
||||
for ( int i = 1 ; i < blockDim.x ; i <<= 1 ) {
|
||||
|
||||
cuda_shfl_up( tmp , local , i , blockDim.x );
|
||||
|
||||
if ( i <= threadIdx.x ) { local += tmp ; }
|
||||
}
|
||||
}
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
/*
|
||||
* Algorithmic constraints:
|
||||
|
@ -247,12 +175,12 @@ inline void cuda_inter_warp_reduction( ValueType& value,
|
|||
|
||||
#define STEP_WIDTH 4
|
||||
// Depending on the ValueType, __shared__ memory must be aligned up to 8-byte boundaries
|
||||
// The reason not to use ValueType directly is that for types with constructors it
|
||||
// The reason not to use ValueType directly is that for types with constructors it
|
||||
// could lead to race conditions
|
||||
__shared__ double sh_result[(sizeof(ValueType)+7)/8*STEP_WIDTH];
|
||||
ValueType* result = (ValueType*) & sh_result;
|
||||
const unsigned step = 32 / blockDim.x;
|
||||
unsigned shift = STEP_WIDTH;
|
||||
const int step = 32 / blockDim.x;
|
||||
int shift = STEP_WIDTH;
|
||||
const int id = threadIdx.y%step==0?threadIdx.y/step:65000;
|
||||
if(id < STEP_WIDTH ) {
|
||||
result[id] = value;
|
||||
|
@ -297,7 +225,7 @@ bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgT
|
|||
//Do the intra-block reduction with shfl operations and static shared memory
|
||||
cuda_intra_block_reduction(value,join,max_active_thread);
|
||||
|
||||
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
|
||||
const int id = threadIdx.y*blockDim.x + threadIdx.x;
|
||||
|
||||
//One thread in the block writes block result to global scratch_memory
|
||||
if(id == 0 ) {
|
||||
|
@ -329,35 +257,35 @@ bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgT
|
|||
|
||||
//Reduce all global values with splitting work over threads in one warp
|
||||
const int step_size = blockDim.x*blockDim.y < 32 ? blockDim.x*blockDim.y : 32;
|
||||
for(int i=id; i<gridDim.x; i+=step_size) {
|
||||
for(int i=id; i<(int)gridDim.x; i+=step_size) {
|
||||
value_type tmp = global[i];
|
||||
join(value, tmp);
|
||||
}
|
||||
|
||||
//Perform shfl reductions within the warp; only join if the contribution is valid (allows gridDim.x to be a non-power-of-two and <32)
|
||||
if (blockDim.x*blockDim.y > 1) {
|
||||
if (int(blockDim.x*blockDim.y) > 1) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 1,32);
|
||||
if( id + 1 < gridDim.x )
|
||||
if( id + 1 < int(gridDim.x) )
|
||||
join(value, tmp);
|
||||
}
|
||||
if (blockDim.x*blockDim.y > 2) {
|
||||
if (int(blockDim.x*blockDim.y) > 2) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 2,32);
|
||||
if( id + 2 < gridDim.x )
|
||||
if( id + 2 < int(gridDim.x) )
|
||||
join(value, tmp);
|
||||
}
|
||||
if (blockDim.x*blockDim.y > 4) {
|
||||
if (int(blockDim.x*blockDim.y) > 4) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 4,32);
|
||||
if( id + 4 < gridDim.x )
|
||||
if( id + 4 < int(gridDim.x) )
|
||||
join(value, tmp);
|
||||
}
|
||||
if (blockDim.x*blockDim.y > 8) {
|
||||
if (int(blockDim.x*blockDim.y) > 8) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 8,32);
|
||||
if( id + 8 < gridDim.x )
|
||||
if( id + 8 < int(gridDim.x) )
|
||||
join(value, tmp);
|
||||
}
|
||||
if (blockDim.x*blockDim.y > 16) {
|
||||
if (int(blockDim.x*blockDim.y) > 16) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 16,32);
|
||||
if( id + 16 < gridDim.x )
|
||||
if( id + 16 < int(gridDim.x) )
|
||||
join(value, tmp);
|
||||
}
|
||||
}
|
||||
|
@ -370,6 +298,166 @@ bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgT
|
|||
#endif
|
||||
}
|
||||
|
||||
template< class ReducerType >
|
||||
__device__ inline
|
||||
typename std::enable_if< Kokkos::is_reducer<ReducerType>::value >::type
|
||||
cuda_intra_warp_reduction( const ReducerType& reducer,
|
||||
const int max_active_thread = blockDim.y) {
|
||||
|
||||
typedef typename ReducerType::value_type ValueType;
|
||||
|
||||
unsigned int shift = 1;
|
||||
|
||||
ValueType result = reducer.reference();
|
||||
//Reduce over values from threads with different threadIdx.y
|
||||
while(blockDim.x * shift < 32 ) {
|
||||
const ValueType tmp = shfl_down(result, blockDim.x*shift,32u);
|
||||
//Only join if upper thread is active (this allows non power of two for blockDim.y)
|
||||
if(threadIdx.y + shift < max_active_thread)
|
||||
reducer.join(result , tmp);
|
||||
shift*=2;
|
||||
}
|
||||
|
||||
result = shfl(result,0,32);
|
||||
reducer.reference() = result;
|
||||
}
|
||||
|
||||
template< class ReducerType >
|
||||
__device__ inline
|
||||
typename std::enable_if< Kokkos::is_reducer<ReducerType>::value >::type
|
||||
cuda_inter_warp_reduction( const ReducerType& reducer,
|
||||
const int max_active_thread = blockDim.y) {
|
||||
|
||||
typedef typename ReducerType::value_type ValueType;
|
||||
|
||||
#define STEP_WIDTH 4
|
||||
// Depending on the ValueType, __shared__ memory must be aligned up to 8-byte boundaries
|
||||
// The reason not to use ValueType directly is that for types with constructors it
|
||||
// could lead to race conditions
|
||||
__shared__ double sh_result[(sizeof(ValueType)+7)/8*STEP_WIDTH];
|
||||
ValueType* result = (ValueType*) & sh_result;
|
||||
ValueType value = reducer.reference();
|
||||
const int step = 32 / blockDim.x;
|
||||
int shift = STEP_WIDTH;
|
||||
const int id = threadIdx.y%step==0?threadIdx.y/step:65000;
|
||||
if(id < STEP_WIDTH ) {
|
||||
result[id] = value;
|
||||
}
|
||||
__syncthreads();
|
||||
while (shift<=max_active_thread/step) {
|
||||
if(shift<=id && shift+STEP_WIDTH>id && threadIdx.x==0) {
|
||||
reducer.join(result[id%STEP_WIDTH],value);
|
||||
}
|
||||
__syncthreads();
|
||||
shift+=STEP_WIDTH;
|
||||
}
|
||||
|
||||
|
||||
value = result[0];
|
||||
for(int i = 1; (i*step<max_active_thread) && i<STEP_WIDTH; i++)
|
||||
reducer.join(value,result[i]);
|
||||
|
||||
reducer.reference() = value;
|
||||
}
|
||||
|
||||
template< class ReducerType >
|
||||
__device__ inline
|
||||
typename std::enable_if< Kokkos::is_reducer<ReducerType>::value >::type
|
||||
cuda_intra_block_reduction( const ReducerType& reducer,
|
||||
const int max_active_thread = blockDim.y) {
|
||||
cuda_intra_warp_reduction(reducer,max_active_thread);
|
||||
cuda_inter_warp_reduction(reducer,max_active_thread);
|
||||
}
|
||||
|
||||
template< class ReducerType>
|
||||
__device__ inline
|
||||
typename std::enable_if< Kokkos::is_reducer<ReducerType>::value , bool >::type
|
||||
cuda_inter_block_reduction( const ReducerType& reducer,
|
||||
Cuda::size_type * const m_scratch_space,
|
||||
Cuda::size_type * const m_scratch_flags,
|
||||
const int max_active_thread = blockDim.y) {
|
||||
#ifdef __CUDA_ARCH__
|
||||
typedef typename ReducerType::value_type* pointer_type;
|
||||
typedef typename ReducerType::value_type value_type;
|
||||
|
||||
//Do the intra-block reduction with shfl operations and static shared memory
|
||||
cuda_intra_block_reduction(reducer,max_active_thread);
|
||||
|
||||
value_type value = reducer.reference();
|
||||
|
||||
const int id = threadIdx.y*blockDim.x + threadIdx.x;
|
||||
|
||||
//One thread in the block writes block result to global scratch_memory
|
||||
if(id == 0 ) {
|
||||
pointer_type global = ((pointer_type) m_scratch_space) + blockIdx.x;
|
||||
*global = value;
|
||||
}
|
||||
|
||||
//One warp of last block performs inter block reduction through loading the block values from global scratch_memory
|
||||
bool last_block = false;
|
||||
|
||||
__syncthreads();
|
||||
if ( id < 32 ) {
|
||||
Cuda::size_type count;
|
||||
|
||||
//Figure out whether this is the last block
|
||||
if(id == 0)
|
||||
count = Kokkos::atomic_fetch_add(m_scratch_flags,1);
|
||||
count = Kokkos::shfl(count,0,32);
|
||||
|
||||
//Last block does the inter block reduction
|
||||
if( count == gridDim.x - 1) {
|
||||
//set flag back to zero
|
||||
if(id == 0)
|
||||
*m_scratch_flags = 0;
|
||||
last_block = true;
|
||||
reducer.init(value);
|
||||
|
||||
pointer_type const volatile global = (pointer_type) m_scratch_space ;
|
||||
|
||||
//Reduce all global values with splitting work over threads in one warp
|
||||
const int step_size = blockDim.x*blockDim.y < 32 ? blockDim.x*blockDim.y : 32;
|
||||
for(int i=id; i<(int)gridDim.x; i+=step_size) {
|
||||
value_type tmp = global[i];
|
||||
reducer.join(value, tmp);
|
||||
}
|
||||
|
||||
//Perform shfl reductions within the warp; only join if the contribution is valid (allows gridDim.x to be a non-power-of-two and <32)
|
||||
if (int(blockDim.x*blockDim.y) > 1) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 1,32);
|
||||
if( id + 1 < int(gridDim.x) )
|
||||
reducer.join(value, tmp);
|
||||
}
|
||||
if (int(blockDim.x*blockDim.y) > 2) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 2,32);
|
||||
if( id + 2 < int(gridDim.x) )
|
||||
reducer.join(value, tmp);
|
||||
}
|
||||
if (int(blockDim.x*blockDim.y) > 4) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 4,32);
|
||||
if( id + 4 < int(gridDim.x) )
|
||||
reducer.join(value, tmp);
|
||||
}
|
||||
if (int(blockDim.x*blockDim.y) > 8) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 8,32);
|
||||
if( id + 8 < int(gridDim.x) )
|
||||
reducer.join(value, tmp);
|
||||
}
|
||||
if (int(blockDim.x*blockDim.y) > 16) {
|
||||
value_type tmp = Kokkos::shfl_down(value, 16,32);
|
||||
if( id + 16 < int(gridDim.x) )
|
||||
reducer.join(value, tmp);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
//The last block has in its thread=0 the global reduction value through "value"
|
||||
return last_block;
|
||||
#else
|
||||
return true;
|
||||
#endif
|
||||
}
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
// See section B.17 of Cuda C Programming Guide Version 3.2
|
||||
// for discussion of
|
||||
|
@ -529,11 +617,11 @@ bool cuda_single_inter_block_reduce_scan( const FunctorType & functor ,
|
|||
size_type * const shared = shared_data + word_count.value * BlockSizeMask ;
|
||||
size_type * const global = global_data + word_count.value * block_id ;
|
||||
|
||||
#if (__CUDA_ARCH__ < 500)
|
||||
for ( size_type i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i] ; }
|
||||
#else
|
||||
for ( size_type i = 0 ; i < word_count.value ; i += 1 ) { global[i] = shared[i] ; }
|
||||
#endif
|
||||
//#if (__CUDA_ARCH__ < 500)
|
||||
for ( int i = int(threadIdx.y) ; i < int(word_count.value) ; i += int(blockDim.y) ) { global[i] = shared[i] ; }
|
||||
//#else
|
||||
// for ( size_type i = 0 ; i < word_count.value ; i += 1 ) { global[i] = shared[i] ; }
|
||||
//#endif
|
||||
|
||||
}
|
||||
|
||||
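The reducer overload of `cuda_inter_block_reduction` in this file finishes with five shuffle-down steps (offsets 1, 2, 4, 8, 16), each guarded so that a lane only joins a partner holding a valid contribution. The sketch below emulates that pattern sequentially on the host, under simplifying assumptions: a full 32-lane warp, a hypothetical `SumJoin` reducer exposing only `join`, and an array snapshot standing in for `Kokkos::shfl_down`. It illustrates the guard logic and is not the Kokkos implementation.

```cpp
#include <array>
#include <cassert>

constexpr int kWarpSize = 32;

// Hypothetical reducer for illustration; real Kokkos reducers also
// provide init() and reference().
struct SumJoin {
  void join(long& dst, const long& src) const { dst += src; }
};

// Emulates the guarded shuffle-down reduction. Lanes at index
// valid_count and above may hold arbitrary values; the guard keeps
// them out of the result, so valid_count need not be a power of two.
template <class Reducer>
long simulate_warp_reduce(std::array<long, kWarpSize> lanes,
                          int valid_count,
                          const Reducer& reducer) {
  for (int delta = 1; delta < kWarpSize; delta <<= 1) {
    // Every lane reads its partner before any lane joins.
    const std::array<long, kWarpSize> snapshot = lanes;
    for (int id = 0; id < kWarpSize; ++id) {
      // A lane with no partner inside the warp reads its own value.
      const long tmp = (id + delta < kWarpSize) ? snapshot[id + delta]
                                                : snapshot[id];
      if (id + delta < valid_count) {
        reducer.join(lanes[id], tmp);
      }
    }
  }
  return lanes[0];  // lane 0 holds the reduced value
}
```

With five valid lanes holding 1 through 5 and the remaining lanes holding arbitrary data, lane 0 ends with 15, which shows why the `id + k < int(gridDim.x)` guards are sufficient for block counts that are not powers of two.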
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,15 +36,16 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( KOKKOS_ENABLE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG )
|
||||
|
||||
#include <Kokkos_Core.hpp>
|
||||
|
||||
#include <impl/Kokkos_TaskQueue_impl.hpp>
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
@ -120,7 +121,7 @@ printf("TaskQueue<Cuda>::driver(%d,%d) task(%lx)\n",threadIdx.z,blockIdx.x
|
|||
}
|
||||
|
||||
if ( 0 == warp_lane ) {
|
||||
queue->complete( task.ptr );
|
||||
queue->complete( task.ptr );
|
||||
}
|
||||
}
|
||||
} while(1);
|
||||
|
@ -157,7 +158,7 @@ printf("cuda_task_queue_execute before\n");
|
|||
// If not large enough then set the stack size, in bytes:
|
||||
//
|
||||
// CUDA_SAFE_CALL( cudaDeviceSetLimit( cudaLimitStackSize , stack_size ) );
|
||||
|
||||
|
||||
cuda_task_queue_execute<<< grid , block , shared , stream >>>( queue );
|
||||
|
||||
CUDA_SAFE_CALL( cudaGetLastError() );
|
||||
|
@ -173,7 +174,7 @@ printf("cuda_task_queue_execute after\n");
|
|||
}} /* namespace Kokkos::Impl */
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
||||
#else
|
||||
void KOKKOS_CORE_SRC_CUDA_KOKKOS_CUDA_TASK_PREVENT_LINK_ERROR() {}
|
||||
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG ) */
|
||||
|
||||
|
||||
|
|
|
@ -44,6 +44,7 @@
|
|||
#ifndef KOKKOS_IMPL_CUDA_TASK_HPP
|
||||
#define KOKKOS_IMPL_CUDA_TASK_HPP
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( KOKKOS_ENABLE_TASKDAG )
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
@ -325,7 +326,7 @@ ValueType shfl_warp_broadcast
|
|||
return Kokkos::shfl(val, src_lane, width);
|
||||
}
|
||||
|
||||
// all-reduce across corresponding vector lanes between team members within warp
|
||||
/*// all-reduce across corresponding vector lanes between team members within warp
|
||||
// assume vec_length*team_size == warp_size
|
||||
// blockDim.x == vec_length == stride
|
||||
// blockDim.y == team_size
|
||||
|
@ -351,7 +352,7 @@ void parallel_reduce
|
|||
loop_boundaries.thread.team_size(),
|
||||
blockDim.x);
|
||||
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
|
||||
}
|
||||
}*/
|
||||
|
||||
// all-reduce across corresponding vector lanes between team members within warp
|
||||
// if no join() provided, use sum
|
||||
|
@ -382,13 +383,36 @@ void parallel_reduce
|
|||
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
|
||||
}
|
||||
|
||||
template< typename iType, class Lambda, typename ReducerType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void parallel_reduce
|
||||
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
|
||||
const Lambda & lambda,
|
||||
const ReducerType& reducer) {
|
||||
|
||||
typedef typename ReducerType::value_type ValueType;
|
||||
//TODO what is the point of creating this temporary?
|
||||
ValueType result = ValueType();
|
||||
reducer.init(result);
|
||||
|
||||
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
|
||||
lambda(i,result);
|
||||
}
|
||||
|
||||
strided_shfl_warp_reduction(
|
||||
[&] (ValueType& val1, const ValueType& val2) { reducer.join(val1,val2); },
|
||||
result,
|
||||
loop_boundaries.thread.team_size(),
|
||||
blockDim.x);
|
||||
reducer.reference() = shfl_warp_broadcast<ValueType>( result, threadIdx.x, Impl::CudaTraits::WarpSize );
|
||||
}
|
||||
// all-reduce within team members within warp
|
||||
// assume vec_length*team_size == warp_size
|
||||
// blockDim.x == vec_length == stride
|
||||
// blockDim.y == team_size
|
||||
// threadIdx.x == position in vec
|
||||
// threadIdx.y == member number
|
||||
template< typename iType, class Lambda, typename ValueType, class JoinType >
|
||||
/*template< typename iType, class Lambda, typename ValueType, class JoinType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void parallel_reduce
|
||||
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
|
||||
|
@ -404,7 +428,7 @@ void parallel_reduce
|
|||
|
||||
multi_shfl_warp_reduction<ValueType, JoinType>(join, initialized_result, blockDim.x);
|
||||
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
|
||||
}
|
||||
}*/
|
||||
|
||||
// all-reduce within team members within warp
|
||||
// if no join() provided, use sum
|
||||
|
@ -436,6 +460,28 @@ void parallel_reduce
|
|||
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
|
||||
}
|
||||
|
||||
template< typename iType, class Lambda, typename ReducerType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void parallel_reduce
|
||||
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
|
||||
const Lambda & lambda,
|
||||
const ReducerType& reducer) {
|
||||
|
||||
typedef typename ReducerType::value_type ValueType;
|
||||
|
||||
ValueType result = ValueType();
|
||||
reducer.init(result);
|
||||
|
||||
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
|
||||
lambda(i,result);
|
||||
}
|
||||
|
||||
multi_shfl_warp_reduction(
|
||||
[&] (ValueType& val1, const ValueType& val2) { reducer.join(val1, val2); },
|
||||
result,
|
||||
blockDim.x);
|
||||
reducer.reference() = shfl_warp_broadcast<ValueType>( result, 0, blockDim.x );
|
||||
}
|
||||
// scan across corresponding vector lanes between team members within warp
|
||||
// assume vec_length*team_size == warp_size
|
||||
// blockDim.x == vec_length == stride
|
||||
|
|
|
@ -0,0 +1,982 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
//
|
||||
// 1. Redistributions of source code must retain the above copyright
|
||||
// notice, this list of conditions and the following disclaimer.
|
||||
//
|
||||
// 2. Redistributions in binary form must reproduce the above copyright
|
||||
// notice, this list of conditions and the following disclaimer in the
|
||||
// documentation and/or other materials provided with the distribution.
|
||||
//
|
||||
// 3. Neither the name of the Corporation nor the names of the
|
||||
// contributors may be used to endorse or promote products derived from
|
||||
// this software without specific prior written permission.
|
||||
//
|
||||
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
|
||||
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
|
||||
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
||||
#ifndef KOKKOS_CUDA_TEAM_HPP
|
||||
#define KOKKOS_CUDA_TEAM_HPP
|
||||
|
||||
#include <iostream>
|
||||
#include <algorithm>
|
||||
#include <stdio.h>
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
|
||||
|
||||
#include <utility>
|
||||
#include <Kokkos_Parallel.hpp>
|
||||
|
||||
#include <Cuda/Kokkos_CudaExec.hpp>
|
||||
#include <Cuda/Kokkos_Cuda_ReduceScan.hpp>
|
||||
#include <Cuda/Kokkos_Cuda_Internal.hpp>
|
||||
#include <Kokkos_Vectorization.hpp>
|
||||
|
||||
#if defined(KOKKOS_ENABLE_PROFILING)
|
||||
#include <impl/Kokkos_Profiling_Interface.hpp>
|
||||
#include <typeinfo>
|
||||
#endif
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
//----------------------------------------------------------------------------
|
||||
|
||||
namespace Kokkos {
|
||||
namespace Impl {
|
||||
|
||||
template< typename Type >
|
||||
struct CudaJoinFunctor {
|
||||
typedef Type value_type ;
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
static void join( volatile value_type & update ,
|
||||
volatile const value_type & input )
|
||||
{ update += input ; }
|
||||
};
|
||||
|
||||
/**\brief Team member_type passed to TeamPolicy or TeamTask closures.
|
||||
*
|
||||
* Cuda thread blocks for team closures are dimensioned as:
|
||||
* blockDim.x == number of "vector lanes" per "thread"
|
||||
* blockDim.y == number of "threads" per team
|
||||
* blockDim.z == number of teams in a block
|
||||
* where
|
||||
* A set of teams exactly fill a warp OR a team is the whole block
|
||||
* ( 0 == WarpSize % ( blockDim.x * blockDim.y ) )
|
||||
* OR
|
||||
* ( 1 == blockDim.z )
|
||||
*
|
||||
* Thus when 1 < blockDim.z the team is warp-synchronous
|
||||
* and __syncthreads should not be called in team collectives.
|
||||
*
|
||||
* When multiple teams are mapped onto a single block then the
|
||||
* total available shared memory must be partitioned among teams.
|
||||
*/
|
||||
class CudaTeamMember {
|
||||
private:
|
||||
|
||||
typedef Kokkos::Cuda execution_space ;
|
||||
typedef execution_space::scratch_memory_space scratch_memory_space ;
|
||||
|
||||
void * m_team_reduce ;
|
||||
scratch_memory_space m_team_shared ;
|
||||
int m_team_reduce_size ;
|
||||
int m_league_rank ;
|
||||
int m_league_size ;
|
||||
|
||||
public:
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
const execution_space::scratch_memory_space & team_shmem() const
|
||||
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
const execution_space::scratch_memory_space &
|
||||
team_scratch(const int& level) const
|
||||
{ return m_team_shared.set_team_thread_mode(level,1,0) ; }
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
const execution_space::scratch_memory_space &
|
||||
thread_scratch(const int& level) const
|
||||
{ return m_team_shared.set_team_thread_mode(level,team_size(),team_rank()) ; }
|
||||
|
||||
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
|
||||
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
|
||||
KOKKOS_INLINE_FUNCTION int team_rank() const
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
return threadIdx.y ;
|
||||
#else
|
||||
return 0;
|
||||
#endif
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION int team_size() const
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
return blockDim.y ;
|
||||
#else
|
||||
return 1;
|
||||
#endif
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION void team_barrier() const
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
if ( 1 == blockDim.z ) __syncthreads(); // team == block
|
||||
else __threadfence_block(); // team <= warp
|
||||
#endif
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------
|
||||
|
||||
template<class ValueType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void team_broadcast( ValueType & val, const int& thread_id) const
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
if ( 1 == blockDim.z ) { // team == block
|
||||
__syncthreads();
|
||||
// Wait for shared data write until all threads arrive here
|
||||
if ( threadIdx.x == 0 && threadIdx.y == thread_id ) {
|
||||
*((ValueType*) m_team_reduce) = val ;
|
||||
}
|
||||
__syncthreads(); // Wait for shared data read until root thread writes
|
||||
val = *((ValueType*) m_team_reduce);
|
||||
}
|
||||
else { // team <= warp
|
||||
ValueType tmp( val ); // input might not be a register variable
|
||||
cuda_shfl( val, tmp, blockDim.x * thread_id, blockDim.x * blockDim.y );
|
||||
}
|
||||
#endif
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------
|
||||
/**\brief Reduction across a team
|
||||
*
|
||||
* Mapping of teams onto blocks:
|
||||
* blockDim.x is "vector lanes"
|
||||
* blockDim.y is team "threads"
|
||||
* blockDim.z is number of teams per block
|
||||
*
|
||||
* Requires:
|
||||
* blockDim.x is a power of two
|
||||
* blockDim.x <= CudaTraits::WarpSize
|
||||
* ( 0 == CudaTraits::WarpSize % ( blockDim.x * blockDim.y ) )
|
||||
* OR
|
||||
* ( 1 == blockDim.z )
|
||||
*/
|
||||
template< typename ReducerType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< is_reducer< ReducerType >::value >::type
|
||||
team_reduce( ReducerType const & reducer ) const noexcept
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
|
||||
typedef typename ReducerType::value_type value_type ;
|
||||
|
||||
value_type tmp( reducer.reference() );
|
||||
|
||||
// reduce within the warp using shuffle
|
||||
|
||||
const int wx =
|
||||
( threadIdx.x + blockDim.x * threadIdx.y ) & CudaTraits::WarpIndexMask ;
|
||||
|
||||
for ( int i = CudaTraits::WarpSize ; blockDim.x <= ( i >>= 1 ) ; ) {
|
||||
|
||||
cuda_shfl_down( reducer.reference() , tmp , i , CudaTraits::WarpSize );
|
||||
|
||||
// Root of each vector lane reduces:
|
||||
if ( 0 == threadIdx.x && wx < i ) {
|
||||
reducer.join( tmp , reducer.reference() );
|
||||
}
|
||||
}
|
||||
|
||||
if ( 1 < blockDim.z ) { // team <= warp
|
||||
// broadcast result from root vector lane of root thread
|
||||
|
||||
cuda_shfl( reducer.reference() , tmp
|
||||
, blockDim.x * threadIdx.y , CudaTraits::WarpSize );
|
||||
|
||||
}
|
||||
else { // team == block
|
||||
// Reduce across warps using shared memory
|
||||
// Broadcast result within block
|
||||
|
||||
// Number of warps, blockDim.y may not be power of two:
|
||||
const int nw = ( blockDim.x * blockDim.y + CudaTraits::WarpIndexMask ) >> CudaTraits::WarpIndexShift ;
|
||||
|
||||
// Warp index:
|
||||
const int wy = ( blockDim.x * threadIdx.y ) >> CudaTraits::WarpIndexShift ;
|
||||
|
||||
// Number of shared memory entries for the reduction:
|
||||
int nsh = m_team_reduce_size / sizeof(value_type);
|
||||
|
||||
// Using at most one entry per warp:
|
||||
if ( nw < nsh ) nsh = nw ;
|
||||
|
||||
__syncthreads(); // Wait before shared data write
|
||||
|
||||
if ( 0 == wx && wy < nsh ) {
|
||||
((value_type*) m_team_reduce)[wy] = tmp ;
|
||||
}
|
||||
|
||||
// When more warps than shared entries:
|
||||
for ( int i = nsh ; i < nw ; i += nsh ) {
|
||||
|
||||
__syncthreads();
|
||||
|
||||
if ( 0 == wx && i <= wy ) {
|
||||
const int k = wy - i ;
|
||||
if ( k < nsh ) {
|
||||
reducer.join( *((value_type*) m_team_reduce + k) , tmp );
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
__syncthreads();
|
||||
|
||||
// One warp performs the inter-warp reduction:
|
||||
|
||||
if ( 0 == wy ) {
|
||||
|
||||
// Start at power of two covering nsh
|
||||
|
||||
for ( int i = 1 << ( 32 - __clz(nsh-1) ) ; ( i >>= 1 ) ; ) {
|
||||
const int k = wx + i ;
|
||||
if ( wx < i && k < nsh ) {
|
||||
reducer.join( ((value_type*)m_team_reduce)[wx]
|
||||
, ((value_type*)m_team_reduce)[k] );
|
||||
__threadfence_block();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
__syncthreads(); // Wait for reduction
|
||||
|
||||
// Broadcast result to all threads
|
||||
reducer.reference() = *((value_type*)m_team_reduce);
|
||||
}
|
||||
|
||||
#endif /* #ifdef __CUDA_ARCH__ */
|
||||
}
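/* Illustrative note (not part of the Kokkos source): the inter-warp step
 * above starts its fan-in at the power of two covering nsh and halves the
 * stride each pass. The sketch below replays that index pattern on the host
 * under stated assumptions: clz32 and fan_in_sum are hypothetical names, the
 * join is plain addition, and the per-lane work of warp 0 is run as an
 * ordinary sequential loop.
 *
```cpp
#include <cassert>
#include <vector>

// Portable stand-in for CUDA's __clz: number of leading zero bits in a
// 32-bit value, returning 32 for an input of zero.
inline int clz32(unsigned x) {
  int n = 0;
  for (unsigned bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) ++n;
  return n;
}

// Sequentially replays the fan-in: at each step, slot wx (wx < i) absorbs
// slot wx + i when that slot exists. Returns the value left in slot 0.
inline long fan_in_sum(std::vector<long> slots) {
  const int n = static_cast<int>(slots.size());
  if (n == 0) return 0;
  assert(n <= (1 << 30));  // keeps the shift below within int range
  // Start at the power of two covering n; the loop condition halves it
  // before the first pass, exactly as in the loop header above.
  for (int i = 1 << (32 - clz32(static_cast<unsigned>(n - 1))); (i >>= 1);) {
    for (int wx = 0; wx < i; ++wx) {
      const int k = wx + i;
      if (k < n) slots[wx] += slots[k];
    }
  }
  return slots[0];
}
```
 * With five slots the stride sequence is 4, 2, 1, so the entry count does
 * not have to be a power of two for every slot to reach slot 0.
 */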
|
||||
|
||||
//--------------------------------------------------------------------------
|
||||
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
|
||||
* with inter-team non-deterministic ordering accumulation.
|
||||
*
|
||||
* The global inter-team accumulation value will, at the end of the
|
||||
* league's parallel execution, be the scan's total.
|
||||
* Parallel execution ordering of the league's teams is non-deterministic.
|
||||
* As such the base value for each team's scan operation is similarly
|
||||
* non-deterministic.
|
||||
*/
|
||||
template< typename Type >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Type team_scan( const Type & value , Type * const global_accum ) const
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
Type * const base_data = (Type *) m_team_reduce ;
|
||||
|
||||
__syncthreads(); // Don't write into shared data until all threads have entered this function
|
||||
|
||||
if ( 0 == threadIdx.y ) { base_data[0] = 0 ; }
|
||||
|
||||
base_data[ threadIdx.y + 1 ] = value ;
|
||||
|
||||
Impl::cuda_intra_block_reduce_scan<true,Impl::CudaJoinFunctor<Type>,void>( Impl::CudaJoinFunctor<Type>() , base_data + 1 );
|
||||
|
||||
if ( global_accum ) {
|
||||
if ( blockDim.y == threadIdx.y + 1 ) {
|
||||
base_data[ blockDim.y ] = atomic_fetch_add( global_accum , base_data[ blockDim.y ] );
|
||||
}
|
||||
__syncthreads(); // Wait for atomic
|
||||
base_data[ threadIdx.y ] += base_data[ blockDim.y ] ;
|
||||
}
|
||||
|
||||
return base_data[ threadIdx.y ];
|
||||
#else
|
||||
return Type();
|
||||
#endif
|
||||
}
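/* Illustrative note (not part of the Kokkos source): a host-side model of
 * the values team_scan hands back, assuming one entry per team rank and
 * plain addition. team_scan_model is a hypothetical name; the read and
 * update of *global_accum stand in for the atomic_fetch_add performed by the
 * highest-rank thread.
 *
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// values[r] is the contribution of team rank r. The result for rank r is the
// exclusive prefix sum of the lower ranks, offset by the value global_accum
// held before this team added its total (when global_accum is non-null).
inline std::vector<long> team_scan_model(const std::vector<long>& values,
                                         long* global_accum = nullptr) {
  assert(!values.empty());  // a team always has at least one thread
  std::vector<long> result(values.size());
  long running = 0;
  for (std::size_t r = 0; r < values.size(); ++r) {
    result[r] = running;  // sum of values[0 .. r-1]
    running += values[r];
  }
  if (global_accum != nullptr) {
    const long base = *global_accum;  // old value, as a fetch-and-add returns
    *global_accum += running;         // team total joins the global count
    for (long& x : result) x += base;
  }
  return result;
}
```
 * In this model the highest rank recovers the team total as its scan result
 * plus its own value, matching the note on the single-argument overload.
 */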
|
||||
|
||||
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
|
||||
*
|
||||
* The highest rank thread can compute the reduction total as
|
||||
* reduction_total = dev.team_scan( value ) + value ;
|
||||
*/
|
||||
template< typename Type >
|
||||
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const {
|
||||
return this->template team_scan<Type>( value , 0 );
|
||||
}
|
||||
|
||||
//----------------------------------------
|
||||
|
||||
template< typename ReducerType >
|
||||
KOKKOS_INLINE_FUNCTION static
|
||||
typename std::enable_if< is_reducer< ReducerType >::value >::type
|
||||
vector_reduce( ReducerType const & reducer )
|
||||
{
|
||||
|
||||
#ifdef __CUDA_ARCH__
|
||||
if(blockDim.x == 1) return;
|
||||
|
||||
// Intra vector lane shuffle reduction:
|
||||
typename ReducerType::value_type tmp ( reducer.reference() );
|
||||
|
||||
for ( int i = blockDim.x ; ( i >>= 1 ) ; ) {
|
||||
cuda_shfl_down( reducer.reference() , tmp , i , blockDim.x );
|
||||
if ( threadIdx.x < i ) { reducer.join( tmp , reducer.reference() ); }
|
||||
}
|
||||
|
||||
// Broadcast from root lane to all other lanes.
|
||||
// Cannot use "butterfly" algorithm to avoid the broadcast
|
||||
// because floating point summation is not associative
|
||||
// and thus different threads could have different results.
|
||||
|
||||
cuda_shfl( reducer.reference() , tmp , 0 , blockDim.x );
|
||||
#endif
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------
|
||||
/**\brief Global reduction across all blocks
|
||||
*
|
||||
* Return !0 if reducer contains the final value
|
||||
*/
|
||||
template< typename ReducerType >
|
||||
KOKKOS_INLINE_FUNCTION static
|
||||
typename std::enable_if< is_reducer< ReducerType >::value , int >::type
|
||||
global_reduce( ReducerType const & reducer
|
||||
, int * const global_scratch_flags
|
||||
, void * const global_scratch_space
|
||||
, void * const shmem
|
||||
, int const shmem_size
|
||||
)
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
|
||||
typedef typename ReducerType::value_type value_type ;
|
||||
typedef value_type volatile * pointer_type ;
|
||||
|
||||
// Number of shared memory entries for the reduction:
|
||||
const int nsh = shmem_size / sizeof(value_type);
|
||||
|
||||
// Number of CUDA threads in the block, rank within the block
|
||||
const int nid = blockDim.x * blockDim.y * blockDim.z ;
|
||||
const int tid = threadIdx.x + blockDim.x * (
|
||||
threadIdx.y + blockDim.y * threadIdx.z );
|
||||
|
||||
// Reduces within block using all available shared memory
|
||||
// Contributes if it is the root "vector lane"
|
||||
|
||||
// wn == number of warps in the block
|
||||
// wx == which lane within the warp
|
||||
// wy == which warp within the block
|
||||
|
||||
const int wn = ( nid + CudaTraits::WarpIndexMask ) >> CudaTraits::WarpIndexShift ;
|
||||
const int wx = tid & CudaTraits::WarpIndexMask ;
|
||||
const int wy = tid >> CudaTraits::WarpIndexShift ;
|
||||
|
||||
//------------------------
|
||||
{ // Intra warp shuffle reduction from contributing CUDA threads
|
||||
|
||||
value_type tmp( reducer.reference() );
|
||||
|
||||
for ( int i = CudaTraits::WarpSize ; blockDim.x <= ( i >>= 1 ) ; ) {
|
||||
|
||||
cuda_shfl_down( reducer.reference(), tmp, i, CudaTraits::WarpSize );
|
||||
|
||||
// Root of each vector lane reduces "thread" contribution
|
||||
if ( 0 == threadIdx.x && wx < i ) {
|
||||
reducer.join( & tmp , reducer.data() );
|
||||
}
|
||||
}
|
||||
|
||||
// Reduce across warps using shared memory.
|
||||
// Number of warps may not be power of two.
|
||||
|
||||
__syncthreads(); // Wait before shared data write
|
||||
|
||||
// Number of shared memory entries for the reduction
|
||||
// is at most one per warp
|
||||
const int nentry = wn < nsh ? wn : nsh ;
|
||||
|
||||
if ( 0 == wx && wy < nentry ) {
|
||||
// Root thread of warp 'wy' has warp's value to contribute
|
||||
((value_type*) shmem)[wy] = tmp ;
|
||||
}
|
||||
|
||||
__syncthreads(); // Wait for write to be visible to block
|
||||
|
||||
// When more warps than shared entries
|
||||
// then warps must take turns joining their contribution
|
||||
// to the designated shared memory entry.
|
||||
for ( int i = nentry ; i < wn ; i += nentry ) {
|
||||
|
||||
const int k = wy - i ;
|
||||
|
||||
if ( 0 == wx && i <= wy && k < nentry ) {
|
||||
// Root thread of warp 'wy' has warp's value to contribute
|
||||
reducer.join( ((value_type*) shmem) + k , & tmp );
|
||||
}
|
||||
|
||||
__syncthreads(); // Wait for write to be visible to block
|
||||
}
|
||||
|
||||
// One warp performs the inter-warp reduction:
|
||||
|
||||
if ( 0 == wy ) {
|
||||
|
||||
// Start fan-in at power of two covering nentry
|
||||
|
||||
for ( int i = ( 1 << ( 32 - __clz(nentry-1) ) ) ; ( i >>= 1 ) ; ) {
|
||||
const int k = wx + i ;
|
||||
if ( wx < i && k < nentry ) {
|
||||
reducer.join( ((pointer_type)shmem) + wx
|
||||
, ((pointer_type)shmem) + k );
|
||||
__threadfence_block(); // Wait for write to be visible to warp
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
//------------------------
|
||||
{ // Write block's value to global_scratch_memory
|
||||
|
||||
int last_block = 0 ;
|
||||
|
||||
if ( 0 == wx ) {
|
||||
reducer.copy( ((pointer_type)global_scratch_space)
|
||||
+ blockIdx.x * reducer.length()
|
||||
, reducer.data() );
|
||||
|
||||
__threadfence(); // Wait until global write is visible.
|
||||
|
||||
last_block = gridDim.x ==
|
||||
1 + Kokkos::atomic_fetch_add(global_scratch_flags,1);
|
||||
|
||||
// If last block then reset count
|
||||
if ( last_block ) *global_scratch_flags = 0 ;
|
||||
}
|
||||
|
||||
last_block = __syncthreads_or( last_block );
|
||||
|
||||
if ( ! last_block ) return 0 ;
|
||||
|
||||
}
|
||||
//------------------------
|
||||
// Last block reads global_scratch_memory into shared memory.
|
||||
|
||||
const int nentry = nid < gridDim.x ?
|
||||
( nid < nsh ? nid : nsh ) :
|
||||
( gridDim.x < nsh ? gridDim.x : nsh ) ;
|
||||
|
||||
// nentry = min( nid , nsh , gridDim.x )
|
||||
|
||||
// whole block reads global memory into shared memory:
|
||||
|
||||
if ( tid < nentry ) {
|
||||
|
||||
const int offset = tid * reducer.length();
|
||||
|
||||
reducer.copy( ((pointer_type)shmem) + offset
|
||||
, ((pointer_type)global_scratch_space) + offset );
|
||||
|
||||
for ( int i = nentry + tid ; i < gridDim.x ; i += nentry ) {
|
||||
reducer.join( ((pointer_type)shmem) + offset
|
||||
, ((pointer_type)global_scratch_space)
|
||||
+ i * reducer.length() );
|
||||
}
|
||||
}
|
||||
|
||||
__syncthreads(); // Wait for writes to be visible to block
|
||||
|
||||
if ( 0 == wy ) {
|
||||
|
||||
// Iterate to reduce shared memory to single warp fan-in size
|
||||
|
||||
const int nreduce = CudaTraits::WarpSize < nentry
|
||||
? CudaTraits::WarpSize : nentry ;
|
||||
|
||||
// nreduce = min( CudaTraits::WarpSize , nentry )
|
||||
|
||||
if ( wx < nreduce && nreduce < nentry ) {
|
||||
for ( int i = nreduce + wx ; i < nentry ; i += nreduce ) {
|
||||
reducer.join( ((pointer_type)shmem) + wx
|
||||
, ((pointer_type)shmem) + i );
|
||||
}
|
||||
__threadfence_block(); // Wait for writes to be visible to warp
|
||||
}
|
||||
|
||||
// Start fan-in at power of two covering nreduce
|
||||
|
||||
for ( int i = ( 1 << ( 32 - __clz(nreduce-1) ) ) ; ( i >>= 1 ) ; ) {
|
||||
const int k = wx + i ;
|
||||
if ( wx < i && k < nreduce ) {
|
||||
reducer.join( ((pointer_type)shmem) + wx
|
||||
, ((pointer_type)shmem) + k );
|
||||
__threadfence_block(); // Wait for writes to be visible to warp
|
||||
}
|
||||
}
|
||||
|
||||
if ( 0 == wx ) {
|
||||
reducer.copy( reducer.data() , (pointer_type)shmem );
|
||||
return 1 ;
|
||||
}
|
||||
}
|
||||
return 0 ;
|
||||
|
||||
#else
|
||||
return 0 ;
|
||||
#endif
|
||||
}
|
||||
|
||||
//----------------------------------------
|
||||
// Private for the driver
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
CudaTeamMember( void * shared
|
||||
, const int shared_begin
|
||||
, const int shared_size
|
||||
, void* scratch_level_1_ptr
|
||||
, const int scratch_level_1_size
|
||||
, const int arg_league_rank
|
||||
, const int arg_league_size )
|
||||
: m_team_reduce( shared )
|
||||
, m_team_shared( ((char *)shared) + shared_begin , shared_size, scratch_level_1_ptr, scratch_level_1_size)
|
||||
, m_team_reduce_size( shared_begin )
|
||||
, m_league_rank( arg_league_rank )
|
||||
, m_league_size( arg_league_size )
|
||||
{}
|
||||
|
||||
};
|
||||
|
||||
} // namespace Impl
|
||||
} // namespace Kokkos
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
//----------------------------------------------------------------------------
|
||||
|
||||
namespace Kokkos {
|
||||
namespace Impl {
|
||||
|
||||
template<typename iType>
|
||||
struct TeamThreadRangeBoundariesStruct<iType,CudaTeamMember> {
|
||||
typedef iType index_type;
|
||||
const CudaTeamMember& member;
|
||||
const iType start;
|
||||
const iType end;
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count)
|
||||
: member(thread_)
|
||||
, start( 0 )
|
||||
, end( count ) {}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& begin_, const iType& end_)
|
||||
: member(thread_)
|
||||
, start( begin_ )
|
||||
, end( end_ ) {}
|
||||
};
|
||||
|
||||
|
||||
|
||||
template<typename iType>
|
||||
struct ThreadVectorRangeBoundariesStruct<iType,CudaTeamMember> {
|
||||
typedef iType index_type;
|
||||
const iType start;
|
||||
const iType end;
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count)
|
||||
: start( 0 ), end( count ) {}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
ThreadVectorRangeBoundariesStruct (const iType& count)
|
||||
: start( 0 ), end( count ) {}
|
||||
};
|
||||
|
||||
} // namespace Impl
|
||||
|
||||
template<typename iType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >
|
||||
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType & count ) {
|
||||
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, count );
|
||||
}
|
||||
|
||||
template< typename iType1, typename iType2 >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
|
||||
Impl::CudaTeamMember >
|
||||
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType1 & begin, const iType2 & end ) {
|
||||
typedef typename std::common_type< iType1, iType2 >::type iType;
|
||||
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, iType(begin), iType(end) );
|
||||
}
|
||||
|
||||
template<typename iType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
|
||||
ThreadVectorRange(const Impl::CudaTeamMember& thread, const iType& count) {
|
||||
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >(thread,count);
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Impl::ThreadSingleStruct<Impl::CudaTeamMember> PerTeam(const Impl::CudaTeamMember& thread) {
|
||||
return Impl::ThreadSingleStruct<Impl::CudaTeamMember>(thread);
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Impl::VectorSingleStruct<Impl::CudaTeamMember> PerThread(const Impl::CudaTeamMember& thread) {
|
||||
return Impl::VectorSingleStruct<Impl::CudaTeamMember>(thread);
|
||||
}
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
||||
/** \brief Inter-thread parallel_for.
|
||||
*
|
||||
* Executes closure(iType i) for each i=[0..N).
|
||||
*
|
||||
* The range [0..N) is mapped to all threads of the calling thread team.
|
||||
*/
|
||||
template<typename iType, class Closure >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void parallel_for
|
||||
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>&
|
||||
loop_boundaries
|
||||
, const Closure & closure
|
||||
)
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
for( iType i = loop_boundaries.start + threadIdx.y
|
||||
; i < loop_boundaries.end
|
||||
; i += blockDim.y )
|
||||
closure(i);
|
||||
#endif
|
||||
}
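/* Illustrative note (not part of the Kokkos source): the loop above deals
 * the range out round-robin, so the thread with team rank r visits
 * start + r, start + r + team_size, and so on. A host-side sketch of that
 * index mapping, with the hypothetical helper indices_for_rank taking the
 * roles of threadIdx.y and blockDim.y as ordinary arguments:
 *
```cpp
#include <cassert>
#include <vector>

// Indices visited by the thread with the given team rank when the range
// [start, end) is strided across team_size threads.
inline std::vector<int> indices_for_rank(int start, int end,
                                         int team_rank, int team_size) {
  assert(team_size > 0 && 0 <= team_rank && team_rank < team_size);
  std::vector<int> visited;
  for (int i = start + team_rank; i < end; i += team_size) {
    visited.push_back(i);
  }
  return visited;
}
```
 * A rank whose first index already falls at or past end visits nothing,
 * which is why the closure may run zero times on some threads.
 */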
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
||||
/** \brief Inter-thread parallel_reduce with a reducer.
|
||||
*
|
||||
* Executes closure(iType i, ValueType & val) for each i=[0..N)
|
||||
*
|
||||
* The range [0..N) is mapped to all threads of the
|
||||
* calling thread team and val is reduced with the reducer's join(),
|
||||
* with the result stored in the reducer's reference().
|
||||
*/
|
||||
template< typename iType, class Closure, class ReducerType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< Kokkos::is_reducer< ReducerType >::value >::type
|
||||
parallel_reduce
|
||||
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember> &
|
||||
loop_boundaries
|
||||
, const Closure & closure
|
||||
, const ReducerType & reducer
|
||||
)
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
|
||||
reducer.init( reducer.reference() );
|
||||
|
||||
for( iType i = loop_boundaries.start + threadIdx.y
|
||||
; i < loop_boundaries.end
|
||||
; i += blockDim.y ) {
|
||||
closure(i,reducer.reference());
|
||||
}
|
||||
|
||||
loop_boundaries.member.team_reduce( reducer );
|
||||
|
||||
#endif
|
||||
}
|
||||
|
||||
|
||||
/** \brief Inter-thread parallel_reduce assuming summation.
|
||||
*
|
||||
* Executes closure(iType i, ValueType & val) for each i=[0..N)
|
||||
*
|
||||
* The range [0..N) is mapped to all threads of the
|
||||
* calling thread team and a summation of val is
|
||||
* performed and put into result.
|
||||
*/
|
||||
template< typename iType, class Closure, typename ValueType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< ! Kokkos::is_reducer< ValueType >::value >::type
|
||||
parallel_reduce
|
||||
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember> &
|
||||
loop_boundaries
|
||||
, const Closure & closure
|
||||
, ValueType & result
|
||||
)
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
|
||||
Kokkos::Experimental::Sum<ValueType> reducer(result);
|
||||
|
||||
reducer.init( reducer.reference() );
|
||||
|
||||
for( iType i = loop_boundaries.start + threadIdx.y
|
||||
; i < loop_boundaries.end
|
||||
; i += blockDim.y ) {
|
||||
closure(i,result);
|
||||
}
|
||||
|
||||
loop_boundaries.member.team_reduce( reducer );
|
||||
|
||||
#endif
|
||||
}
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
||||
/** \brief Intra-thread vector parallel_for.
|
||||
*
|
||||
* Executes closure(iType i) for each i=[0..N)
|
||||
*
|
||||
* The range [0..N) is mapped to all vector lanes of the calling thread.
|
||||
*/
|
||||
template<typename iType, class Closure >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void parallel_for
|
||||
( const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember>&
|
||||
loop_boundaries
|
||||
, const Closure & closure
|
||||
)
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
for ( iType i = loop_boundaries.start + threadIdx.x
|
||||
; i < loop_boundaries.end
|
||||
; i += blockDim.x ) {
|
||||
closure(i);
|
||||
}
|
||||
#endif
|
||||
}
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
||||
/** \brief Intra-thread vector parallel_reduce.
|
||||
*
|
||||
* Calls closure(iType i, ValueType & val) for each i=[0..N).
|
||||
*
|
||||
* The range [0..N) is mapped to all vector lanes of
|
||||
* the calling thread and a reduction of val is performed using the
|
||||
* reducer's join() and output into the reducer's reference().
|
||||
*
|
||||
* The identity value is whatever the reducer's init() assigns, so it need
|
||||
* not be the default constructed value.
|
||||
*/
|
||||
template< typename iType, class Closure, class ReducerType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< is_reducer< ReducerType >::value >::type
|
||||
parallel_reduce
|
||||
( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember>
|
||||
const & loop_boundaries
|
||||
, Closure const & closure
|
||||
, ReducerType const & reducer )
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
|
||||
reducer.init( reducer.reference() );
|
||||
|
||||
for ( iType i = loop_boundaries.start + threadIdx.x
|
||||
; i < loop_boundaries.end
|
||||
; i += blockDim.x ) {
|
||||
closure(i,reducer.reference());
|
||||
}
|
||||
|
||||
Impl::CudaTeamMember::vector_reduce( reducer );
|
||||
|
||||
#endif
|
||||
}
|
||||
|
||||
/** \brief Intra-thread vector parallel_reduce.
|
||||
*
|
||||
* Calls closure(iType i, ValueType & val) for each i=[0..N).
|
||||
*
|
||||
* The range [0..N) is mapped to all vector lanes of
|
||||
* the calling thread and a reduction of val is performed using +=
|
||||
* and output into result.
|
||||
*
|
||||
* The identity value for the += operator is assumed to be the default
|
||||
* constructed value.
|
||||
*/
|
||||
template< typename iType, class Closure, typename ValueType >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
typename std::enable_if< ! is_reducer< ValueType >::value >::type
|
||||
parallel_reduce
|
||||
( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember>
|
||||
const & loop_boundaries
|
||||
, Closure const & closure
|
||||
, ValueType & result )
|
||||
{
|
||||
#ifdef __CUDA_ARCH__
|
||||
result = ValueType();
|
||||
|
||||
for ( iType i = loop_boundaries.start + threadIdx.x
|
||||
; i < loop_boundaries.end
|
||||
; i += blockDim.x ) {
|
||||
closure(i,result);
|
||||
}
|
||||
|
||||
Impl::CudaTeamMember::vector_reduce(
|
||||
Kokkos::Experimental::Sum<ValueType>(result ) );
|
||||
|
||||
#endif
|
||||
}
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
||||
/** \brief Intra-thread vector parallel exclusive prefix sum.
|
||||
*
|
||||
* Executes closure(iType i, ValueType & val, bool final) for each i=[0..N)
|
||||
*
|
||||
* The range [0..N) is mapped to all vector lanes in the
|
||||
* thread and a scan operation is performed.
|
||||
* The last call to closure has final == true.
|
||||
*/
|
||||
template< typename iType, class Closure >
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void parallel_scan
|
||||
( const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
|
||||
loop_boundaries
|
||||
, const Closure & closure
|
||||
)
|
||||
{
|
||||
|
||||
#ifdef __CUDA_ARCH__
|
||||
|
||||
// Extract value_type from closure
|
||||
|
||||
using value_type =
|
||||
typename Kokkos::Impl::FunctorAnalysis
|
||||
< Kokkos::Impl::FunctorPatternInterface::SCAN
|
||||
, void
|
||||
, Closure >::value_type ;
|
||||
|
||||
// Loop through boundaries by vector-length chunks
|
||||
// must scan at each iteration
|
||||
|
||||
value_type accum = 0 ;
|
||||
|
||||
// All thread "lanes" must loop the same number of times.
|
||||
// Determine a loop end for all thread "lanes."
|
||||
// Requires:
|
||||
// blockDim.x is power of two and thus
|
||||
// ( end % blockDim.x ) == ( end & ( blockDim.x - 1 ) )
|
||||
// 1 <= blockDim.x <= CudaTraits::WarpSize
|
||||
|
||||
const int mask = blockDim.x - 1 ;
|
||||
const int rem = loop_boundaries.end & mask ; // == end % blockDim.x
|
||||
const int end = loop_boundaries.end + ( rem ? blockDim.x - rem : 0 );
|
||||
|
||||
for ( int i = threadIdx.x ; i < end ; i += blockDim.x ) {
|
||||
|
||||
value_type val = 0 ;
|
||||
|
||||
// First acquire per-lane contributions:
|
||||
if ( i < loop_boundaries.end ) closure( i , val , false );
|
||||
|
||||
value_type sval = val ;
|
||||
|
||||
// Bottom up inclusive scan in triangular pattern
|
||||
// where each CUDA thread is the root of a reduction tree
|
||||
// from the zeroth "lane" to itself.
|
||||
// [t] += [t-1] if t >= 1
|
||||
// [t] += [t-2] if t >= 2
|
||||
// [t] += [t-4] if t >= 4
|
||||
// ...
|
||||
|
||||
for ( int j = 1 ; j < blockDim.x ; j <<= 1 ) {
|
||||
value_type tmp = 0 ;
|
||||
Impl::cuda_shfl_up( tmp , sval , j , blockDim.x );
|
||||
if ( j <= threadIdx.x ) { sval += tmp ; }
|
||||
}
|
||||
|
||||
// Include accumulation and remove value for exclusive scan:
|
||||
val = accum + sval - val ;
|
||||
|
||||
// Provide exclusive scan value:
|
||||
if ( i < loop_boundaries.end ) closure( i , val , true );
|
||||
|
||||
// Accumulate the last value in the inclusive scan:
|
||||
Impl::cuda_shfl( sval , sval , mask , blockDim.x );
|
||||
|
||||
accum += sval ;
|
||||
}
|
||||
|
||||
#endif
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
namespace Kokkos {
|
||||
|
||||
template<class FunctorType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
|
||||
#ifdef __CUDA_ARCH__
|
||||
if(threadIdx.x == 0) lambda();
|
||||
#endif
|
||||
}
|
||||
|
||||
template<class FunctorType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
|
||||
#ifdef __CUDA_ARCH__
|
||||
if(threadIdx.x == 0 && threadIdx.y == 0) lambda();
|
||||
#endif
|
||||
}
|
||||
|
||||
template<class FunctorType, class ValueType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda, ValueType& val) {
|
||||
#ifdef __CUDA_ARCH__
|
||||
if(threadIdx.x == 0) lambda(val);
|
||||
val = shfl(val,0,blockDim.x);
|
||||
#endif
|
||||
}
|
||||
|
||||
template<class FunctorType, class ValueType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
|
||||
#ifdef __CUDA_ARCH__
|
||||
if(threadIdx.x == 0 && threadIdx.y == 0) {
|
||||
lambda(val);
|
||||
}
|
||||
single_struct.team_member.team_broadcast(val,0);
|
||||
#endif
|
||||
}
|
||||
|
||||
} // namespace Kokkos
|
||||
|
||||
#endif /* defined( __CUDACC__ ) */
|
||||
|
||||
#endif /* #ifndef KOKKOS_CUDA_TEAM_HPP */
|
||||
|
|
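The `parallel_scan` over `ThreadVectorRange` in the hunk above walks the range in chunks of `blockDim.x` lanes. Within each chunk it runs an inclusive scan, then converts it to an exclusive one by adding the running total and subtracting the lane's own contribution. The following is a minimal host-side sketch of that arithmetic, not the CUDA implementation: the function name `chunked_exclusive_scan` and the parameter `lanes` (standing in for `blockDim.x`) are hypothetical, and the warp shuffle is modelled by a snapshot copy of the lane values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of the chunked exclusive scan.
// `lanes` plays the role of blockDim.x (hypothetical name); it must be a
// power of two so that (n % lanes) == (n & (lanes - 1)).
std::vector<int> chunked_exclusive_scan(const std::vector<int>& in, int lanes) {
  assert(lanes >= 1 && (lanes & (lanes - 1)) == 0);

  const int n    = static_cast<int>(in.size());
  const int mask = lanes - 1;
  const int rem  = n & mask;                     // == n % lanes
  const int end  = n + (rem ? lanes - rem : 0);  // round up to a full chunk

  std::vector<int> out(in.size());
  int accum = 0;  // sum of all previous chunks

  for (int base = 0; base < end; base += lanes) {
    // Per-lane contributions; lanes past the real end contribute zero.
    std::vector<int> val(lanes, 0);
    for (int t = 0; t < lanes; ++t)
      if (base + t < n) val[t] = in[base + t];

    // Inclusive scan: sval[t] += sval[t - j] for j = 1, 2, 4, ...
    std::vector<int> sval = val;
    for (int j = 1; j < lanes; j <<= 1) {
      const std::vector<int> prev = sval;  // snapshot models the lock-step shuffle
      for (int t = j; t < lanes; ++t) sval[t] += prev[t - j];
    }

    // Exclusive value = running total + inclusive value - own contribution.
    for (int t = 0; t < lanes; ++t)
      if (base + t < n) out[base + t] = accum + sval[t] - val[t];

    accum += sval[mask];  // the last lane holds the chunk total
  }
  return out;
}
```

Padding the loop bound to a multiple of `lanes` is what keeps every lane in the loop for the same number of iterations. In the device code that matters because all lanes take part in each shuffle.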
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -44,8 +44,6 @@
|
|||
#define KOKKOS_CUDA_VECTORIZATION_HPP
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#ifdef KOKKOS_ENABLE_CUDA
|
||||
|
||||
#include <Kokkos_Cuda.hpp>
|
||||
|
@ -296,3 +294,4 @@ namespace Impl {
|
|||
|
||||
#endif // KOKKOS_ENABLE_CUDA
|
||||
#endif
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -44,7 +44,7 @@
|
|||
#ifndef KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
|
||||
#define KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
|
||||
|
||||
/* only compile this file if CUDA is enabled for Kokkos */
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( KOKKOS_ENABLE_CUDA )
|
||||
|
||||
//----------------------------------------------------------------------------
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -46,7 +46,7 @@
|
|||
|
||||
//----------------------------------------------------------------------------
|
||||
//----------------------------------------------------------------------------
|
||||
#include "Kokkos_Macros.hpp"
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
|
||||
|
||||
#include <cuda.h>
|
||||
|
@ -82,6 +82,8 @@ void cuda_abort( const char * const message )
|
|||
|
||||
} // namespace Impl
|
||||
} // namespace Kokkos
|
||||
#else
|
||||
void KOKKOS_CORE_SRC_CUDA_ABORT_PREVENT_LINK_ERROR() {}
|
||||
#endif /* #if defined(__CUDACC__) && defined( KOKKOS_ENABLE_CUDA ) */
|
||||
#endif /* #ifndef KOKKOS_CUDA_ABORT_HPP */
|
||||
|
||||
|
|
|
@ -236,13 +236,13 @@ struct MDRangePolicy
|
|||
MDRangePolicy( lower_tmp, upper_tmp, tile_tmp );
|
||||
|
||||
#else
|
||||
if(m_lower.size()!=rank || m_upper.size() != rank)
|
||||
if(static_cast<int>(m_lower.size()) != rank || static_cast<int>(m_upper.size()) != rank)
|
||||
Kokkos::abort("MDRangePolicy: Constructor initializer lists have wrong size");
|
||||
|
||||
for ( auto i = 0; i < rank; ++i ) {
|
||||
m_lower[i] = static_cast<array_index_type>(lower.begin()[i]);
|
||||
m_upper[i] = static_cast<array_index_type>(upper.begin()[i]);
|
||||
if(tile.size()==rank)
|
||||
if(static_cast<int>(tile.size())==rank)
|
||||
m_tile[i] = static_cast<array_index_type>(tile.begin()[i]);
|
||||
else
|
||||
m_tile[i] = 0;
|
||||
|
|
|
@ -44,6 +44,7 @@
|
|||
#define KOKKOS_COMPLEX_HPP
|
||||
|
||||
#include <Kokkos_Atomic.hpp>
|
||||
#include <Kokkos_NumericTraits.hpp>
|
||||
#include <complex>
|
||||
#include <iostream>
|
||||
|
||||
|
@ -324,9 +325,30 @@ public:
|
|||
im_ /= src;
|
||||
return *this;
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
bool operator == (const complex<RealType>& src) {
|
||||
return (re_ == src.re_) && (im_ == src.im_);
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
bool operator == (const RealType src) {
|
||||
return (re_ == src) && (im_ == RealType(0));
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
bool operator != (const complex<RealType>& src) {
|
||||
return (re_ != src.re_) || (im_ != src.im_);
|
||||
}
|
||||
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
bool operator != (const RealType src) {
|
||||
return (re_ != src) || (im_ != RealType(0));
|
||||
}
|
||||
|
||||
};
|
||||
|
||||
//! Binary + operator for complex.
|
||||
//! Binary + operator for complex complex.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex<RealType>
|
||||
|
@ -334,6 +356,22 @@ operator + (const complex<RealType>& x, const complex<RealType>& y) {
|
|||
return complex<RealType> (x.real () + y.real (), x.imag () + y.imag ());
|
||||
}
|
||||
|
||||
//! Binary + operator for complex scalar.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex<RealType>
|
||||
operator + (const complex<RealType>& x, const RealType& y) {
|
||||
return complex<RealType> (x.real () + y , x.imag ());
|
||||
}
|
||||
|
||||
//! Binary + operator for scalar complex.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex<RealType>
|
||||
operator + (const RealType& x, const complex<RealType>& y) {
|
||||
return complex<RealType> (x + y.real (), y.imag ());
|
||||
}
|
||||
|
||||
//! Unary + operator for complex.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
@ -350,6 +388,22 @@ operator - (const complex<RealType>& x, const complex<RealType>& y) {
|
|||
return complex<RealType> (x.real () - y.real (), x.imag () - y.imag ());
|
||||
}
|
||||
|
||||
//! Binary - operator for complex scalar.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex<RealType>
|
||||
operator - (const complex<RealType>& x, const RealType& y) {
|
||||
return complex<RealType> (x.real () - y , x.imag ());
|
||||
}
|
||||
|
||||
//! Binary - operator for scalar complex.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex<RealType>
|
||||
operator - (const RealType& x, const complex<RealType>& y) {
|
||||
return complex<RealType> (x - y.real (), - y.imag ());
|
||||
}
|
||||
|
||||
//! Unary - operator for complex.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
@ -395,6 +449,16 @@ operator * (const RealType& x, const complex<RealType>& y) {
|
|||
return complex<RealType> (x * y.real (), x * y.imag ());
|
||||
}
|
||||
|
||||
/// \brief Binary * operator for RealType times complex.
|
||||
///
|
||||
/// This function exists because the compiler doesn't know that
|
||||
/// RealType and complex<RealType> commute with respect to operator*.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex<RealType>
|
||||
operator * (const complex<RealType>& y, const RealType& x) {
|
||||
return complex<RealType> (x * y.real (), x * y.imag ());
|
||||
}
|
||||
|
||||
//! Imaginary part of a complex number.
|
||||
template<class RealType>
|
||||
|
@ -415,7 +479,25 @@ template<class RealType>
|
|||
KOKKOS_INLINE_FUNCTION
|
||||
RealType abs (const complex<RealType>& x) {
|
||||
// FIXME (mfh 31 Oct 2014) Scale to avoid unwarranted overflow.
|
||||
return ::sqrt (real (x) * real (x) + imag (x) * imag (x));
|
||||
return std::sqrt (real (x) * real (x) + imag (x) * imag (x));
|
||||
}
|
||||
|
||||
//! Power of a complex number
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Kokkos::complex<RealType> pow (const complex<RealType>& x, const RealType& e) {
|
||||
RealType r = abs(x);
|
||||
RealType phi = std::atan(x.imag()/x.real());
|
||||
return std::pow(r,e) * Kokkos::complex<RealType>(std::cos(phi*e),std::sin(phi*e));
|
||||
}
|
||||
|
||||
//! Square root of a complex number.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
Kokkos::complex<RealType> sqrt (const complex<RealType>& x) {
|
||||
RealType r = abs(x);
|
||||
RealType phi = std::atan(x.imag()/x.real());
|
||||
return std::sqrt(r) * Kokkos::complex<RealType>(std::cos(phi*0.5),std::sin(phi*0.5));
|
||||
}
|
||||
|
||||
//! Conjugate of a complex number.
|
||||
|
@ -425,6 +507,19 @@ complex<RealType> conj (const complex<RealType>& x) {
|
|||
return complex<RealType> (real (x), -imag (x));
|
||||
}
|
||||
|
||||
//! Exponential of a complex number.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex<RealType> exp (const complex<RealType>& x) {
|
||||
return std::exp(x.real()) * complex<RealType> (std::cos (x.imag()), std::sin(x.imag()));
|
||||
}
|
||||
|
||||
//! Exponential of a complex number.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex<RealType> pow (const complex<RealType>& x) {
|
||||
return std::exp(x.real()) * complex<RealType> (std::cos (x.imag()), std::sin(x.imag()));
|
||||
}
|
||||
|
||||
//! Binary operator / for complex and real numbers
|
||||
template<class RealType1, class RealType2>
|
||||
|
@ -461,6 +556,14 @@ operator / (const complex<RealType>& x, const complex<RealType>& y) {
|
|||
}
|
||||
}
|
||||
|
||||
//! Binary operator / for complex and real numbers
|
||||
template<class RealType1, class RealType2>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
complex<RealType1>
|
||||
operator / (const RealType1& x, const complex<RealType2>& y) {
|
||||
return complex<RealType1> (x)/y;
|
||||
}
|
||||
|
||||
//! Equality operator for two complex numbers.
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
|
@ -468,9 +571,13 @@ bool operator == (const complex<RealType>& x, const complex<RealType>& y) {
|
|||
return real (x) == real (y) && imag (x) == imag (y);
|
||||
}
|
||||
|
||||
//! Equality operator for std::complex and Kokkos::complex.
|
||||
/// \brief Equality operator for std::complex and Kokkos::complex.
|
||||
///
|
||||
/// This cannot be a device function, since std::real is not.
|
||||
/// Otherwise, CUDA builds will give compiler warnings ("warning:
|
||||
/// calling a constexpr __host__ function("real") from a __host__
|
||||
/// __device__ function("operator==") is not allowed").
|
||||
template<class RealType>
|
||||
KOKKOS_INLINE_FUNCTION
|
||||
bool operator == (const std::complex<RealType>& x, const complex<RealType>& y) {
|
||||
return std::real (x) == real (y) && std::imag (x) == imag (y);
|
||||
}
|
||||
|
@ -533,6 +640,15 @@ std::ostream& operator >> (std::ostream& os, complex<RealType>& x) {
|
|||
}
|
||||
|
||||
|
||||
template<class T>
|
||||
struct reduction_identity<Kokkos::complex<T> > {
|
||||
typedef reduction_identity<T> t_red_ident;
|
||||
KOKKOS_FORCEINLINE_FUNCTION constexpr static Kokkos::complex<T> sum()
|
||||
{return Kokkos::complex<T>(t_red_ident::sum(),t_red_ident::sum());}
|
||||
KOKKOS_FORCEINLINE_FUNCTION constexpr static Kokkos::complex<T> prod()
|
||||
{return Kokkos::complex<T>(t_red_ident::prod(),t_red_ident::sum());}
|
||||
};
|
||||
|
||||
} // namespace Kokkos
|
||||
|
||||
#endif // KOKKOS_COMPLEX_HPP
|
||||
|
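The `reduction_identity<Kokkos::complex<T> >` specialization at the end of the hunk above builds the complex identities from the real ones: the sum identity is (0, 0) and the product identity is (1, 0). The sketch below checks that property on the host with `std::complex`. The names `real_identity` and `complex_identity` are hypothetical stand-ins and are not part of Kokkos.

```cpp
#include <cassert>
#include <complex>

// Hypothetical stand-in for the real-valued identities:
// 0 for a sum reduction, 1 for a product reduction.
template <class T>
struct real_identity {
  static constexpr T sum()  { return T(0); }
  static constexpr T prod() { return T(1); }
};

// Complex identities assembled from the real ones, mirroring the
// (sum, sum) and (prod, sum) pairing used in the diff.
template <class T>
struct complex_identity {
  static std::complex<T> sum() {
    return std::complex<T>(real_identity<T>::sum(), real_identity<T>::sum());
  }
  static std::complex<T> prod() {
    return std::complex<T>(real_identity<T>::prod(), real_identity<T>::sum());
  }
};
```

The imaginary part of the product identity is the additive identity, not the multiplicative one, because (1, 0) * z == z while (1, 1) * z does not.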
|
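The `pow` and `sqrt` overloads added in the hunk above use the polar form, taking the phase from `std::atan(x.imag()/x.real())`. Since `std::atan` only returns angles between -pi/2 and pi/2, that expression appears to lose the quadrant when the real part is negative. The sketch below shows the same polar-form computation on `std::complex` with two substitutions that are assumptions of this example, not what the commit does: `std::atan2` for the phase and `std::hypot` for the magnitude. The function names `polar_sqrt` and `polar_pow` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Polar-form square root: sqrt(r) * (cos(phi/2), sin(phi/2)).
// Assumption: phi comes from std::atan2, which covers all four quadrants,
// rather than from atan(imag/real) as in the diff.
template <class Real>
std::complex<Real> polar_sqrt(const std::complex<Real>& x) {
  const Real r   = std::hypot(x.real(), x.imag());  // magnitude, scaled internally
  const Real phi = std::atan2(x.imag(), x.real());  // phase in [-pi, pi]
  const Real s   = std::sqrt(r);
  return std::complex<Real>(s * std::cos(phi / 2), s * std::sin(phi / 2));
}

// Polar-form real power: r^e * (cos(e*phi), sin(e*phi)).
template <class Real>
std::complex<Real> polar_pow(const std::complex<Real>& x, Real e) {
  const Real r   = std::hypot(x.real(), x.imag());
  const Real phi = std::atan2(x.imag(), x.real());
  const Real m   = std::pow(r, e);
  return std::complex<Real>(m * std::cos(phi * e), m * std::sin(phi * e));
}
```

For example, `polar_sqrt(std::complex<double>(-1.0, 0.0))` lands close to (0, 1), which is the principal square root of -1. A device-side version would need math functions that are callable from device code, which this host-only sketch does not address.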
|
@ -200,7 +200,14 @@ public:
|
|||
, Kokkos::DefaultHostExecutionSpace , execution_space
|
||||
>::type host_execution_space ;
|
||||
#else
|
||||
typedef execution_space host_execution_space ;
|
||||
#if defined( KOKKOS_ENABLE_OPENMPTARGET )
|
||||
typedef typename std::conditional
|
||||
< std::is_same< execution_space , Kokkos::Experimental::OpenMPTarget >::value
|
||||
, Kokkos::DefaultHostExecutionSpace , execution_space
|
||||
>::type host_execution_space ;
|
||||
#else
|
||||
typedef execution_space host_execution_space ;
|
||||
#endif
|
||||
#endif
|
||||
|
||||
typedef typename std::conditional
|
||||
|
|
|
@ -57,11 +57,16 @@
|
|||
#include <Kokkos_OpenMP.hpp>
|
||||
#endif
|
||||
|
||||
//#if defined( KOKKOS_ENABLE_OPENMPTARGET )
|
||||
#include <Kokkos_OpenMPTarget.hpp>
|
||||
#include <Kokkos_OpenMPTargetSpace.hpp>
|
||||
//#endif
|
||||
|
||||
#if defined( KOKKOS_ENABLE_QTHREADS )
|
||||
#include <Kokkos_Qthreads.hpp>
|
||||
#endif
|
||||
|
||||
#if defined( KOKKOS_ENABLE_PTHREAD )
|
||||
#if defined( KOKKOS_ENABLE_THREADS )
|
||||
#include <Kokkos_Threads.hpp>
|
||||
#endif
|
||||
|
||||
|
@ -69,8 +74,8 @@
|
|||
#include <Kokkos_Cuda.hpp>
|
||||
#endif
|
||||
|
||||
#include <Kokkos_MemoryPool.hpp>
|
||||
#include <Kokkos_Pair.hpp>
|
||||
#include <Kokkos_MemoryPool.hpp>
|
||||
#include <Kokkos_Array.hpp>
|
||||
#include <Kokkos_View.hpp>
|
||||
#include <Kokkos_Vectorization.hpp>
|
||||
|
@ -167,3 +172,4 @@ void * kokkos_realloc( void * arg_alloc , const size_t arg_alloc_size )
|
|||
//----------------------------------------------------------------------------
|
||||
|
||||
#endif
|
||||
|
||||
|
|
|
@ -96,7 +96,7 @@ class Serial; ///< Execution space main process on CPU.
|
|||
class Qthreads; ///< Execution space with Qthreads back-end.
|
||||
#endif
|
||||
|
||||
#if defined( KOKKOS_ENABLE_PTHREAD )
|
||||
#if defined( KOKKOS_ENABLE_THREADS )
|
||||
class Threads; ///< Execution space with pthreads back-end.
|
||||
#endif
|
||||
|
||||
|
@ -104,6 +104,14 @@ class Threads; ///< Execution space with pthreads back-end.
|
|||
class OpenMP; ///< OpenMP execution space.
|
||||
#endif
|
||||
|
||||
#if defined( KOKKOS_ENABLE_OPENMPTARGET )
|
||||
namespace Experimental {
|
||||
class OpenMPTarget; ///< OpenMPTarget execution space.
|
||||
class OpenMPTargetSpace;
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
#if defined( KOKKOS_ENABLE_CUDA )
|
||||
class CudaSpace; ///< Memory space on Cuda GPU
|
||||
class CudaUVMSpace; ///< Memory space on Cuda GPU with UVM
|
||||
|
@ -121,12 +129,14 @@ struct Device;
|
|||
|
||||
/// Define Kokkos::DefaultExecutionSpace as per configuration option
|
||||
/// or chosen from the enabled execution spaces in the following order:
|
||||
/// Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Serial
|
||||
/// Kokkos::Cuda, Kokkos::Experimental::OpenMPTarget, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Serial
|
||||
|
||||
namespace Kokkos {
|
||||
|
||||
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
|
||||
typedef Cuda DefaultExecutionSpace;
|
||||
#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMPTARGET )
|
||||
typedef Experimental::OpenMPTarget DefaultExecutionSpace ;
|
||||
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
|
||||
typedef OpenMP DefaultExecutionSpace;
|
||||
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
|
||||
|
@ -136,7 +146,7 @@ namespace Kokkos {
|
|||
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
|
||||
typedef Serial DefaultExecutionSpace;
|
||||
#else
|
||||
# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial."
|
||||
# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::Cuda, Kokkos::Experimental::OpenMPTarget, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial."
|
||||
#endif
|
||||
|
||||
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
|
||||
|
@ -149,7 +159,7 @@ namespace Kokkos {
|
|||
typedef Serial DefaultHostExecutionSpace;
|
||||
#elif defined( KOKKOS_ENABLE_OPENMP )
|
||||
typedef OpenMP DefaultHostExecutionSpace;
|
||||
#elif defined( KOKKOS_ENABLE_PTHREAD )
|
||||
#elif defined( KOKKOS_ENABLE_THREADS )
|
||||
typedef Threads DefaultHostExecutionSpace;
|
||||
//#elif defined( KOKKOS_ENABLE_QTHREADS )
|
||||
// typedef Qthreads DefaultHostExecutionSpace;
|
||||
|
@ -254,6 +264,21 @@ template< class FunctorType, class ExecPolicy, class ExecutionSapce =
|
|||
|
||||
} // namespace Impl
|
||||
|
||||
namespace Experimental {
|
||||
template<class ScalarType , class Space = HostSpace> struct Sum;
|
||||
template<class ScalarType , class Space = HostSpace> struct Prod;
|
||||
template<class ScalarType , class Space = HostSpace> struct Min;
|
||||
template<class ScalarType , class Space = HostSpace> struct Max;
|
||||
template<class ScalarType , class Space = HostSpace> struct MinMax;
|
||||
template<class ScalarType , class Index, class Space = HostSpace> struct MinLoc;
|
||||
template<class ScalarType , class Index, class Space = HostSpace> struct MaxLoc;
|
||||
template<class ScalarType , class Index, class Space = HostSpace> struct MinMaxLoc;
|
||||
template<class ScalarType , class Space = HostSpace> struct BAnd;
|
||||
template<class ScalarType , class Space = HostSpace> struct BOr;
|
||||
template<class ScalarType , class Space = HostSpace> struct LAnd;
|
||||
template<class ScalarType , class Space = HostSpace> struct LOr;
|
||||
}
|
||||
} // namespace Kokkos
|
||||
|
||||
#endif /* #ifndef KOKKOS_CORE_FWD_HPP */
|
||||
|
||||
|
|
|
@ -44,12 +44,11 @@
|
|||
#ifndef KOKKOS_CUDA_HPP
|
||||
#define KOKKOS_CUDA_HPP
|
||||
|
||||
#include <Kokkos_Core_fwd.hpp>
|
||||
|
||||
// If CUDA execution space is enabled then use this header file.
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( KOKKOS_ENABLE_CUDA )
|
||||
|
||||
#include <Kokkos_Core_fwd.hpp>
|
||||
|
||||
#include <iosfwd>
|
||||
#include <vector>
|
||||
|
||||
|
@ -214,6 +213,8 @@ public:
|
|||
//@}
|
||||
//--------------------------------------------------------------------------
|
||||
|
||||
static const char* name();
|
||||
|
||||
private:
|
||||
|
||||
cudaStream_t m_stream ;
|
||||
|
@ -291,6 +292,7 @@ struct VerifyExecutionCanAccessMemorySpace
|
|||
|
||||
#include <Cuda/Kokkos_CudaExec.hpp>
|
||||
#include <Cuda/Kokkos_Cuda_View.hpp>
|
||||
#include <Cuda/Kokkos_Cuda_Team.hpp>
|
||||
#include <Cuda/Kokkos_Cuda_Parallel.hpp>
|
||||
#include <Cuda/Kokkos_Cuda_Task.hpp>
|
||||
|
||||
|
@ -300,5 +302,3 @@ struct VerifyExecutionCanAccessMemorySpace
|
|||
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
|
||||
#endif /* #ifndef KOKKOS_CUDA_HPP */
|
||||
|
||||
|
||||
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
/*
|
||||
//@HEADER
|
||||
// ************************************************************************
|
||||
//
|
||||
//
|
||||
// Kokkos v. 2.0
|
||||
// Copyright (2014) Sandia Corporation
|
||||
//
|
||||
//
|
||||
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
|
||||
// the U.S. Government retains certain rights in this software.
|
||||
//
|
||||
//
|
||||
// Redistribution and use in source and binary forms, with or without
|
||||
// modification, are permitted provided that the following conditions are
|
||||
// met:
|
||||
|
@ -36,7 +36,7 @@
|
|||
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
//
|
||||
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
|
||||
//
|
||||
//
|
||||
// ************************************************************************
|
||||
//@HEADER
|
||||
*/
|
||||
|
@ -44,10 +44,11 @@
|
|||
#ifndef KOKKOS_CUDASPACE_HPP
|
||||
#define KOKKOS_CUDASPACE_HPP
|
||||
|
||||
#include <Kokkos_Core_fwd.hpp>
|
||||
|
||||
#include <Kokkos_Macros.hpp>
|
||||
#if defined( KOKKOS_ENABLE_CUDA )
|
||||
|
||||
#include <Kokkos_Core_fwd.hpp>
|
||||
|
||||
#include <iosfwd>
|
||||
#include <typeinfo>
|
||||
#include <string>
|
||||
|
@ -717,7 +718,7 @@ private:
|
|||
static ::cudaTextureObject_t
|
||||
attach_texture_object( const unsigned sizeof_alias
|
||||
, void * const alloc_ptr
|
||||
, const size_t alloc_size );
|
||||
, const size_t alloc_size );
|
||||
|
||||
static RecordBase s_root_record ;
|
||||
|
||||
|
|
Some files were not shown because too many files have changed in this diff.