Updating Kokkos lib to v2.03.05

Stan Moore 2017-06-08 10:42:08 -06:00
parent 286d4f2743
commit 5c1d17d1c0
474 changed files with 50972 additions and 10897 deletions

@ -1,5 +1,53 @@
# Change Log
## [2.03.05](https://github.com/kokkos/kokkos/tree/2.03.05) (2017-05-27)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.03.00...2.03.05)
**Implemented enhancements:**
- Harmonize Custom Reductions over nesting levels [\#802](https://github.com/kokkos/kokkos/issues/802)
- Prevent users directly including KokkosCore\_config.h [\#815](https://github.com/kokkos/kokkos/issues/815)
- DualView aborts on concurrent host/device modify \(in debug mode\) [\#814](https://github.com/kokkos/kokkos/issues/814)
- Abort when running on a NVIDIA CC5.0 or higher architecture with code compiled for CC \< 5.0 [\#813](https://github.com/kokkos/kokkos/issues/813)
- Add "name" function to ExecSpaces [\#806](https://github.com/kokkos/kokkos/issues/806)
- Allow null Future in task spawn dependences [\#795](https://github.com/kokkos/kokkos/issues/795)
- Add Unit Tests for Kokkos::complex [\#785](https://github.com/kokkos/kokkos/issues/785)
- Add pow function for Kokkos::complex [\#784](https://github.com/kokkos/kokkos/issues/784)
- Square root of a complex [\#729](https://github.com/kokkos/kokkos/issues/729)
- Command line processing of --threads argument prevents users from having any commandline arguments starting with --threads [\#760](https://github.com/kokkos/kokkos/issues/760)
- Protected deprecated API with appropriate macro [\#756](https://github.com/kokkos/kokkos/issues/756)
- Allow task scheduler memory pool to be used by tasks [\#747](https://github.com/kokkos/kokkos/issues/747)
- View bounds checking on host-side performance: constructing a std::string [\#723](https://github.com/kokkos/kokkos/issues/723)
- Add check for AppleClang as compiler distinct from check for Clang. [\#705](https://github.com/kokkos/kokkos/issues/705)
- Uninclude source files for specific configurations to prevent link warning. [\#701](https://github.com/kokkos/kokkos/issues/701)
- Add --small option to snapshot script [\#697](https://github.com/kokkos/kokkos/issues/697)
- CMake Standalone Support [\#674](https://github.com/kokkos/kokkos/issues/674)
- CMake build unit test and install [\#808](https://github.com/kokkos/kokkos/issues/808)
- CMake: Fix having kokkos as a subdirectory in a pure cmake project [\#629](https://github.com/kokkos/kokkos/issues/629)
- Tribits macro assumes build directory is in top level source directory [\#654](https://github.com/kokkos/kokkos/issues/654)
- Use bin/nvcc\_wrapper, not config/nvcc\_wrapper [\#562](https://github.com/kokkos/kokkos/issues/562)
- Allow MemoryPool::allocate\(\) to be called from multiple threads per warp. [\#487](https://github.com/kokkos/kokkos/issues/487)
- Move OpenMP 4.5 OpenMPTarget backend into Develop [\#456](https://github.com/kokkos/kokkos/issues/456)
- Testing on ARM testbed [\#288](https://github.com/kokkos/kokkos/issues/288)
**Fixed bugs:**
- Fix label in OpenMP parallel\_reduce verify\_initialized [\#834](https://github.com/kokkos/kokkos/issues/834)
- TeamScratch Level 1 on Cuda hangs [\#820](https://github.com/kokkos/kokkos/issues/820)
- \[bug\] memory pool. [\#786](https://github.com/kokkos/kokkos/issues/786)
- Some Reduction Tests fail on Intel 18 with aggressive vectorization on [\#774](https://github.com/kokkos/kokkos/issues/774)
- Error copying dynamic view on copy of memory pool [\#773](https://github.com/kokkos/kokkos/issues/773)
- CUDA stack overflow with TaskDAG test [\#758](https://github.com/kokkos/kokkos/issues/758)
- ThreadVectorRange Customized Reduction Bug [\#739](https://github.com/kokkos/kokkos/issues/739)
- set\_scratch\_size overflows [\#726](https://github.com/kokkos/kokkos/issues/726)
- Get wrong results for compiler checks in Makefile on OS X. [\#706](https://github.com/kokkos/kokkos/issues/706)
- Fix check if multiple host architectures enabled. [\#702](https://github.com/kokkos/kokkos/issues/702)
- Threads Backend Does not Pass on Cray Compilers [\#609](https://github.com/kokkos/kokkos/issues/609)
- Rare bug in memory pool where allocation can finish on superblock in empty state [\#452](https://github.com/kokkos/kokkos/issues/452)
- LDFLAGS in core/unit\_test/Makefile: potential "undefined reference" to pthread lib [\#148](https://github.com/kokkos/kokkos/issues/148)
## [2.03.00](https://github.com/kokkos/kokkos/tree/2.03.00) (2017-04-25)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.02.15...2.03.00)

@ -5,11 +5,12 @@ ELSE()
ENDIF()
IF(NOT KOKKOS_HAS_TRILINOS)
CMAKE_MINIMUM_REQUIRED(VERSION 2.8.11 FATAL_ERROR)
INCLUDE(cmake/tribits.cmake)
SET(CMAKE_CXX_STANDARD 11)
ENDIF()
cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
project(Kokkos CXX)
INCLUDE(cmake/kokkos.cmake)
ELSE()
#------------------------------------------------------------------------------
#
# A) Forward declare the package so that certain options are also defined for
# subpackages
@ -17,14 +18,13 @@ ENDIF()
TRIBITS_PACKAGE_DECL(Kokkos) # ENABLE_SHADOWING_WARNINGS)
#------------------------------------------------------------------------------
#
# B) Define the common options for Kokkos first so they can be used by
# subpackages as well.
#
# mfh 01 Aug 2016: See Issue #61:
#
# https://github.com/kokkos/kokkos/issues/61
@ -83,10 +83,10 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
)
ASSERT_DEFINED(TPL_ENABLE_Pthread)
IF (Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
IF(Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
MESSAGE(FATAL_ERROR "You set Kokkos_ENABLE_Pthread=ON, but Trilinos' support for Pthread(s) is not enabled (TPL_ENABLE_Pthread=OFF). This is not allowed. Please enable Pthreads in Trilinos before attempting to enable Kokkos' support for Pthreads.")
ENDIF ()
IF (NOT TPL_ENABLE_Pthread)
ENDIF()
IF(NOT TPL_ENABLE_Pthread)
ADD_DEFINITIONS(-DGTEST_HAS_PTHREAD=0)
ENDIF()
@ -98,12 +98,13 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Qthreads
Kokkos_ENABLE_QTHREAD
KOKKOS_HAVE_QTHREADS
"Enable Qthreads support in Kokkos."
"${TPL_ENABLE_QTHREADS}"
"${TPL_ENABLE_QTHREAD}"
)
# TODO: No longer an option in Kokkos. Needs to be removed.
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_CXX11
KOKKOS_HAVE_CXX11
@ -118,6 +119,7 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
"${TPL_ENABLE_HWLOC}"
)
# TODO: This is currently not used in Kokkos. Should it be removed?
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_MPI
KOKKOS_HAVE_MPI
@ -154,13 +156,27 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
"${Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Debug_DualView_Modify_Check
KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
"Enable abort when Kokkos::DualView modified on host and device without sync."
"${Kokkos_ENABLE_DEBUG}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Profiling
KOKKOS_ENABLE_PROFILING_INTERNAL
KOKKOS_ENABLE_PROFILING
"Enable KokkosP profiling support for kernel data collections."
"${TPL_ENABLE_DLlib}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Profiling_Load_Print
KOKKOS_ENABLE_PROFILING_LOAD_PRINT
"Print to standard output which profiling library was loaded."
OFF
)
# placeholder for future device...
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Winthread
@ -169,6 +185,7 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
"${TPL_ENABLE_Winthread}"
)
# TODO: No longer an option in Kokkos. Needs to be removed.
# use new/old View
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_USING_DEPRECATED_VIEW
@ -177,12 +194,12 @@ TRIBITS_ADD_OPTION_AND_DEFINE(
OFF
)
#------------------------------------------------------------------------------
#
# C) Install Kokkos' executable scripts
#
# nvcc_wrapper is Kokkos' wrapper for NVIDIA's NVCC CUDA compiler.
# Kokkos needs nvcc_wrapper in order to build. Other libraries and
# executables also need nvcc_wrapper. Thus, we need to install it.
@ -199,6 +216,8 @@ INSTALL(PROGRAMS ${CMAKE_CURRENT_SOURCE_DIR}/bin/nvcc_wrapper DESTINATION bin)
TRIBITS_PROCESS_SUBPACKAGES()
#------------------------------------------------------------------------------
#
# E) If Kokkos itself is enabled, process the Kokkos package
#
@ -213,3 +232,4 @@ TRIBITS_EXCLUDE_FILES(
)
TRIBITS_PACKAGE_POSTPROCESS()
ENDIF()

@ -35,23 +35,26 @@ KOKKOS_INTERNAL_USE_MEMKIND := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "e
# Check for advanced settings.
KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "aggressive_vectorization" | wc -l))
KOKKOS_INTERNAL_DISABLE_PROFILING := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "disable_profiling" | wc -l))
KOKKOS_INTERNAL_DISABLE_DUALVIEW_MODIFY_CHECK := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "disable_dualview_modify_check" | wc -l))
KOKKOS_INTERNAL_ENABLE_PROFILING_LOAD_PRINT := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "enable_profile_load_print" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LDG := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "use_ldg" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_UVM := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "force_uvm" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_RELOC := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "rdc" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "enable_lambda" | wc -l))
# Check for Kokkos Host Execution Spaces one of which must be on.
KOKKOS_INTERNAL_USE_OPENMP := $(strip $(shell echo $(KOKKOS_DEVICES) | grep OpenMP | wc -l))
KOKKOS_INTERNAL_USE_OPENMPTARGET := $(strip $(shell echo $(KOKKOS_DEVICES) | grep OpenMPTarget | wc -l))
KOKKOS_INTERNAL_USE_OPENMP := $(strip $(shell echo $(subst OpenMPTarget,,$(KOKKOS_DEVICES)) | grep OpenMP | wc -l))
KOKKOS_INTERNAL_USE_PTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Pthread | wc -l))
KOKKOS_INTERNAL_USE_QTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Qthreads | wc -l))
KOKKOS_INTERNAL_USE_SERIAL := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Serial | wc -l))
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 0)
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 0)
ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 0)
KOKKOS_INTERNAL_USE_SERIAL := 1
endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 0)
ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 0)
KOKKOS_INTERNAL_USE_SERIAL := 1
endif
endif
endif
# Check for other Execution Spaces.
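
The device checks above rely on substring matching, so a device list that names only `OpenMPTarget` would also satisfy a plain `grep OpenMP`. The sketch below reproduces that behaviour outside of make and shows the effect of removing the `OpenMPTarget` token first, as the `$(subst ...)` form does. The function names `count_match` and `count_openmp_host` and the sample device list are assumptions made for illustration; they do not appear in the Makefile.

```shell
#!/bin/sh
# Count the lines of a device list that mention a keyword, mirroring the
# Makefile's `echo ... | grep ... | wc -l` idiom. The trailing tr removes the
# padding some wc implementations print (the Makefile uses $(strip ...)).
count_match() {
  printf '%s\n' "$1" | grep "$2" | wc -l | tr -d ' '
}

# Hypothetical helper: delete every "OpenMPTarget" token before testing for
# "OpenMP", analogous to $(subst OpenMPTarget,,$(KOKKOS_DEVICES)).
count_openmp_host() {
  stripped=$(printf '%s\n' "$1" | sed 's/OpenMPTarget//g')
  count_match "$stripped" OpenMP
}

devices="OpenMPTarget,Serial"   # sample value, not taken from the source
echo "naive:    $(count_match "$devices" OpenMP)"       # prints "naive:    1"
echo "filtered: $(count_openmp_host "$devices")"        # prints "filtered: 0"
```

With the filtered form, the host OpenMP backend is reported only when `OpenMP` is listed in its own right, for example in `OpenMP,OpenMPTarget`.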
@ -64,24 +67,25 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
endif
# Check OS.
KOKKOS_OS := $(shell uname -s)
KOKKOS_INTERNAL_OS_CYGWIN := $(shell uname -s | grep CYGWIN | wc -l)
KOKKOS_INTERNAL_OS_LINUX := $(shell uname -s | grep Linux | wc -l)
KOKKOS_INTERNAL_OS_DARWIN := $(shell uname -s | grep Darwin | wc -l)
KOKKOS_OS := $(strip $(shell uname -s))
KOKKOS_INTERNAL_OS_CYGWIN := $(strip $(shell uname -s | grep CYGWIN | wc -l))
KOKKOS_INTERNAL_OS_LINUX := $(strip $(shell uname -s | grep Linux | wc -l))
KOKKOS_INTERNAL_OS_DARWIN := $(strip $(shell uname -s | grep Darwin | wc -l))
# Check compiler.
KOKKOS_INTERNAL_COMPILER_INTEL := $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l)
KOKKOS_INTERNAL_COMPILER_PGI := $(shell $(CXX) --version 2>&1 | grep PGI | wc -l)
KOKKOS_INTERNAL_COMPILER_XL := $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l)
KOKKOS_INTERNAL_COMPILER_CRAY := $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l)
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(CXX) --version 2>&1 | grep "nvcc" | wc -l)
KOKKOS_INTERNAL_COMPILER_INTEL := $(strip $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l))
KOKKOS_INTERNAL_COMPILER_PGI := $(strip $(shell $(CXX) --version 2>&1 | grep PGI | wc -l))
KOKKOS_INTERNAL_COMPILER_XL := $(strip $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l))
KOKKOS_INTERNAL_COMPILER_CRAY := $(strip $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l))
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell $(CXX) --version 2>&1 | grep nvcc | wc -l))
KOKKOS_INTERNAL_COMPILER_CLANG := $(strip $(shell $(CXX) --version 2>&1 | grep clang | wc -l))
KOKKOS_INTERNAL_COMPILER_APPLE_CLANG := $(strip $(shell $(CXX) --version 2>&1 | grep "apple-darwin" | wc -l))
ifneq ($(OMPI_CXX),)
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(OMPI_CXX) --version 2>&1 | grep "nvcc" | wc -l)
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell $(OMPI_CXX) --version 2>&1 | grep "nvcc" | wc -l))
endif
ifneq ($(MPICH_CXX),)
KOKKOS_INTERNAL_COMPILER_NVCC := $(shell $(MPICH_CXX) --version 2>&1 | grep "nvcc" | wc -l)
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell $(MPICH_CXX) --version 2>&1 | grep "nvcc" | wc -l))
endif
KOKKOS_INTERNAL_COMPILER_CLANG := $(shell $(CXX) --version 2>&1 | grep "clang" | wc -l)
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 2)
KOKKOS_INTERNAL_COMPILER_CLANG = 1
@ -90,6 +94,11 @@ ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 2)
KOKKOS_INTERNAL_COMPILER_XL = 1
endif
# Apple Clang passes both clang and apple clang tests, so turn off clang.
ifeq ($(KOKKOS_INTERNAL_COMPILER_APPLE_CLANG), 1)
KOKKOS_INTERNAL_COMPILER_CLANG = 0
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_COMPILER_CLANG_VERSION := $(shell clang --version | grep version | cut -d ' ' -f3 | tr -d '.')
@ -97,29 +106,43 @@ ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_CLANG_VERSION) -lt 400; echo $$?),0)
$(error Compiling Cuda code directly with Clang requires version 4.0.0 or higher)
endif
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := 1
endif
endif
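
The Clang version test above reduces the `--version` banner to an integer by taking the third space-separated field and deleting the dots, then compares it against 400. A minimal sketch of that pipeline, run on fixed sample banners instead of a live compiler, follows. The function names and banner strings are illustrative assumptions.

```shell
#!/bin/sh
# Reduce a `clang --version` banner to a comparable integer (4.0.0 -> 400),
# using the same grep/cut/tr pipeline as the Makefile.
clang_version_number() {
  printf '%s\n' "$1" | grep version | cut -d ' ' -f3 | tr -d '.'
}

# Hypothetical helper: succeeds when the banner describes clang 4.0.0 or newer.
clang_supports_cuda() {
  [ "$(clang_version_number "$1")" -ge 400 ]
}

banner="clang version 4.0.0 (tags/RELEASE_400/final)"   # sample banner text
clang_version_number "$banner"                          # prints 400
```

The comparison treats the concatenated digits as one number, so it assumes each version component is a single digit, or that the major version alone settles the ordering. A banner whose third field is not the version number would need a different field index.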
# Set OpenMP flags.
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -mp
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp=libomp
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
ifeq ($(KOKKOS_INTERNAL_COMPILER_APPLE_CLANG), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp=libomp
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
# OpenMP is turned on by default in Cray compiler environment.
KOKKOS_INTERNAL_OPENMP_FLAG :=
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
KOKKOS_INTERNAL_OPENMP_FLAG := -qsmp=omp
else
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
# OpenMP is turned on by default in Cray compiler environment.
KOKKOS_INTERNAL_OPENMP_FLAG :=
else
KOKKOS_INTERNAL_OPENMP_FLAG := -fopenmp
endif
endif
endif
endif
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_XL), 1)
KOKKOS_INTERNAL_OPENMPTARGET_FLAG := -DKOKKOS_IBM_XL_OMP45_WORKAROUND -qsmp=omp -qoffload -qnoeh
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_OPENMPTARGET_FLAG := -DKOKKOS_BUG_WORKAROUND_IBM_CLANG_OMP45_VIEW_INIT -fopenmp-implicit-declare-target -fopenmp-targets=nvptx64-nvidia-cuda -fopenmp -fopenmp=libomp
endif
endif
# Set C++11 flags.
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
KOKKOS_INTERNAL_CXX11_FLAG := --c++11
else
@ -146,7 +169,7 @@ KOKKOS_INTERNAL_USE_ARCH_SKX := $(strip $(shell echo $(KOKKOS_ARCH) | grep SKX |
KOKKOS_INTERNAL_USE_ARCH_KNL := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNL | wc -l))
# NVIDIA based.
NVCC_WRAPPER := $(KOKKOS_PATH)/config/nvcc_wrapper
NVCC_WRAPPER := $(KOKKOS_PATH)/bin/nvcc_wrapper
KOKKOS_INTERNAL_USE_ARCH_KEPLER30 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler30 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER32 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler32 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler35 | wc -l))
@ -180,10 +203,20 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 1)
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_NVCC_PATH := $(shell which nvcc)
CUDA_PATH ?= $(KOKKOS_INTERNAL_NVCC_PATH:/bin/nvcc=)
KOKKOS_INTERNAL_OPENMPTARGET_FLAG := $(KOKKOS_INTERNAL_OPENMPTARGET_FLAG) --cuda-path=$(CUDA_PATH)
endif
endif
endif
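
The block above derives `CUDA_PATH` by removing the `/bin/nvcc` suffix from the path that `which nvcc` reports, and `?=` leaves an existing `CUDA_PATH` alone. The shell sketch below shows the same derivation with parameter expansion. The function name `cuda_root_from_nvcc` and the example install path are assumptions, not values from the Makefile.

```shell
#!/bin/sh
# Strip a trailing "/bin/nvcc" from a path: the shell counterpart of the make
# substitution reference $(KOKKOS_INTERNAL_NVCC_PATH:/bin/nvcc=).
cuda_root_from_nvcc() {
  printf '%s\n' "${1%/bin/nvcc}"
}

# Hypothetical location; on a real system it would come from `which nvcc`.
nvcc_path="/usr/local/cuda/bin/nvcc"

# Keep a CUDA_PATH that is already set, similar to make's `?=` assignment.
: "${CUDA_PATH=$(cuda_root_from_nvcc "$nvcc_path")}"
echo "--cuda-path=$CUDA_PATH"
```

If `nvcc` is not on the search path, the derived value is empty in the Makefile, so setting `CUDA_PATH` explicitly is the safer choice in that situation.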
# ARM based.
KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv80 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv81 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv8-ThunderX | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX) | bc))
# IBM based.
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(strip $(shell echo $(KOKKOS_ARCH) | grep BGQ | wc -l))
@ -206,8 +239,11 @@ KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_
KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc ))
# Decide whether we can support transactional memory
KOKKOS_INTERNAL_USE_TM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
# Incompatible flags?
KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX)>1" | bc ))
KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC)+$(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_ARM)>1" | bc ))
KOKKOS_INTERNAL_USE_ARCH_MULTIGPU := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_NVIDIA)>1" | bc))
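
The two conflict checks above add up 0/1 architecture flags and ask `bc` whether the total exceeds one. A rough shell equivalent using built-in arithmetic is sketched here. The function name `multiple_enabled` and the flag values passed to it are made up for illustration.

```shell
#!/bin/sh
# Print 1 when more than one of the given 0/1 flags is set, otherwise 0.
# This gives the same answer as piping "a+b+c>1" through bc.
multiple_enabled() {
  total=0
  for flag in "$@"; do
    total=$((total + flag))
  done
  if [ "$total" -gt 1 ]; then
    echo 1
  else
    echo 0
  fi
}

multiple_enabled 1 0 0 0   # one host architecture selected: prints 0
multiple_enabled 1 0 1 0   # two selected, a conflict: prints 1
```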
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MULTIHOST), 1)
@ -240,12 +276,22 @@ tmp := $(shell echo "Makefile constructed configuration:" >> KokkosCore_config.t
tmp := $(shell date >> KokkosCore_config.tmp)
tmp := $(shell echo "----------------------------------------------*/" >> KokkosCore_config.tmp)
tmp := $(shell echo '\#if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)' >> KokkosCore_config.tmp)
tmp := $(shell echo '\#error "Do not include KokkosCore_config.h directly; include Kokkos_Macros.hpp instead."' >> KokkosCore_config.tmp)
tmp := $(shell echo '\#else' >> KokkosCore_config.tmp)
tmp := $(shell echo '\#define KOKKOS_CORE_CONFIG_H' >> KokkosCore_config.tmp)
tmp := $(shell echo '\#endif' >> KokkosCore_config.tmp)
tmp := $(shell echo "/* Execution Spaces */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_CUDA 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
tmp := $(shell echo '\#define KOKKOS_ENABLE_OPENMPTARGET 1' >> KokkosCore_config.tmp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
tmp := $(shell echo '\#define KOKKOS_HAVE_OPENMP 1' >> KokkosCore_config.tmp)
endif
@ -262,6 +308,12 @@ ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_SERIAL 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_TM), 1)
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ENABLE_TM" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_X86_64), 1)
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
@ -293,13 +345,21 @@ ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX1Z), 1)
endif
ifeq ($(KOKKOS_INTERNAL_ENABLE_DEBUG), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
KOKKOS_CXXFLAGS += -lineinfo
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
KOKKOS_CXXFLAGS += -lineinfo
endif
KOKKOS_CXXFLAGS += -g
KOKKOS_LDFLAGS += -g -ldl
tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_HAVE_DEBUG 1" >> KokkosCore_config.tmp )
ifeq ($(KOKKOS_INTERNAL_DISABLE_DUALVIEW_MODIFY_CHECK), 0)
tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK 1" >> KokkosCore_config.tmp )
endif
endif
ifeq ($(KOKKOS_INTERNAL_ENABLE_PROFILING_LOAD_PRINT), 1)
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING_LOAD_PRINT 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_USE_HWLOC), 1)
@ -311,8 +371,6 @@ endif
ifeq ($(KOKKOS_INTERNAL_USE_LIBRT), 1)
tmp := $(shell echo "\#define KOKKOS_USE_LIBRT 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define PREC_TIMER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOSP_ENABLE_RTLIB 1" >> KokkosCore_config.tmp )
KOKKOS_LIBS += -lrt
endif
@ -323,8 +381,8 @@ ifeq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_HBWSPACE 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 1)
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING 0" >> KokkosCore_config.tmp )
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 0)
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING" >> KokkosCore_config.tmp )
endif
tmp := $(shell echo "/* Optimization Settings */" >> KokkosCore_config.tmp)
@ -336,39 +394,44 @@ endif
tmp := $(shell echo "/* Cuda Settings */" >> KokkosCore_config.tmp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
endif
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_USE_CUDA_UVM 1" >> KokkosCore_config.tmp )
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += --relocatable-device-code=true
KOKKOS_LDFLAGS += --relocatable-device-code=true
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += --relocatable-device-code=true
KOKKOS_LDFLAGS += --relocatable-device-code=true
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION) -gt 70; echo $$?),0)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += -expt-extended-lambda
else
$(warning Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.)
endif
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION) -gt 70; echo $$?),0)
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += -expt-extended-lambda
else
$(warning Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.)
endif
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_CUDA_CLANG_WORKAROUND" >> KokkosCore_config.tmp )
endif
endif
endif
# Add Architecture flags.
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV80), 1)
@ -469,7 +532,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER9), 1)
endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_HSW), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
@ -491,6 +554,28 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX2), 1)
endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_BDW), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -xCORE-AVX2
KOKKOS_LDFLAGS += -xCORE-AVX2
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
KOKKOS_CXXFLAGS += -tp=haswell
KOKKOS_LDFLAGS += -tp=haswell
else
# Assume that this is really a GNU compiler.
KOKKOS_CXXFLAGS += -march=core-avx2 -mtune=core-avx2 -mrtm
KOKKOS_LDFLAGS += -march=core-avx2 -mtune=core-avx2 -mrtm
endif
endif
endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512MIC 1" >> KokkosCore_config.tmp )
@ -501,12 +586,12 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
else
# Assume that this is really a GNU compiler.
KOKKOS_CXXFLAGS += -march=knl
KOKKOS_LDFLAGS += -march=knl
KOKKOS_CXXFLAGS += -march=knl -mtune=knl
KOKKOS_LDFLAGS += -march=knl -mtune=knl
endif
endif
endif
@ -526,8 +611,8 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON), 1)
else
# Nothing here yet.
KOKKOS_CXXFLAGS += -march=skylake-avx512
KOKKOS_LDFLAGS += -march=skylake-avx512
KOKKOS_CXXFLAGS += -march=skylake-avx512 -mtune=skylake-avx512 -mrtm
KOKKOS_LDFLAGS += -march=skylake-avx512 -mtune=skylake-avx512 -mrtm
endif
endif
endif
@ -541,70 +626,67 @@ endif
# Figure out the architecture flag for Cuda.
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
KOKKOS_INTERNAL_CUDA_ARCH_FLAG=-arch
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_CUDA_ARCH_FLAG=--cuda-gpu-arch
KOKKOS_CXXFLAGS += -x cuda
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=-arch
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG=--cuda-gpu-arch
KOKKOS_CXXFLAGS += -x cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_30
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_32
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_35
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_37
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_50
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_52
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_53
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL60), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_60
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_61
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_30
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_32
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_35
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_37
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_50
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_52
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_53
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_61
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL60), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)=sm_60
endif
ifneq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)
endif
endif
endif
KOKKOS_INTERNAL_LS_CONFIG := $(shell ls KokkosCore_config.h)
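Reviewer note: the per-architecture blocks in this hunk all follow one pattern. Each appends a family define and a specific define to KokkosCore_config.tmp, then adds an `=sm_XX` suffix to the compiler's arch flag for both compile and link. The Python sketch below only illustrates that mapping. The names `ARCH_TABLE` and `cuda_arch_settings` are hypothetical, `arch_flag` stands in for `$(KOKKOS_INTERNAL_COMPILER_CUDA_ARCH_FLAG)`, and the table covers only the architectures visible in this hunk.

```python
# Illustrative only: mirrors the ifeq blocks shown in this hunk,
# not the full Makefile.kokkos logic.
ARCH_TABLE = {
    # key: (family define, specific define, sm code)
    "KEPLER37": ("KOKKOS_ARCH_KEPLER", "KOKKOS_ARCH_KEPLER37", "sm_37"),
    "MAXWELL50": ("KOKKOS_ARCH_MAXWELL", "KOKKOS_ARCH_MAXWELL50", "sm_50"),
    "MAXWELL52": ("KOKKOS_ARCH_MAXWELL", "KOKKOS_ARCH_MAXWELL52", "sm_52"),
    "MAXWELL53": ("KOKKOS_ARCH_MAXWELL", "KOKKOS_ARCH_MAXWELL53", "sm_53"),
    "PASCAL60": ("KOKKOS_ARCH_PASCAL", "KOKKOS_ARCH_PASCAL60", "sm_60"),
    "PASCAL61": ("KOKKOS_ARCH_PASCAL", "KOKKOS_ARCH_PASCAL61", "sm_61"),
}


def cuda_arch_settings(arch, arch_flag):
    """Return (config defines, compiler flag) for one architecture key.

    arch_flag is whatever the build uses to select a GPU architecture;
    it is passed in because the hunk does not show how it is chosen.
    """
    family, specific, sm = ARCH_TABLE[arch]
    defines = ["#define %s 1" % family, "#define %s 1" % specific]
    return defines, "%s=%s" % (arch_flag, sm)
```

In the Makefile the resulting flag is accumulated in `KOKKOS_INTERNAL_CUDA_ARCH_FLAG` and added to `KOKKOS_LDFLAGS` only when the compiler is NVCC, which the sketch leaves out.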
@ -630,9 +712,24 @@ KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.cpp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
KOKKOS_SRC += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.cpp)
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
KOKKOS_CXXFLAGS += -I$(CUDA_PATH)/include
KOKKOS_CPPFLAGS += -I$(CUDA_PATH)/include
KOKKOS_LDFLAGS += -L$(CUDA_PATH)/lib64
KOKKOS_LIBS += -lcudart -lcuda
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_CXXFLAGS += --cuda-path=$(CUDA_PATH)
endif
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
KOKKOS_SRC += $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Exec.cpp $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTargetSpace.cpp
KOKKOS_HEADERS += $(wildcard $(KOKKOS_PATH)/core/src/OpenMPTarget/*.hpp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
KOKKOS_CXXFLAGS += -Xcompiler $(KOKKOS_INTERNAL_OPENMPTARGET_FLAG)
else
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_OPENMPTARGET_FLAG)
endif
KOKKOS_LDFLAGS += $(KOKKOS_INTERNAL_OPENMPTARGET_FLAG)
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
@ -666,10 +763,27 @@ endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
KOKKOS_INTERNAL_GCC_PATH = $(shell which g++)
KOKKOS_INTERNAL_GCC_TOOLCHAIN = $(KOKKOS_INTERNAL_GCC_PATH:/bin/g++=)
KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN) -DKOKKOS_CUDA_CLANG_WORKAROUND -DKOKKOS_CUDA_USE_LDG_INTRINSIC
KOKKOS_CXXFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
KOKKOS_LDFLAGS += --gcc-toolchain=$(KOKKOS_INTERNAL_GCC_TOOLCHAIN)
endif
# Don't include Kokkos_HBWSpace.cpp if not using MEMKIND to avoid a link warning.
ifneq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
KOKKOS_SRC := $(filter-out $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp,$(KOKKOS_SRC))
endif
# Don't include Kokkos_Profiling_Interface.cpp if not using profiling to avoid a link warning.
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 1)
KOKKOS_SRC := $(filter-out $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp,$(KOKKOS_SRC))
endif
# Don't include Kokkos_Serial.cpp or Kokkos_Serial_Task.cpp if not using Serial
# device to avoid a link warning.
ifneq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
KOKKOS_SRC := $(filter-out $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial.cpp,$(KOKKOS_SRC))
KOKKOS_SRC := $(filter-out $(KOKKOS_PATH)/core/src/impl/Kokkos_Serial_Task.cpp,$(KOKKOS_SRC))
endif
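Reviewer note: the three `filter-out` blocks above drop source files whose backend is disabled so that no empty object files are linked. A rough Python equivalent, assuming the flags are passed as booleans; the function name `filter_sources` and its parameters are hypothetical and not part of the build system:

```python
def filter_sources(sources, kokkos_path, use_memkind, disable_profiling, use_serial):
    """Return sources minus the files excluded by the Makefile conditionals."""
    impl = kokkos_path + "/core/src/impl/"
    excluded = set()
    if not use_memkind:          # ifneq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
        excluded.add(impl + "Kokkos_HBWSpace.cpp")
    if disable_profiling:        # ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 1)
        excluded.add(impl + "Kokkos_Profiling_Interface.cpp")
    if not use_serial:           # ifneq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
        excluded.add(impl + "Kokkos_Serial.cpp")
        excluded.add(impl + "Kokkos_Serial_Task.cpp")
    # Like make's filter-out with a literal pattern, only exact matches are removed.
    return [s for s in sources if s not in excluded]
```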
# With Cygwin functions such as fdopen and fileno are not defined
# when strict ansi is enabled. strict ansi gets enabled with --std=c++11
# though. So we hard undefine it here. Not sure if that has any bad side effects

View File

@ -53,11 +53,20 @@ Kokkos_Qthreads_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/Qthreads/K
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
Kokkos_OpenMPexec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMPexec.cpp
Kokkos_OpenMP_Exec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Exec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Exec.cpp
Kokkos_OpenMP_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMP/Kokkos_OpenMP_Task.cpp
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
Kokkos_OpenMPTarget_Exec.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Exec.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Exec.cpp
Kokkos_OpenMPTargetSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTargetSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTargetSpace.cpp
#Kokkos_OpenMPTarget_Task.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Task.cpp
# $(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/OpenMPTarget/Kokkos_OpenMPTarget_Task.cpp
endif
Kokkos_HBWSpace.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HBWSpace.cpp

View File

@ -0,0 +1 @@
void KOKKOS_ALGORITHMS_SRC_DUMMY_PREVENT_LINK_ERROR() {}

View File

@ -674,7 +674,7 @@ namespace Kokkos {
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
return U*std::sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
@ -917,7 +917,7 @@ namespace Kokkos {
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
return U*std::sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
@ -1171,7 +1171,7 @@ namespace Kokkos {
const double V = 2.0*drand() - 1.0;
S = U*U+V*V;
}
return U*sqrt(-2.0*log(S)/S);
return U*std::sqrt(-2.0*log(S)/S);
}
KOKKOS_INLINE_FUNCTION
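Reviewer note: the expression `U*std::sqrt(-2.0*log(S)/S)` touched in these three hunks is the final step of the Marsaglia polar method for drawing normal variates. The loop condition is outside the visible context, so the sketch below assumes the usual rejection of points outside the unit circle. The `0 < s` guard and the function name `normal_polar` are additions of this sketch, not taken from the Kokkos source.

```python
import math
import random


def normal_polar(rng):
    """Draw one standard-normal sample using the Marsaglia polar method."""
    while True:
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        # Accept only points strictly inside the unit circle; s == 0 would
        # make log(s) undefined (this guard is an assumption of the sketch).
        if 0.0 < s < 1.0:
            return u * math.sqrt(-2.0 * math.log(s) / s)


if __name__ == "__main__":
    rng = random.Random(2017)
    print([round(normal_polar(rng), 3) for _ in range(5)])
```

The change in the commit itself is only the `std::` qualification of `sqrt`; the algorithm is unchanged.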

View File

@ -8,7 +8,7 @@ default: build_all
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
else
CXX = g++
endif
@ -21,8 +21,8 @@ include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/algorithms/unit_tests
TEST_TARGETS =
TARGETS =
TEST_TARGETS =
TARGETS =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA = TestCuda.o UnitTestMain.o gtest-all.o
@ -49,16 +49,16 @@ ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
endif
KokkosAlgorithms_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_Cuda
$(LINK) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Cuda
KokkosAlgorithms_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_Threads
$(LINK) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Threads
KokkosAlgorithms_UnitTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_OpenMP
$(LINK) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_OpenMP
KokkosAlgorithms_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) -o KokkosAlgorithms_UnitTest_Serial
$(LINK) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosAlgorithms_UnitTest_Serial
test-cuda: KokkosAlgorithms_UnitTest_Cuda
./KokkosAlgorithms_UnitTest_Cuda
@ -76,7 +76,7 @@ build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
@ -84,6 +84,5 @@ clean: kokkos-clean
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,12 +36,15 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <stdint.h>
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_CUDA
#include <cstdint>
#include <iostream>
#include <iomanip>
@ -49,8 +52,6 @@
#include <Kokkos_Core.hpp>
#ifdef KOKKOS_ENABLE_CUDA
#include <TestRandom.hpp>
#include <TestSort.hpp>
@ -105,6 +106,7 @@ CUDA_SORT_UNSIGNED(171)
#undef CUDA_RANDOM_XORSHIFT1024
#undef CUDA_SORT_UNSIGNED
}
#else
void KOKKOS_ALGORITHMS_UNITTESTS_TESTCUDA_PREVENT_LINK_ERROR() {}
#endif /* #ifdef KOKKOS_ENABLE_CUDA */

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,13 +36,16 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_OPENMP
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
//----------------------------------------------------------------------------
@ -52,7 +55,6 @@
namespace Test {
#ifdef KOKKOS_ENABLE_OPENMP
class openmp : public ::testing::Test {
protected:
static void SetUpTestCase()
@ -97,6 +99,8 @@ OPENMP_SORT_UNSIGNED(171)
#undef OPENMP_RANDOM_XORSHIFT64
#undef OPENMP_RANDOM_XORSHIFT1024
#undef OPENMP_SORT_UNSIGNED
#endif
} // namespace test
#else
void KOKKOS_ALGORITHMS_UNITTESTS_TESTOPENMP_PREVENT_LINK_ERROR() {}
#endif

View File

@ -295,7 +295,7 @@ struct test_random_scalar {
parallel_reduce (num_draws/1024, functor_type (pool, density_1d, density_3d), result);
//printf("Result: %lf %lf %lf\n",result.mean/num_draws/3,result.variance/num_draws/3,result.covariance/num_draws/2);
double tolerance = 1.6*sqrt(1.0/num_draws);
double tolerance = 1.6*std::sqrt(1.0/num_draws);
double mean_expect = 0.5*Kokkos::rand<rnd_type,Scalar>::max();
double variance_expect = 1.0/3.0*mean_expect*mean_expect;
double mean_eps = mean_expect/(result.mean/num_draws/3)-1.0;
@ -321,7 +321,7 @@ struct test_random_scalar {
typedef test_histogram1d_functor<typename RandomGenerator::device_type> functor_type;
parallel_reduce (HIST_DIM1D, functor_type (density_1d, num_draws), result);
double tolerance = 6*sqrt(1.0/HIST_DIM1D);
double tolerance = 6*std::sqrt(1.0/HIST_DIM1D);
double mean_expect = 1.0*num_draws*3/HIST_DIM1D;
double variance_expect = 1.0*num_draws*3/HIST_DIM1D*(1.0-1.0/HIST_DIM1D);
double covariance_expect = -1.0*num_draws*3/HIST_DIM1D/HIST_DIM1D;
@ -354,7 +354,7 @@ struct test_random_scalar {
typedef test_histogram3d_functor<typename RandomGenerator::device_type> functor_type;
parallel_reduce (HIST_DIM1D, functor_type (density_3d, num_draws), result);
double tolerance = 6*sqrt(1.0/HIST_DIM1D);
double tolerance = 6*std::sqrt(1.0/HIST_DIM1D);
double mean_expect = 1.0*num_draws/HIST_DIM1D;
double variance_expect = 1.0*num_draws/HIST_DIM1D*(1.0-1.0/HIST_DIM1D);
double covariance_expect = -1.0*num_draws/HIST_DIM1D/HIST_DIM1D;
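Reviewer note: in the first hunk of this file, the expected moments are those of a uniform distribution on [0, max). The mean is max/2, and `1.0/3.0*mean_expect*mean_expect` equals max²/12, the uniform variance. The tolerance shrinks as 1/sqrt(num_draws). A small Python sketch of those quantities follows. The function names are hypothetical, and how the test compares the relative error to the tolerance is not shown in the hunk, so that step is left out.

```python
import math


def uniform_test_expectations(max_value, num_draws):
    """Expected mean, variance and tolerance as computed in the first hunk."""
    mean_expect = 0.5 * max_value
    # (1/3) * (max/2)^2 == max^2 / 12, the variance of U(0, max).
    variance_expect = (1.0 / 3.0) * mean_expect * mean_expect
    tolerance = 1.6 * math.sqrt(1.0 / num_draws)
    return mean_expect, variance_expect, tolerance


def relative_error(expected, observed):
    """Same form as mean_eps in the hunk: expected / observed - 1."""
    return expected / observed - 1.0
```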

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,11 +36,14 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_SERIAL
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
@ -55,7 +58,6 @@
namespace Test {
#ifdef KOKKOS_ENABLE_SERIAL
class serial : public ::testing::Test {
protected:
static void SetUpTestCase()
@ -93,7 +95,9 @@ SERIAL_SORT_UNSIGNED(171)
#undef SERIAL_RANDOM_XORSHIFT1024
#undef SERIAL_SORT_UNSIGNED
#endif // KOKKOS_ENABLE_SERIAL
} // namespace Test
#else
void KOKKOS_ALGORITHMS_UNITTESTS_TESTSERIAL_PREVENT_LINK_ERROR() {}
#endif // KOKKOS_ENABLE_SERIAL

View File

@ -1,12 +1,12 @@
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -35,12 +35,12 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
#ifndef TESTSORT_HPP_
#define TESTSORT_HPP_
#ifndef KOKKOS_ALGORITHMS_UNITTESTS_TESTSORT_HPP
#define KOKKOS_ALGORITHMS_UNITTESTS_TESTSORT_HPP
#include <gtest/gtest.h>
#include<Kokkos_Core.hpp>
@ -212,7 +212,12 @@ void test_dynamic_view_sort(unsigned int n )
const size_t upper_bound = 2 * n ;
typename KeyDynamicViewType::memory_pool
pool( memory_space() , 2 * n * sizeof(KeyType) );
pool( memory_space()
, n * sizeof(KeyType) * 1.2
, 500 /* min block size in bytes */
, 30000 /* max block size in bytes */
, 1000000 /* min superblock size in bytes */
);
KeyDynamicViewType keys("Keys",pool,upper_bound);
@ -272,4 +277,4 @@ void test_sort(unsigned int N)
}
}
#endif /* TESTSORT_HPP_ */
#endif /* KOKKOS_ALGORITHMS_UNITTESTS_TESTSORT_HPP */

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,11 +36,14 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_THREADS
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
@ -55,7 +58,6 @@
namespace Test {
#ifdef KOKKOS_ENABLE_PTHREAD
class threads : public ::testing::Test {
protected:
static void SetUpTestCase()
@ -107,7 +109,9 @@ THREADS_SORT_UNSIGNED(171)
#undef THREADS_RANDOM_XORSHIFT1024
#undef THREADS_SORT_UNSIGNED
#endif
} // namespace Test
#else
void KOKKOS_ALGORITHMS_UNITTESTS_TESTTHREADS_PREVENT_LINK_ERROR() {}
#endif

View File

@ -7,7 +7,7 @@ default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = ${KOKKOS_PATH}/config/nvcc_wrapper
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
EXE = bytes_and_flops.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
@ -22,7 +22,7 @@ CXXFLAGS = -O3 -g
DEPFLAGS = -M
LINK = ${CXX}
LINKFLAGS =
LINKFLAGS =
OBJ = $(SRC:.cpp=.o)
LIB =
@ -34,7 +34,7 @@ build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules

View File

@ -7,7 +7,7 @@ default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = ${KOKKOS_PATH}/config/nvcc_wrapper
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
EXE = gather.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
@ -22,7 +22,7 @@ CXXFLAGS = -O3 -g
DEPFLAGS = -M
LINK = ${CXX}
LINKFLAGS =
LINKFLAGS =
OBJ = $(SRC:.cpp=.o)
LIB =
@ -35,10 +35,10 @@ build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) gather_unroll.hpp gather.hpp
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) gather_unroll.hpp gather.hpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<

View File

@ -0,0 +1,18 @@
# - Config file for the Kokkos package
# It defines the following variables
# Kokkos_INCLUDE_DIRS - include directories for Kokkos
# Kokkos_LIBRARIES - libraries to link against
# Compute paths
GET_FILENAME_COMPONENT(Kokkos_CMAKE_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH)
SET(Kokkos_INCLUDE_DIRS "@CONF_INCLUDE_DIRS@")
# Our library dependencies (contains definitions for IMPORTED targets)
IF(NOT TARGET kokkos AND NOT Kokkos_BINARY_DIR)
INCLUDE("${Kokkos_CMAKE_DIR}/KokkosTargets.cmake")
ENDIF()
# These are IMPORTED targets created by KokkosTargets.cmake
SET(Kokkos_LIBRARY_DIRS @INSTALL_LIB_DIR@)
SET(Kokkos_LIBRARIES @Kokkos_LIBRARIES_NAMES@)
SET(Kokkos_TPL_LIBRARIES @KOKKOS_LIBS@)

View File

@ -0,0 +1,20 @@
#.rst:
# FindHWLOC
# ----------
#
# Try to find HWLOC.
#
# The following variables are defined:
#
# HWLOC_FOUND - System has HWLOC
# HWLOC_INCLUDE_DIR - HWLOC include directory
# HWLOC_LIBRARIES - Libraries needed to use HWLOC
find_path(HWLOC_INCLUDE_DIR hwloc.h)
find_library(HWLOC_LIBRARIES hwloc)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(HWLOC DEFAULT_MSG
HWLOC_INCLUDE_DIR HWLOC_LIBRARIES)
mark_as_advanced(HWLOC_INCLUDE_DIR HWLOC_LIBRARIES)

View File

@ -0,0 +1,20 @@
#.rst:
# FindMemkind
# ----------
#
# Try to find Memkind.
#
# The following variables are defined:
#
# MEMKIND_FOUND - System has Memkind
# MEMKIND_INCLUDE_DIR - Memkind include directory
# MEMKIND_LIBRARIES - Libraries needed to use Memkind
find_path(MEMKIND_INCLUDE_DIR memkind.h)
find_library(MEMKIND_LIBRARIES memkind)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(Memkind DEFAULT_MSG
MEMKIND_INCLUDE_DIR MEMKIND_LIBRARIES)
mark_as_advanced(MEMKIND_INCLUDE_DIR MEMKIND_LIBRARIES)

View File

@ -0,0 +1,20 @@
#.rst:
# FindQthreads
# ----------
#
# Try to find Qthreads.
#
# The following variables are defined:
#
# QTHREADS_FOUND - System has Qthreads
# QTHREADS_INCLUDE_DIR - Qthreads include directory
# QTHREADS_LIBRARIES - Libraries needed to use Qthreads
find_path(QTHREADS_INCLUDE_DIR qthread.h)
find_library(QTHREADS_LIBRARIES qthread)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(Qthreads DEFAULT_MSG
QTHREADS_INCLUDE_DIR QTHREADS_LIBRARIES)
mark_as_advanced(QTHREADS_INCLUDE_DIR QTHREADS_LIBRARIES)

File diff suppressed because it is too large

View File

@ -60,34 +60,12 @@ Step 2:
// -------------------------------------------------------------------------------- //
Step 3:
3.1. Build and test Trilinos with 3 different configurations; a configure-all script is provided in Trilinos and should be modified to test each of the following 3 configurations with appropriate environment variable(s):
3.1. Build and test Trilinos with 4 different configurations; run scripts for white and shepard are provided in kokkos/config/trilinos-integration
- GCC/4.7.2-OpenMP/Complex
Run tests with the following environment variable:
Usually it's a good idea to run those scripts via nohup.
You can run all four at the same time, use separate directories for each.
export OMP_NUM_THREADS=2
- Intel/15.0.2-Serial/NoComplex
- GCC/4.8.4/CUDA/7.5.18-Cuda/Serial/NoComplex
Run tests with the following environment variables:
export CUDA_LAUNCH_BLOCKING=1
export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
mkdir Build
cd Build
cp TRILINOS_PATH/sampleScripts/Sandia-SEMS/configure-all ./
** Set the path to Trilinos appropriately within the configure-all script **
source $SEMS_MODULE_ROOT/utils/sems-modules-init.sh kokkos
source configure-all
make -k (-k means "keep going" to get past build errors; -j12 can also be specified to build with 12 threads, for example)
ctest
3.2. Compare the failed test output to the test output on the dashboard ( testing.sandia.gov/cdash select Trilinos ); investigate and fix problems if new tests fail after the Kokkos snapshot
3.2. Compare the failed test output between the pristine and the updated runs; investigate and fix problems if new tests fail after the Kokkos snapshot
// -------------------------------------------------------------------------------- //
@ -134,7 +112,7 @@ Step 4: Once all Trilinos tests pass promote Kokkos develop branch to master on
master: sha1
develop: sha1
git push --follow-tags origin master
4.4. Do NOT push yet
// -------------------------------------------------------------------------------- //
@ -156,9 +134,15 @@ Step 5:
python KOKKOS_PATH/config/snapshot.py KOKKOS_PATH TRILINOS_PATH/packages
5.3. Push the updated develop branch of Trilinos to Github - congratulations!!!
5.3. Run checkin-test to push to trilinos using the CI build modules (gcc/4.9.3)
(From Trilinos directory):
git push
The modules are listed in kokkos/config/trilinos-integration/checkin-test
Run checkin-test; forward dependencies and optional dependencies must be enabled.
If the push failed because someone else clearly broke something, push manually.
If the push failed for unclear reasons, investigate, fix, and potentially start over from step 2 after resetting your local kokkos/master branch.
Step 6: Push Kokkos to master
git push --follow-tags origin master
// -------------------------------------------------------------------------------- //

View File

@ -13,7 +13,7 @@
# module load cmake/2.8.11.2 gcc/4.8.3 cuda/6.5.14 nvcc-wrapper/gnu
#
# The 'nvcc-wrapper' module should load a script that matches
# kokkos/config/nvcc_wrapper
# kokkos/bin/nvcc_wrapper
#
#-----------------------------------------------------------------------------
# Source and installation directories:

View File

@ -13,7 +13,7 @@
# module load cmake/2.8.11.2 gcc/4.8.3 cuda/6.5.14 nvcc-wrapper/gnu
#
# The 'nvcc-wrapper' module should load a script that matches
# kokkos/config/nvcc_wrapper
# kokkos/bin/nvcc_wrapper
#
#-----------------------------------------------------------------------------
# Source and installation directories:

View File

@ -13,7 +13,7 @@
# module load cmake/2.8.11.2 gcc/4.8.3 cuda/6.5.14 nvcc-wrapper/gnu
#
# The 'nvcc-wrapper' module should load a script that matches
# kokkos/config/nvcc_wrapper
# kokkos/bin/nvcc_wrapper
#
#-----------------------------------------------------------------------------
# Source and installation directories:

View File

@ -5,4 +5,5 @@ tag: 2.02.00 date: 10:30:2016 master: 6c90a581 develop: ca3dd56e
tag: 2.02.01 date: 11:01:2016 master: 9c698c86 develop: b0072304
tag: 2.02.07 date: 12:16:2016 master: 4b4cc4ba develop: 382c0966
tag: 2.02.15 date: 02:10:2017 master: 8c64cd93 develop: 28dea8b6
tag: 2.03.00 date: 04:25:2017 master: 120d9ce7 develop: 015ba641
tag: 2.03.00 date: 04:25:2017 master: 120d9ce7 develop: 015ba641
tag: 2.03.05 date: 05:27:2017 master: 36b92f43 develop: 79073186
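Reviewer note: each line of this history file records a tag, a date in MM:DD:YYYY form (2.03.05 is dated 2017-05-27 in the change log), and the short SHA-1 of master and develop. A minimal parser sketch, assuming whitespace-separated `key: value` pairs in exactly this order; `parse_history_line` is a hypothetical helper and does not exist in the repository:

```python
import datetime
import re

HISTORY_LINE = re.compile(
    r"tag:\s*(\S+)\s+date:\s*(\d{2}):(\d{2}):(\d{4})\s+"
    r"master:\s*(\S+)\s+develop:\s*(\S+)"
)


def parse_history_line(line):
    """Parse one history line into a dict; raise ValueError if malformed."""
    match = HISTORY_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized history line: %r" % line)
    tag, month, day, year, master, develop = match.groups()
    return {
        "tag": tag,
        "date": datetime.date(int(year), int(month), int(day)),
        "master": master,
        "develop": develop,
    }
```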

View File

@ -27,7 +27,7 @@ import subprocess, argparse, re, doctest, os, datetime, traceback
def parse_cmdline(description):
parser = argparse.ArgumentParser(usage="snapshot.py [options] source destination", description=description)
parser.add_argument("-n", "--no-comit", action="store_false", dest="create_commit", default=True,
parser.add_argument("-n", "--no-commit", action="store_false", dest="create_commit", default=True,
help="Do not perform a commit or create a commit message.")
parser.add_argument("-v", "--verbose", action="store_true", dest="verbose_mode", default=False,
help="Enable verbose mode.")
@ -39,6 +39,8 @@ def parse_cmdline(description):
help="Type of repository of the source, use none to skip all repository operations.")
parser.add_argument("--dest-repo", choices=["git","none"], default="",
help="Type of repository of the destination, use none to skip all repository operations.")
parser.add_argument("--small", action="store_true", dest="small_mode",
help="Don't include tests and other extra files when copying.")
parser.add_argument("source", help="Source project to snapshot from.")
parser.add_argument("destination", help="Destination to snapshot to.")
@ -58,9 +60,9 @@ def validate_options(options):
options.source = os.path.abspath(options.source)
options.destination = os.path.abspath(options.destination)
if os.path.exists(options.source):
apparent_source_repo_type, source_root = deterimine_repo_type(options.source)
apparent_source_repo_type, source_root = determine_repo_type(options.source)
else:
raise RuntimeError("Could not find source directory of %s." % options.source)
options.source_root = source_root
@ -69,7 +71,7 @@ def validate_options(options):
print "Could not find destination directory of %s so it will be created." % options.destination
os.makedirs(options.destination)
apparent_dest_repo_type, dest_root = deterimine_repo_type(options.destination)
apparent_dest_repo_type, dest_root = determine_repo_type(options.destination)
options.dest_root = dest_root
#error on svn repo types for now
@ -111,7 +113,7 @@ def run_cmd(cmd, options, working_dir="."):
print "==== %s stderr ====" % cmd_str
print proc_stderr
print "==== %s stderr ====" % cmd_str
if ret_val != 0:
raise RuntimeError("Command '%s' failed with error code %d. Error message:%s%s%sstdout:%s" % \
(cmd_str, ret_val, os.linesep, proc_stderr, os.linesep, proc_stdout))
@ -119,7 +121,7 @@ def run_cmd(cmd, options, working_dir="."):
return proc_stdout, proc_stderr
#end run_cmd
def deterimine_repo_type(location):
def determine_repo_type(location):
apparent_repo_type = "none"
while location != "":
@ -133,16 +135,32 @@ def deterimine_repo_type(location):
location = location[:location.rfind(os.sep)]
return apparent_repo_type, location
#end deterimine_repo_type
#end determine_repo_type
def rsync(source, dest, options):
rsync_cmd = ["rsync", "-ar", "--delete"]
if options.debug_mode:
rsync_cmd.append("-v")
if options.small_mode or options.source_repo == "git":
rsync_cmd.append("--delete-excluded")
if options.small_mode:
rsync_cmd.append("--include=config/master_history.txt")
rsync_cmd.append("--include=cmake/tpls")
rsync_cmd.append("--exclude=benchmarks/")
rsync_cmd.append("--exclude=config/*")
rsync_cmd.append("--exclude=doc/")
rsync_cmd.append("--exclude=example/")
rsync_cmd.append("--exclude=tpls/")
rsync_cmd.append("--exclude=HOW_TO_SNAPSHOT")
rsync_cmd.append("--exclude=unit_test")
rsync_cmd.append("--exclude=unit_tests")
rsync_cmd.append("--exclude=perf_test")
rsync_cmd.append("--exclude=performance_tests")
if options.source_repo == "git":
rsync_cmd.append("--exclude=.git")
rsync_cmd.append("--exclude=.git*")
rsync_cmd.append(options.source)
rsync_cmd.append(options.destination)
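Reviewer note: the new `--small` option changes which rsync filters are passed. The sketch below rebuilds the argument list as I read the post-commit side of this hunk. It is written as a standalone Python 3 function with explicit parameters instead of the script's `options` object. The name `build_rsync_cmd` is hypothetical, and the function only assembles the list without running rsync.

```python
SMALL_MODE_EXCLUDES = (
    "benchmarks/", "config/*", "doc/", "example/", "tpls/",
    "HOW_TO_SNAPSHOT", "unit_test", "unit_tests",
    "perf_test", "performance_tests",
)


def build_rsync_cmd(source, destination, small_mode=False,
                    source_repo="none", debug_mode=False):
    """Assemble the rsync argument list described in the hunk above."""
    cmd = ["rsync", "-ar", "--delete"]
    if debug_mode:
        cmd.append("-v")
    if small_mode or source_repo == "git":
        # Also delete files on the destination that the filters exclude.
        cmd.append("--delete-excluded")
    if small_mode:
        # Includes come first so they win over the broader excludes below.
        cmd.append("--include=config/master_history.txt")
        cmd.append("--include=cmake/tpls")
        for pattern in SMALL_MODE_EXCLUDES:
            cmd.append("--exclude=" + pattern)
    if source_repo == "git":
        cmd.append("--exclude=.git*")
    cmd.extend([source, destination])
    return cmd
```

The ordering matters because rsync acts on the first filter rule that matches a path, so `config/master_history.txt` has to be included before `config/*` is excluded.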
@ -171,28 +189,27 @@ def find_git_commit_information(options):
('sems', 'software.sandia.gov:/git/sems')
"""
git_log_cmd = ["git", "log", "-1"]
output, error = run_cmd(git_log_cmd, options, options.source)
commit_match = re.match("commit ([0-9a-fA-F]+)", output)
commit_id = commit_match.group(1)
commit_log = output
git_remote_cmd = ["git", "remote", "-v"]
output, error = run_cmd(git_remote_cmd, options, options.source)
remote_match = re.search("origin\s([^ ]*/([^ ]+))", output, re.MULTILINE)
if not remote_match:
raise RuntimeError("Could not find origin of repo at %s. Consider using none for source repo type." % (options.source))
source_location = remote_match.group(1)
source_name = remote_match.group(2).strip()
if source_name[-1] == "/":
source_name = source_name[:-1]
return commit_id, commit_log, source_name, source_location
#end find_git_commit_information
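Reviewer note: the regular expression in this function extracts the origin URL and the repository name from `git remote -v` output. The sketch below isolates that step as a Python 3 function so that the doctest result `('sems', 'software.sandia.gov:/git/sems')` can be reproduced without a repository. The name `parse_origin` and the sample input are illustrative assumptions.

```python
import re


def parse_origin(remote_output):
    """Return (repo_name, repo_location) for the 'origin' remote."""
    match = re.search(r"origin\s([^ ]*/([^ ]+))", remote_output, re.MULTILINE)
    if not match:
        raise RuntimeError("Could not find origin in remote listing.")
    location = match.group(1)
    name = match.group(2).strip()
    # A trailing slash on the URL would otherwise end up in the name.
    if name.endswith("/"):
        name = name[:-1]
    return name, location


if __name__ == "__main__":
    sample = ("origin\tsoftware.sandia.gov:/git/sems (fetch)\n"
              "origin\tsoftware.sandia.gov:/git/sems (push)\n")
    print(parse_origin(sample))
```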
def do_git_commit(message, options):
@ -201,10 +218,10 @@ def do_git_commit(message, options):
git_add_cmd = ["git", "add", "-A"]
run_cmd(git_add_cmd, options, options.destination)
git_commit_cmd = ["git", "commit", "-m%s" % message]
run_cmd(git_commit_cmd, options, options.destination)
git_log_cmd = ["git", "log", "--format=%h", "-1"]
commit_sha1, error = run_cmd(git_log_cmd, options, options.destination)
@ -214,7 +231,7 @@ def do_git_commit(message, options):
def verify_git_repo_clean(location, options):
git_status_cmd = ["git", "status", "--porcelain"]
output, error = run_cmd(git_status_cmd, options, location)
if output != "":
if options.no_validate_repo == False:
raise RuntimeError("%s is not clean.%sPlease commit or stash all changes before running snapshot."
@ -223,7 +240,6 @@ def verify_git_repo_clean(location, options):
print "WARNING: %s is not clean. Proceeding anyway." % location
print "WARNING: This could lead to differences in the source and destination."
print "WARNING: It could also lead to extra files being included in the snapshot commit."
#end verify_git_repo_clean
def main(options):
@ -238,14 +254,14 @@ def main(options):
commit_log = "Unknown commit from %s snapshotted at: %s" % (options.source, datetime.datetime.now())
repo_name = options.source
repo_location = options.source
commit_message = create_commit_message(commit_id, commit_log, repo_name, repo_location) + os.linesep*2
if options.dest_repo == "git":
verify_git_repo_clean(options.destination, options)
rsync(options.source, options.destination, options)
if options.dest_repo == "git":
do_git_commit(commit_message, options)
elif options.dest_repo == "none":
@ -256,10 +272,6 @@ def main(options):
cwd = os.getcwd()
print "No commit done by request. Please use file at:"
print "%s%sif you wish to commit this to a repo later." % (cwd+"/"+file_name, os.linesep)
#end main
if (__name__ == "__main__"):
@ -267,7 +279,7 @@ if (__name__ == "__main__"):
doctest.testmod()
sys.exit(0)
try:
try:
options = parse_cmdline(__doc__)
main(options)
except RuntimeError, e:
@ -275,5 +287,5 @@ if (__name__ == "__main__"):
if "--debug" in sys.argv:
traceback.print_exc()
sys.exit(1)
else:
else:
sys.exit(0)

View File

@ -24,6 +24,8 @@ elif [[ "$HOSTNAME" =~ node.* ]]; then # Warning: very generic name
fi
elif [[ "$HOSTNAME" =~ apollo ]]; then
MACHINE=apollo
elif [[ "$HOSTNAME" =~ sullivan ]]; then
MACHINE=sullivan
elif [ ! -z "$SEMS_MODULEFILES_ROOT" ]; then
MACHINE=sems
else
@ -152,7 +154,7 @@ if [ "$MACHINE" = "sems" ]; then
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
"cuda/8.0.44 $CUDA8_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/8.0.44 $CUDA8_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
else
# Format: (compiler module-list build-list exe-name warning-flag)
@ -164,6 +166,7 @@ if [ "$MACHINE" = "sems" ]; then
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.7.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.8.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.9.0 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/8.0.44 $CUDA8_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
@ -184,7 +187,7 @@ elif [ "$MACHINE" = "white" ]; then
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/5.4.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
"cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
if [ -z "$ARCH_FLAG" ]; then
@ -221,7 +224,7 @@ elif [ "$MACHINE" = "sullivan" ]; then
BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/5.3.0 $BASE_MODULE_LIST $ARM_GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS")
COMPILERS=("gcc/6.1.0 $BASE_MODULE_LIST $ARM_GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS")
if [ -z "$ARCH_FLAG" ]; then
ARCH_FLAG="--arch=ARMv8-ThunderX"
@ -278,11 +281,11 @@ elif [ "$MACHINE" = "apollo" ]; then
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
"clang/head $CLANG_MODULE_LIST "Cuda_Pthread" clang++ $CUDA_WARNING_FLAGS"
"cuda/8.0.44 $CUDA_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/8.0.44 $CUDA_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
else
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("cuda/8.0.44 $CUDA8_MODULE_LIST $BUILD_LIST_CUDA_NVCC $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
COMPILERS=("cuda/8.0.44 $CUDA8_MODULE_LIST $BUILD_LIST_CUDA_NVCC $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
"clang/head $CLANG_MODULE_LIST $BUILD_LIST_CUDA_CLANG clang++ $CUDA_WARNING_FLAGS"
"clang/3.9.0 $CLANG_MODULE_LIST $BUILD_LIST_CLANG clang++ $CLANG_WARNING_FLAGS"
"gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
@ -295,8 +298,8 @@ elif [ "$MACHINE" = "apollo" ]; then
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"clang/3.5.2 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/config/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/7.5.18 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
fi

View File

@ -48,7 +48,7 @@ esac
#nvcc wrapper and make the wrapper the compiler.
if [ $cuda_compiler != "" ]; then
export NVCC_WRAPPER_DEFAULT_COMPILER=$compiler
compiler=$kokkos_path/config/nvcc_wrapper
compiler=$kokkos_path/bin/nvcc_wrapper
fi
if [ $host_compiler_brand == "intel" -a $cuda_compiler != "" ]; then

View File

@ -0,0 +1,4 @@
module purge
module load sems-env sems-gcc/4.9.3 sems-openmpi/1.10.1 sems-hdf5/1.8.12/parallel sems-netcdf/4.3.2/parallel sems-python/2.7.9 sems-zlib/1.2.8/base sems-cmake/3.5.2 sems-parmetis/4.0.3/64bit_parallel sems-scotch/6.0.3/nopthread_64bit_parallel sems-boost/1.59.0/base
#Run Trilinos CheckinTest

View File

@ -1,5 +1,18 @@
#!/bin/bash -le
TRILINOS_UPDATE_BRANCH=$1
TRILINOS_PRISTINE_BRANCH=$2
if [ -z $TRILINOS_UPDATE_BRANCH ]
then
TRILINOS_UPDATE_BRANCH=develop
fi
if [ -z $TRILINOS_PRISTINE_BRANCH ]
then
TRILINOS_PRISTINE_BRANCH=develop
fi
export TRILINOS_UPDATED_PATH=${PWD}/trilinos-update
export TRILINOS_PRISTINE_PATH=${PWD}/trilinos-pristine
@ -16,8 +29,8 @@ if [ ! -d "${TRILINOS_PRISTINE_PATH}" ]; then
fi
cd ${TRILINOS_UPDATED_PATH}
git checkout develop
git reset --hard origin/develop
git checkout $TRILINOS_UPDATE_BRANCH
git reset --hard origin/$TRILINOS_UPDATE_BRANCH
git pull
cd ..
@ -28,18 +41,14 @@ echo ""
echo ""
echo "Trilinos State:"
git log --pretty=oneline --since=7.days
SHA=`git log --pretty=oneline --since=7.days | head -n 2 | tail -n 1 | awk '{print $1}'`
cd ..
cd ${TRILINOS_PRISTINE_PATH}
git status
git log --pretty=oneline --since=7.days
echo "Checkout develop"
git checkout develop
echo "Checkout $TRILINOS_PRISTINE_BRANCH"
git checkout $TRILINOS_PRISTINE_BRANCH
echo "Pull"
git pull
echo "Checkout SHA"
git checkout ${SHA}
cd ..
cd ${TRILINOS_PRISTINE_PATH}

View File

@ -0,0 +1,60 @@
#!/bin/bash -el
ulimit -c 0
module load devpack/openmpi/1.10.0/intel/16.1.056/cuda/none
KOKKOS_BRANCH=$1
TRILINOS_UPDATE_BRANCH=$2
TRILINOS_PRISTINE_BRANCH=$3
if [ -z $KOKKOS_BRANCH ]
then
KOKKOS_BRANCH=develop
fi
if [ -z $TRILINOS_UPDATE_BRANCH ]
then
TRILINOS_UPDATE_BRANCH=develop
fi
if [ -z $TRILINOS_PRISTINE_BRANCH ]
then
TRILINOS_PRISTINE_BRANCH=develop
fi
export OMP_NUM_THREADS=8
export JENKINS_DO_CUDA=OFF
export JENKINS_DO_OPENMP=OFF
export JENKINS_DO_PTHREAD=ON
export JENKINS_DO_SERIAL=OFF
export JENKINS_DO_COMPLEX=OFF
export ARCH_CXX_FLAG="-xCORE-AVX2 -mkl"
export ARCH_C_FLAG="-xCORE-AVX2 -mkl"
export BLAS_LIBRARIES="-mkl;${MKLROOT}/lib/intel64/libmkl_intel_lp64.a;${MKLROOT}/lib/intel64/libmkl_intel_thread.a;${MKLROOT}/lib/intel64/libmkl_core.a"
export LAPACK_LIBRARIES=${BLAS_LIBRARIES}
export JENKINS_DO_TESTS=ON
export JENKINS_DO_EXAMPLES=ON
export JENKINS_DO_SHARED=OFF
export QUEUE=haswell
module load python
export KOKKOS_PATH=${PWD}/kokkos
#Already done:
if [ ! -d "${KOKKOS_PATH}" ]; then
git clone https://github.com/kokkos/kokkos ${KOKKOS_PATH}
fi
cd ${KOKKOS_PATH}
git checkout $KOKKOS_BRANCH
git pull
cd ..
source ${KOKKOS_PATH}/config/trilinos-integration/prepare_trilinos_repos.sh $TRILINOS_UPDATE_BRANCH $TRILINOS_PRISTINE_BRANCH
${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/run_repo_comparison_slurm ${TRILINOS_UPDATED_PATH} ${TRILINOS_PRISTINE_PATH} ${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/configure-testbeds-jenkins-all TestCompare ${QUEUE}

View File

@ -0,0 +1,60 @@
#!/bin/bash -el
ulimit -c 0
module load devpack/openmpi/1.10.0/intel/16.1.056/cuda/none
KOKKOS_BRANCH=$1
TRILINOS_UPDATE_BRANCH=$2
TRILINOS_PRISTINE_BRANCH=$3
if [ -z $KOKKOS_BRANCH ]
then
KOKKOS_BRANCH=develop
fi
if [ -z $TRILINOS_UPDATE_BRANCH ]
then
TRILINOS_UPDATE_BRANCH=develop
fi
if [ -z $TRILINOS_PRISTINE_BRANCH ]
then
TRILINOS_PRISTINE_BRANCH=develop
fi
export OMP_NUM_THREADS=8
export JENKINS_DO_CUDA=OFF
export JENKINS_DO_OPENMP=OFF
export JENKINS_DO_PTHREAD=OFF
export JENKINS_DO_SERIAL=ON
export JENKINS_DO_COMPLEX=ON
export ARCH_CXX_FLAG="-xCORE-AVX2 -mkl"
export ARCH_C_FLAG="-xCORE-AVX2 -mkl"
export BLAS_LIBRARIES="-mkl;${MKLROOT}/lib/intel64/libmkl_intel_lp64.a;${MKLROOT}/lib/intel64/libmkl_intel_thread.a;${MKLROOT}/lib/intel64/libmkl_core.a"
export LAPACK_LIBRARIES=${BLAS_LIBRARIES}
export JENKINS_DO_TESTS=ON
export JENKINS_DO_EXAMPLES=ON
export JENKINS_DO_SHARED=OFF
export QUEUE=haswell
module load python
export KOKKOS_PATH=${PWD}/kokkos
#Already done:
if [ ! -d "${KOKKOS_PATH}" ]; then
git clone https://github.com/kokkos/kokkos ${KOKKOS_PATH}
fi
cd ${KOKKOS_PATH}
git checkout $KOKKOS_BRANCH
git pull
cd ..
source ${KOKKOS_PATH}/config/trilinos-integration/prepare_trilinos_repos.sh $TRILINOS_UPDATE_BRANCH $TRILINOS_PRISTINE_BRANCH
${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/run_repo_comparison_slurm ${TRILINOS_UPDATED_PATH} ${TRILINOS_PRISTINE_PATH} ${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/configure-testbeds-jenkins-all TestCompare ${QUEUE}

View File

@ -0,0 +1,63 @@
#!/bin/bash -el
ulimit -c 0
KOKKOS_BRANCH=$1
TRILINOS_UPDATE_BRANCH=$2
TRILINOS_PRISTINE_BRANCH=$3
if [ -z $KOKKOS_BRANCH ]
then
KOKKOS_BRANCH=develop
fi
if [ -z $TRILINOS_UPDATE_BRANCH ]
then
TRILINOS_UPDATE_BRANCH=develop
fi
if [ -z $TRILINOS_PRISTINE_BRANCH ]
then
TRILINOS_PRISTINE_BRANCH=develop
fi
module load devpack/openmpi/1.10.4/gcc/5.4.0/cuda/8.0.44
export OMP_NUM_THREADS=8
export JENKINS_DO_CUDA=ON
export JENKINS_DO_OPENMP=OFF
export JENKINS_DO_PTHREAD=OFF
export JENKINS_DO_SERIAL=ON
export JENKINS_DO_COMPLEX=OFF
export JENKINS_ARCH_CXX_FLAG="-mcpu=power8 -arch=sm_37"
export JENKINS_ARCH_C_FLAG="-mcpu=power8"
export BLAS_LIBRARIES="${BLAS_ROOT}/lib/libblas.a;gfortran;gomp"
export LAPACK_LIBRARIES="${LAPACK_ROOT}/lib/liblapack.a;gfortran;gomp"
export JENKINS_DO_TESTS=ON
export JENKINS_DO_EXAMPLES=ON
export QUEUE=rhel7F
module load python
export KOKKOS_PATH=${PWD}/kokkos
#Already done:
if [ ! -d "${KOKKOS_PATH}" ]; then
git clone https://github.com/kokkos/kokkos ${KOKKOS_PATH}
fi
export OMPI_CXX=${KOKKOS_PATH}/bin/nvcc_wrapper
cd ${KOKKOS_PATH}
git checkout $KOKKOS_BRANCH
git pull
cd ..
export CUDA_LAUNCH_BLOCKING=1
export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
source ${KOKKOS_PATH}/config/trilinos-integration/prepare_trilinos_repos.sh $TRILINOS_UPDATE_BRANCH $TRILINOS_PRISTINE_BRANCH
${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/run_repo_comparison_lsf ${TRILINOS_UPDATED_PATH} ${TRILINOS_PRISTINE_PATH} ${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/configure-testbeds-jenkins-all TestCompare ${QUEUE}

View File

@ -0,0 +1,58 @@
#!/bin/bash -el
ulimit -c 0
KOKKOS_BRANCH=$1
TRILINOS_UPDATE_BRANCH=$2
TRILINOS_PRISTINE_BRANCH=$3
if [ -z $KOKKOS_BRANCH ]
then
KOKKOS_BRANCH=develop
fi
if [ -z $TRILINOS_UPDATE_BRANCH ]
then
TRILINOS_UPDATE_BRANCH=develop
fi
if [ -z $TRILINOS_PRISTINE_BRANCH ]
then
TRILINOS_PRISTINE_BRANCH=develop
fi
module load devpack/openmpi/1.10.4/gcc/5.4.0/cuda/8.0.44
export OMP_NUM_THREADS=8
export JENKINS_DO_CUDA=OFF
export JENKINS_DO_OPENMP=ON
export JENKINS_DO_PTHREAD=OFF
export JENKINS_DO_SERIAL=OFF
export JENKINS_DO_COMPLEX=OFF
export JENKINS_ARCH_CXX_FLAG="-mcpu=power8"
export JENKINS_ARCH_C_FLAG="-mcpu=power8"
export BLAS_LIBRARIES="${BLAS_ROOT}/lib/libblas.a;gfortran;gomp"
export LAPACK_LIBRARIES="${LAPACK_ROOT}/lib/liblapack.a;gfortran;gomp"
export JENKINS_DO_TESTS=ON
export JENKINS_DO_EXAMPLES=ON
export QUEUE=rhel7F
module load python
export KOKKOS_PATH=${PWD}/kokkos
#Already done:
if [ ! -d "${KOKKOS_PATH}" ]; then
git clone https://github.com/kokkos/kokkos ${KOKKOS_PATH}
fi
cd ${KOKKOS_PATH}
git checkout $KOKKOS_BRANCH
git pull
cd ..
source ${KOKKOS_PATH}/config/trilinos-integration/prepare_trilinos_repos.sh $TRILINOS_UPDATE_BRANCH $TRILINOS_PRISTINE_BRANCH
${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/run_repo_comparison_lsf ${TRILINOS_UPDATED_PATH} ${TRILINOS_PRISTINE_PATH} ${TRILINOS_UPDATED_PATH}/sampleScripts/Sandia-SEMS/configure-testbeds-jenkins-all TestCompare ${QUEUE}

View File

@ -8,7 +8,7 @@ default: build_all
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
else
CXX = g++
endif
@ -21,8 +21,8 @@ include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/performance_tests
TEST_TARGETS =
TARGETS =
TEST_TARGETS =
TARGETS =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA = TestCuda.o TestMain.o gtest-all.o
@ -65,7 +65,7 @@ build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
@ -73,6 +73,5 @@ clean: kokkos-clean
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,12 +36,15 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <stdint.h>
#include <Kokkos_Macros.hpp>
#if defined( KOKKOS_ENABLE_CUDA )
#include <cstdint>
#include <string>
#include <iostream>
#include <iomanip>
@ -52,8 +55,6 @@
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_CUDA )
#include <TestDynRankView.hpp>
#include <Kokkos_UnorderedMap.hpp>
@ -79,7 +80,7 @@ protected:
}
};
TEST_F( cuda, dynrankview_perf )
TEST_F( cuda, dynrankview_perf )
{
std::cout << "Cuda" << std::endl;
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
@ -105,5 +106,6 @@ TEST_F( cuda, unordered_map_performance_far)
}
}
#else
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTCUDA_PREVENT_EMPTY_LINK_ERROR() {}
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */

View File

@ -1,13 +1,13 @@
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
@ -263,3 +263,4 @@ void test_dynrankview_op_perf( const int par_size )
} //end Performance
#endif

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,11 +36,14 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#if defined( KOKKOS_ENABLE_OPENMP )
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
@ -93,7 +96,7 @@ protected:
}
};
TEST_F( openmp, dynrankview_perf )
TEST_F( openmp, dynrankview_perf )
{
std::cout << "OpenMP" << std::endl;
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
@ -137,4 +140,7 @@ TEST_F( openmp, unordered_map_performance_far)
}
} // namespace test
#else
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTOPENMP_PREVENT_EMPTY_LINK_ERROR() {}
#endif

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,11 +36,14 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#if defined( KOKKOS_ENABLE_THREADS )
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
@ -87,7 +90,7 @@ protected:
}
};
TEST_F( threads, dynrankview_perf )
TEST_F( threads, dynrankview_perf )
{
std::cout << "Threads" << std::endl;
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
@ -132,4 +135,7 @@ TEST_F( threads, unordered_map_performance_far)
} // namespace Performance
#else
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTTHREADS_PREVENT_EMPTY_LINK_ERROR() {}
#endif

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -435,3 +435,4 @@ void deep_copy( ConstBitset<DstDevice> & dst, ConstBitset<SrcDevice> const& src)
} // namespace Kokkos
#endif //KOKKOS_BITSET_HPP

View File

@ -442,6 +442,17 @@ public:
modified_host () = (modified_device () > modified_host () ?
modified_device () : modified_host ()) + 1;
}
#ifdef KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
if (modified_host() && modified_device()) {
std::string msg = "Kokkos::DualView::modify ERROR: ";
msg += "Concurrent modification of host and device views ";
msg += "in DualView \"";
msg += d_view.label();
msg += "\"\n";
Kokkos::abort(msg.c_str());
}
#endif
}
//@}
@ -624,3 +635,4 @@ deep_copy (const ExecutionSpace& exec ,
} // namespace Kokkos
#endif

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -140,21 +140,21 @@ struct DynRankDimTraits {
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutStride>::value) , Layout>::type createLayout( const Layout& layout )
{
return Layout( layout.dimension[0] != unspecified ? layout.dimension[0] : 1
, layout.stride[0]
, layout.stride[0]
, layout.dimension[1] != unspecified ? layout.dimension[1] : 1
, layout.stride[1]
, layout.stride[1]
, layout.dimension[2] != unspecified ? layout.dimension[2] : 1
, layout.stride[2]
, layout.stride[2]
, layout.dimension[3] != unspecified ? layout.dimension[3] : 1
, layout.stride[3]
, layout.stride[3]
, layout.dimension[4] != unspecified ? layout.dimension[4] : 1
, layout.stride[4]
, layout.stride[4]
, layout.dimension[5] != unspecified ? layout.dimension[5] : 1
, layout.stride[5]
, layout.stride[5]
, layout.dimension[6] != unspecified ? layout.dimension[6] : 1
, layout.stride[6]
, layout.stride[6]
, layout.dimension[7] != unspecified ? layout.dimension[7] : 1
, layout.stride[7]
, layout.stride[7]
);
}
@ -188,7 +188,7 @@ struct DynRankDimTraits {
KOKKOS_INLINE_FUNCTION
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutRight>::value || std::is_same<Layout , Kokkos::LayoutLeft>::value) && std::is_integral<iType>::value , Layout >::type reconstructLayout( const Layout& layout , iType dynrank )
{
return Layout( dynrank > 0 ? layout.dimension[0] : ~size_t(0)
return Layout( dynrank > 0 ? layout.dimension[0] : ~size_t(0)
, dynrank > 1 ? layout.dimension[1] : ~size_t(0)
, dynrank > 2 ? layout.dimension[2] : ~size_t(0)
, dynrank > 3 ? layout.dimension[3] : ~size_t(0)
@ -205,27 +205,27 @@ struct DynRankDimTraits {
static typename std::enable_if< (std::is_same<Layout , Kokkos::LayoutStride>::value) && std::is_integral<iType>::value , Layout >::type reconstructLayout( const Layout& layout , iType dynrank )
{
return Layout( dynrank > 0 ? layout.dimension[0] : ~size_t(0)
, dynrank > 0 ? layout.stride[0] : (0)
, dynrank > 0 ? layout.stride[0] : (0)
, dynrank > 1 ? layout.dimension[1] : ~size_t(0)
, dynrank > 1 ? layout.stride[1] : (0)
, dynrank > 1 ? layout.stride[1] : (0)
, dynrank > 2 ? layout.dimension[2] : ~size_t(0)
, dynrank > 2 ? layout.stride[2] : (0)
, dynrank > 2 ? layout.stride[2] : (0)
, dynrank > 3 ? layout.dimension[3] : ~size_t(0)
, dynrank > 3 ? layout.stride[3] : (0)
, dynrank > 3 ? layout.stride[3] : (0)
, dynrank > 4 ? layout.dimension[4] : ~size_t(0)
, dynrank > 4 ? layout.stride[4] : (0)
, dynrank > 4 ? layout.stride[4] : (0)
, dynrank > 5 ? layout.dimension[5] : ~size_t(0)
, dynrank > 5 ? layout.stride[5] : (0)
, dynrank > 5 ? layout.stride[5] : (0)
, dynrank > 6 ? layout.dimension[6] : ~size_t(0)
, dynrank > 6 ? layout.stride[6] : (0)
, dynrank > 6 ? layout.stride[6] : (0)
, dynrank > 7 ? layout.dimension[7] : ~size_t(0)
, dynrank > 7 ? layout.stride[7] : (0)
, dynrank > 7 ? layout.stride[7] : (0)
);
}
/** \brief Debug bounds-checking routines */
// Enhanced debug checking - most infrastructure matches that of functions in
// Enhanced debug checking - most infrastructure matches that of functions in
// Kokkos_ViewMapping; additional checks for extra arguments beyond rank are 0
template< unsigned , typename iType0 , class MapType >
KOKKOS_INLINE_FUNCTION
@ -235,20 +235,20 @@ bool dyn_rank_view_verify_operator_bounds( const iType0 & , const MapType & )
template< unsigned R , typename iType0 , class MapType , typename iType1 , class ... Args >
KOKKOS_INLINE_FUNCTION
bool dyn_rank_view_verify_operator_bounds
( const iType0 & rank
( const iType0 & rank
, const MapType & map
, const iType1 & i
, Args ... args
)
{
if ( static_cast<iType0>(R) < rank ) {
if ( static_cast<iType0>(R) < rank ) {
return ( size_t(i) < map.extent(R) )
&& dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
}
else if ( i != 0 ) {
printf("DynRankView Debug Bounds Checking Error: at rank %u\n Extra arguments beyond the rank must be zero \n",R);
return ( false )
&& dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
&& dyn_rank_view_verify_operator_bounds<R+1>( rank , map , args ... );
}
else {
return ( true )
@ -281,20 +281,24 @@ void dyn_rank_view_error_operator_bounds
}
// op_rank = rank of the operator version that was called
template< typename iType0 , typename iType1 , class MapType , class ... Args >
template< typename MemorySpace
, typename iType0 , typename iType1 , class MapType , class ... Args >
KOKKOS_INLINE_FUNCTION
void dyn_rank_view_verify_operator_bounds
( const iType0 & op_rank , const iType1 & rank , const char* label , const MapType & map , Args ... args )
( const iType0 & op_rank , const iType1 & rank
, const Kokkos::Impl::SharedAllocationTracker & tracker
, const MapType & map , Args ... args )
{
if ( static_cast<iType0>(rank) > op_rank ) {
Kokkos::abort( "DynRankView Bounds Checking Error: Need at least rank arguments to the operator()" );
Kokkos::abort( "DynRankView Bounds Checking Error: Need at least rank arguments to the operator()" );
}
if ( ! dyn_rank_view_verify_operator_bounds<0>( rank , map , args ... ) ) {
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST )
enum { LEN = 1024 };
char buffer[ LEN ];
int n = snprintf(buffer,LEN,"DynRankView bounds error of view %s (", label);
const std::string label = tracker.template get_label<MemorySpace>();
int n = snprintf(buffer,LEN,"DynRankView bounds error of view %s (", label.c_str());
dyn_rank_view_error_operator_bounds<0>( buffer + n , LEN - n , map , args ... );
Kokkos::Impl::throw_runtime_exception(std::string(buffer));
#else
@ -347,7 +351,7 @@ private:
std::is_same< typename DstTraits::array_layout
, typename SrcTraits::array_layout >::value ||
std::is_same< typename DstTraits::array_layout
, Kokkos::LayoutStride >::value
, Kokkos::LayoutStride >::value
};
public:
@ -381,9 +385,9 @@ public:
} //end Impl
/* \class DynRankView
* \brief Container that creates a Kokkos view with rank determined at runtime.
* \brief Container that creates a Kokkos view with rank determined at runtime.
* Essentially this is a rank 7 view that wraps the access operators
* to yield the functionality of a view
* to yield the functionality of a view
*
* Changes from View
* 1. The rank of the DynRankView is returned by the method rank()
@ -410,14 +414,14 @@ class DynRankView : public ViewTraits< DataType , Properties ... >
{
static_assert( !std::is_array<DataType>::value && !std::is_pointer<DataType>::value , "Cannot template DynRankView with array or pointer datatype - must be pod" );
private:
private:
template < class , class ... > friend class DynRankView ;
template < class , class ... > friend class Impl::ViewMapping ;
public:
public:
typedef ViewTraits< DataType , Properties ... > drvtraits ;
typedef View< DataType******* , Properties...> view_type ;
typedef View< DataType******* , Properties...> view_type ;
typedef ViewTraits< DataType******* , Properties ... > traits ;
@ -430,7 +434,7 @@ private:
map_type m_map ;
unsigned m_rank;
public:
public:
KOKKOS_INLINE_FUNCTION
view_type & DownCast() const { return ( view_type & ) (*this); }
KOKKOS_INLINE_FUNCTION
@ -588,7 +592,7 @@ private:
// rank of the calling operator - included as first argument in ARG
#define KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( ARG ) \
DynRankView::template verify_space< Kokkos::Impl::ActiveExecutionMemorySpace >::check(); \
Kokkos::Experimental::Impl::dyn_rank_view_verify_operator_bounds ARG ;
Kokkos::Experimental::Impl::dyn_rank_view_verify_operator_bounds< typename traits::memory_space > ARG ;
#else
@ -607,14 +611,10 @@ public:
// Rank 0
KOKKOS_INLINE_FUNCTION
reference_type operator()() const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (0 , this->rank() , NULL , m_map) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (0 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map) )
#endif
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (0 , this->rank(), m_track, m_map) )
return implementation_map().reference();
//return m_map.reference(0,0,0,0,0,0,0);
//return m_map.reference(0,0,0,0,0,0,0);
}
// Rank 1
@ -624,6 +624,8 @@ public:
typename std::enable_if< std::is_same<typename drvtraits::value_type, typename drvtraits::scalar_array_type>::value && std::is_integral<iType>::value, reference_type>::type
operator[](const iType & i0) const
{
//Phalanx is violating this, since they use the operator to access ALL elements in the allocation
//KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank(), m_track, m_map) )
return data()[i0];
}
@ -647,14 +649,10 @@ public:
template< typename iType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
operator()(const iType & i0 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank() , NULL , m_map , i0) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
#endif
return m_map.reference(i0);
operator()(const iType & i0 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank(), m_track, m_map, i0) )
return m_map.reference(i0);
}
template< typename iType >
@ -662,11 +660,7 @@ public:
typename std::enable_if< !(std::is_same<typename traits::specialize , void>::value && std::is_integral<iType>::value), reference_type>::type
operator()(const iType & i0 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank() , NULL , m_map , i0) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0) )
#endif
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (1 , this->rank(), m_track, m_map, i0) )
return m_map.reference(i0,0,0,0,0,0,0);
}
@ -674,155 +668,111 @@ public:
template< typename iType0 , typename iType1 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank() , NULL , m_map , i0 , i1) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1) )
#endif
return m_map.reference(i0,i1);
operator()(const iType0 & i0 , const iType1 & i1 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank(), m_track, m_map, i0, i1) )
return m_map.reference(i0,i1);
}
template< typename iType0 , typename iType1 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank() , NULL , m_map , i0 , i1) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1) )
#endif
return m_map.reference(i0,i1,0,0,0,0,0);
operator()(const iType0 & i0 , const iType1 & i1 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (2 , this->rank(), m_track, m_map, i0, i1) )
return m_map.reference(i0,i1,0,0,0,0,0);
}
// Rank 3
template< typename iType0 , typename iType1 , typename iType2 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank() , NULL , m_map , i0 , i1 , i2) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2) )
#endif
return m_map.reference(i0,i1,i2);
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank(), m_track, m_map, i0, i1, i2) )
return m_map.reference(i0,i1,i2);
}
template< typename iType0 , typename iType1 , typename iType2 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank() , NULL , m_map , i0 , i1 , i2) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2) )
#endif
return m_map.reference(i0,i1,i2,0,0,0,0);
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (3 , this->rank(), m_track, m_map, i0, i1, i2) )
return m_map.reference(i0,i1,i2,0,0,0,0);
}
// Rank 4
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3) )
#endif
return m_map.reference(i0,i1,i2,i3);
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank(), m_track, m_map, i0, i1, i2, i3) )
return m_map.reference(i0,i1,i2,i3);
}
template< typename iType0 , typename iType1 , typename iType2 , typename iType3 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3) )
#endif
return m_map.reference(i0,i1,i2,i3,0,0,0);
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (4 , this->rank(), m_track, m_map, i0, i1, i2, i3) )
return m_map.reference(i0,i1,i2,i3,0,0,0);
}
// Rank 5
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4) )
#endif
return m_map.reference(i0,i1,i2,i3,i4);
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4) )
return m_map.reference(i0,i1,i2,i3,i4);
}
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4) )
#endif
return m_map.reference(i0,i1,i2,i3,i4,0,0);
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (5 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4) )
return m_map.reference(i0,i1,i2,i3,i4,0,0);
}
// Rank 6
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 , typename iType5 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_same<typename traits::specialize , void>::value && std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5) )
#endif
return m_map.reference(i0,i1,i2,i3,i4,i5);
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4, i5) )
return m_map.reference(i0,i1,i2,i3,i4,i5);
}
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 , typename iType5 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< !(std::is_same<typename drvtraits::specialize , void>::value && std::is_integral<iType0>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5) )
#endif
return m_map.reference(i0,i1,i2,i3,i4,i5,0);
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (6 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4, i5) )
return m_map.reference(i0,i1,i2,i3,i4,i5,0);
}
// Rank 7
template< typename iType0 , typename iType1 , typename iType2 , typename iType3, typename iType4 , typename iType5 , typename iType6 >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< (std::is_integral<iType0>::value && std::is_integral<iType1>::value && std::is_integral<iType2>::value && std::is_integral<iType3>::value && std::is_integral<iType4>::value && std::is_integral<iType5>::value && std::is_integral<iType6>::value), reference_type>::type
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 , const iType6 & i6 ) const
{
#ifndef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_HOST
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (7 , this->rank() , NULL , m_map , i0 , i1 , i2 , i3, i4 , i5 , i6) )
#else
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (7 , this->rank() , m_track.template get_label<typename traits::memory_space>().c_str(),m_map,i0,i1,i2,i3,i4,i5,i6) )
#endif
return m_map.reference(i0,i1,i2,i3,i4,i5,i6);
operator()(const iType0 & i0 , const iType1 & i1 , const iType2 & i2 , const iType3 & i3 , const iType4 & i4 , const iType5 & i5 , const iType6 & i6 ) const
{
KOKKOS_IMPL_VIEW_OPERATOR_VERIFY( (7 , this->rank(), m_track, m_map, i0, i1, i2, i3, i4, i5, i6) )
return m_map.reference(i0,i1,i2,i3,i4,i5,i6);
}
#undef KOKKOS_IMPL_VIEW_OPERATOR_VERIFY
//----------------------------------------
// Standard constructor, destructor, and assignment operators...
// Standard constructor, destructor, and assignment operators...
KOKKOS_INLINE_FUNCTION
~DynRankView() {}
@ -840,7 +790,7 @@ public:
DynRankView & operator = ( const DynRankView & rhs ) { m_track = rhs.m_track; m_map = rhs.m_map; m_rank = rhs.m_rank; return *this; }
KOKKOS_INLINE_FUNCTION
DynRankView & operator = ( DynRankView && rhs ) { m_track = rhs.m_track; m_map = rhs.m_map; m_rank = rhs.m_rank; return *this; }
DynRankView & operator = ( DynRankView && rhs ) { m_track = rhs.m_track; m_map = rhs.m_map; m_rank = rhs.m_rank; return *this; }
//----------------------------------------
// Compatible view copy constructor and assignment
@ -1068,7 +1018,7 @@ public:
DynRankView( const Label & arg_label
, typename std::enable_if<
Kokkos::Experimental::Impl::is_view_label<Label>::value ,
const size_t >::type arg_N0 = ~size_t(0)
const size_t >::type arg_N0 = ~size_t(0)
, const size_t arg_N1 = ~size_t(0)
, const size_t arg_N2 = ~size_t(0)
, const size_t arg_N3 = ~size_t(0)
@ -1104,7 +1054,7 @@ public:
, const size_t arg_N6 = ~size_t(0)
, const size_t arg_N7 = ~size_t(0)
)
: DynRankView(Impl::ViewCtorProp< std::string , Kokkos::Experimental::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::Experimental::WithoutInitializing ), arg_N0, arg_N1, arg_N2, arg_N3, arg_N4, arg_N5, arg_N6, arg_N7 )
: DynRankView(Impl::ViewCtorProp< std::string , Kokkos::Experimental::Impl::WithoutInitializing_t >( arg_prop.label , Kokkos::Experimental::WithoutInitializing ), arg_N0, arg_N1, arg_N2, arg_N3, arg_N4, arg_N5, arg_N6, arg_N7 )
{}
//----------------------------------------
@ -1182,7 +1132,7 @@ public:
, const typename traits::array_layout & arg_layout )
: DynRankView( Impl::ViewCtorProp<pointer_type>(
reinterpret_cast<pointer_type>(
arg_space.get_shmem( map_type::memory_span(
arg_space.get_shmem( map_type::memory_span(
Impl::DynRankDimTraits<typename traits::specialize>::createLayout( arg_layout ) //is this correct?
) ) ) )
, arg_layout )
@ -1206,7 +1156,7 @@ public:
Impl::DynRankDimTraits<typename traits::specialize>::createLayout(
typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) ) ) ) )
, arg_N4 , arg_N5 , arg_N6 , arg_N7 ) ) ) ) )
)
, typename traits::array_layout
( arg_N0 , arg_N1 , arg_N2 , arg_N3
@ -1241,7 +1191,7 @@ struct ViewMapping
, Kokkos::LayoutRight >::value ||
std::is_same< typename SrcTraits::array_layout
, Kokkos::LayoutStride >::value
)
)
), DynRankSubviewTag >::type
, SrcTraits
, Args ... >
@ -1266,19 +1216,19 @@ private:
typedef typename SrcTraits::value_type value_type ;
typedef value_type******* data_type ;
typedef value_type******* data_type ;
public:
typedef Kokkos::ViewTraits
< data_type
, array_layout
, array_layout
, typename SrcTraits::device_type
, typename SrcTraits::memory_traits > traits_type ;
typedef Kokkos::View
< data_type
, array_layout
, array_layout
, typename SrcTraits::device_type
, typename SrcTraits::memory_traits > type ;
@ -1289,17 +1239,17 @@ public:
static_assert( Kokkos::Impl::is_memory_traits< MemoryTraits >::value , "" );
typedef Kokkos::ViewTraits
< data_type
< data_type
, array_layout
, typename SrcTraits::device_type
, MemoryTraits > traits_type ;
typedef Kokkos::View
< data_type
< data_type
, array_layout
, typename SrcTraits::device_type
, MemoryTraits > type ;
};
};
typedef typename SrcTraits::dimension dimension ;
@ -1318,7 +1268,7 @@ public:
template < typename T , class ... P >
KOKKOS_INLINE_FUNCTION
static ret_type subview( const unsigned src_rank , Kokkos::Experimental::DynRankView< T , P...> const & src
static ret_type subview( const unsigned src_rank , Kokkos::Experimental::DynRankView< T , P...> const & src
, Args ... args )
{
@ -1339,8 +1289,8 @@ public:
ret_type dst ;
const SubviewExtents< 7 , rank > extents =
ExtentGenerator< Args ... >::generator( src.m_map.m_offset.m_dim , args... ) ;
const SubviewExtents< 7 , rank > extents =
ExtentGenerator< Args ... >::generator( src.m_map.m_offset.m_dim , args... ) ;
dst_offset_type tempdst( src.m_map.m_offset , extents ) ;
@ -1392,12 +1342,12 @@ using Subdynrankview = typename Kokkos::Experimental::Impl::ViewMapping< Kokkos:
template< class D , class ... P , class ...Args >
KOKKOS_INLINE_FUNCTION
Subdynrankview< ViewTraits<D******* , P...> , Args... >
Subdynrankview< ViewTraits<D******* , P...> , Args... >
subdynrankview( const Kokkos::Experimental::DynRankView< D , P... > &src , Args...args)
{
if ( src.rank() > sizeof...(Args) ) //allow sizeof...(Args) >= src.rank(), ignore the remaining args
{ Kokkos::abort("subdynrankview: num of args must be >= rank of the source DynRankView"); }
typedef Kokkos::Experimental::Impl::ViewMapping< Kokkos::Experimental::Impl::DynRankSubviewTag , Kokkos::ViewTraits< D*******, P... > , Args... > metafcn ;
return metafcn::subview( src.rank() , src , args... );
@ -1406,7 +1356,7 @@ subdynrankview( const Kokkos::Experimental::DynRankView< D , P... > &src , Args.
//Wrapper to allow subview function name
template< class D , class ... P , class ...Args >
KOKKOS_INLINE_FUNCTION
Subdynrankview< ViewTraits<D******* , P...> , Args... >
Subdynrankview< ViewTraits<D******* , P...> , Args... >
subview( const Kokkos::Experimental::DynRankView< D , P... > &src , Args...args)
{
return subdynrankview( src , args... );
@ -1508,7 +1458,7 @@ struct DynRankViewFill {
};
template< class OutputView >
struct DynRankViewFill< OutputView , typename std::enable_if< OutputView::Rank == 0 >::type > {
struct DynRankViewFill< OutputView , typename std::enable_if< OutputView::Rank == 0 >::type > {
DynRankViewFill( const OutputView & dst , const typename OutputView::const_value_type & src )
{
Kokkos::Impl::DeepCopy< typename OutputView::memory_space , Kokkos::HostSpace >
@ -1648,9 +1598,9 @@ void deep_copy
// If same type, equal layout, equal dimensions, equal span, and contiguous memory then can byte-wise copy
if ( rank(src) == 0 && rank(dst) == 0 )
{
{
typedef typename dst_type::value_type value_type ;
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , sizeof(value_type) );
Kokkos::Impl::DeepCopy< dst_memory_space , src_memory_space >( dst.data() , src.data() , sizeof(value_type) );
}
else if ( std::is_same< typename DstType::traits::value_type ,
typename SrcType::traits::non_const_value_type >::value &&
@ -1826,7 +1776,7 @@ create_mirror( const DynRankView<T,P...> & src
typedef DynRankView<T,P...> src_type ;
typedef typename src_type::HostMirror dst_type ;
return dst_type( std::string( src.label() ).append("_mirror")
return dst_type( std::string( src.label() ).append("_mirror")
, Impl::reconstructLayout(src.layout(), src.rank()) );
}
@ -1870,7 +1820,7 @@ create_mirror_view( const DynRankView<T,P...> & src
)>::type * = 0
)
{
return Kokkos::Experimental::create_mirror( src );
return Kokkos::Experimental::create_mirror( src );
}
// Create a mirror view in a new space (specialization for same space)
@ -1966,3 +1916,4 @@ using Kokkos::Experimental::realloc ;
} //end Kokkos
#endif
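
The rank-specific `operator()` overloads in the hunks above come in pairs selected by `std::enable_if`: one is viable when the index types are integral, the other when the first index is not. A minimal sketch of that selection pattern, assuming nothing about Kokkos internals, might look like the following. `ToyView` and its return values are hypothetical names used only for illustration; the real overloads also test `traits::specialize` and forward to `m_map.reference`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a view type; not part of Kokkos.
struct ToyView {
  // Viable only when both index types are integral.
  // Encodes the two indices into one int so the caller can see which
  // overload ran and with which arguments.
  template <typename I0, typename I1>
  typename std::enable_if<std::is_integral<I0>::value &&
                          std::is_integral<I1>::value, int>::type
  operator()(const I0& i0, const I1& i1) const {
    return static_cast<int>(i0) * 10 + static_cast<int>(i1);
  }

  // Viable only when the first index type is NOT integral.
  // Returns a sentinel so the fallback path is easy to recognise.
  template <typename I0, typename I1>
  typename std::enable_if<!std::is_integral<I0>::value, int>::type
  operator()(const I0&, const I1&) const {
    return -1;
  }
};
```

Because the two conditions are mutually exclusive on the first index type, at most one overload takes part in overload resolution for any call, so the pair never becomes ambiguous.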

View File

@ -86,7 +86,7 @@ private:
public:
typedef Kokkos::Experimental::MemoryPool< typename traits::device_type > memory_pool ;
typedef Kokkos::MemoryPool< typename traits::device_type > memory_pool ;
private:
@ -275,6 +275,10 @@ public:
ch[jc_try] = reinterpret_cast<value_type*>(
m_pool.allocate( sizeof(value_type) << m_chunk_shift ));
if ( 0 == ch[jc_try] ) {
Kokkos::abort("DynamicView::resize_parallel exhausted memory pool");
}
Kokkos::memory_fence();
}
}
@ -436,7 +440,7 @@ public:
void operator()( unsigned i ) const
{
if ( m_destroy && i < m_chunk_max && 0 != m_chunks[i] ) {
m_pool.deallocate( m_chunks[i] , m_pool.get_min_block_size() );
m_pool.deallocate( m_chunks[i] , m_pool.min_block_size() );
}
m_chunks[i] = 0 ;
}
@ -495,7 +499,7 @@ public:
// The memory pool chunk is guaranteed to be a power of two
, m_chunk_shift(
Kokkos::Impl::integral_power_of_two(
m_pool.get_min_block_size()/sizeof(typename traits::value_type)) )
m_pool.min_block_size()/sizeof(typename traits::value_type)) )
, m_chunk_mask( ( 1 << m_chunk_shift ) - 1 )
, m_chunk_max( ( arg_size_max + m_chunk_mask ) >> m_chunk_shift )
{
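
The initializer list above derives a shift, a mask, and a chunk count from the pool's minimum block size. The arithmetic can be sketched on its own as below. This is an illustration under stated assumptions, not the DynamicView implementation: `ChunkLayout`, `log2_if_power_of_two`, `make_chunk_layout`, `chunk_of`, and `offset_in_chunk` are hypothetical names, and splitting an element index with `>>` and `&` is the conventional use of such a shift/mask pair rather than something shown in this hunk.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: log2(n) when n is a power of two, otherwise -1.
inline int log2_if_power_of_two(std::size_t n) {
  if (n == 0 || (n & (n - 1)) != 0) return -1;
  int shift = 0;
  while (n > 1) { n >>= 1; ++shift; }
  return shift;
}

// Hypothetical container for the three derived quantities.
struct ChunkLayout {
  unsigned shift;       // log2(values per chunk)
  unsigned mask;        // values per chunk - 1
  unsigned max_chunks;  // chunks needed to hold size_max values
};

// Assumes min_block_bytes / value_bytes is a power of two,
// matching the comment in the hunk above.
inline ChunkLayout make_chunk_layout(std::size_t min_block_bytes,
                                     std::size_t value_bytes,
                                     std::size_t size_max) {
  ChunkLayout c;
  c.shift = static_cast<unsigned>(
      log2_if_power_of_two(min_block_bytes / value_bytes));
  c.mask = (1u << c.shift) - 1u;
  // Round up: a partially filled last chunk still needs a whole chunk.
  c.max_chunks = static_cast<unsigned>((size_max + c.mask) >> c.shift);
  return c;
}

// Conventional index split for a power-of-two chunk size (assumption).
inline unsigned chunk_of(const ChunkLayout& c, unsigned i) { return i >> c.shift; }
inline unsigned offset_in_chunk(const ChunkLayout& c, unsigned i) { return i & c.mask; }
```

For example, a 1024-byte block of 8-byte values holds 128 values per chunk, so the shift is 7 and the mask is 127; a maximum size of 1000 values then rounds up to 8 chunks.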

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -114,7 +114,7 @@ private:
template <typename ReportType, typename DeviceType>
inline int ErrorReporter<ReportType, DeviceType>::getNumReports()
inline int ErrorReporter<ReportType, DeviceType>::getNumReports()
{
int num_reports = 0;
Kokkos::deep_copy(num_reports,m_numReportsAttempted);
@ -194,3 +194,4 @@ void ErrorReporter<ReportType, DeviceType>::resize(const size_t new_size)
} // namespace kokkos
#endif

View File

@ -1,12 +1,12 @@
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -35,7 +35,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
@ -170,4 +170,3 @@ struct less_equal
#endif //KOKKOS_FUNCTIONAL_HPP

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -61,7 +61,7 @@
#include <iostream>
#include <stdint.h>
#include <cstdint>
#include <stdexcept>
@ -847,3 +847,4 @@ inline void deep_copy( UnorderedMap<DKey, DT, DDevice, Hasher, EqualTo>
} // namespace Kokkos
#endif //KOKKOS_UNORDERED_MAP_HPP

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -281,3 +281,4 @@ public:
}
#endif

View File

@ -46,7 +46,7 @@
#include <Kokkos_Macros.hpp>
#include <impl/Kokkos_BitOps.hpp>
#include <stdint.h>
#include <cstdint>
#include <cstdio>
#include <climits>

View File

@ -1,12 +1,12 @@
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -35,7 +35,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
@ -43,7 +43,7 @@
#define KOKKOS_FUNCTIONAL_IMPL_HPP
#include <Kokkos_Macros.hpp>
#include <stdint.h>
#include <cstdint>
namespace Kokkos { namespace Impl {
@ -193,3 +193,4 @@ bool bitwise_equal(T const * const a_ptr, T const * const b_ptr)
}} // namespace Kokkos::Impl
#endif //KOKKOS_FUNCTIONAL_IMPL_HPP

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -45,7 +45,7 @@
#define KOKKOS_UNORDERED_MAP_IMPL_HPP
#include <Kokkos_Core_fwd.hpp>
#include <stdint.h>
#include <cstdint>
#include <cstdio>
#include <climits>
@ -295,3 +295,4 @@ struct UnorderedMapCanAssign<const Key,const Value,const Key,Value> : public tru
}} //Kokkos::Impl
#endif // KOKKOS_UNORDERED_MAP_IMPL_HPP

View File

@ -8,7 +8,7 @@ default: build_all
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
else
CXX = g++
endif
@ -21,8 +21,8 @@ include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/containers/unit_tests
TEST_TARGETS =
TARGETS =
TEST_TARGETS =
TARGETS =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
OBJ_CUDA = TestCuda.o UnitTestMain.o gtest-all.o
@ -49,16 +49,16 @@ ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
endif
KokkosContainers_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_Cuda
$(LINK) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_Cuda
KokkosContainers_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_Threads
$(LINK) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_Threads
KokkosContainers_UnitTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_OpenMP
$(LINK) $(EXTRA_PATH) $(OBJ_OPENMP) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_OpenMP
KokkosContainers_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_UnitTest_Serial
$(LINK) $(EXTRA_PATH) $(OBJ_SERIAL) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_Serial
test-cuda: KokkosContainers_UnitTest_Cuda
./KokkosContainers_UnitTest_Cuda
@ -76,7 +76,7 @@ build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
@ -84,6 +84,5 @@ clean: kokkos-clean
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc

View File

@ -1,263 +0,0 @@
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
#ifndef KOKKOS_TEST_COMPLEX_HPP
#define KOKKOS_TEST_COMPLEX_HPP
#include <Kokkos_Complex.hpp>
#include <gtest/gtest.h>
#include <iostream>
namespace Test {
namespace Impl {
template <typename RealType>
void testComplexConstructors () {
typedef Kokkos::complex<RealType> complex_type;
complex_type z1;
complex_type z2 (0.0, 0.0);
complex_type z3 (1.0, 0.0);
complex_type z4 (0.0, 1.0);
complex_type z5 (-1.0, -2.0);
ASSERT_TRUE( z1 == z2 );
ASSERT_TRUE( z1 != z3 );
ASSERT_TRUE( z1 != z4 );
ASSERT_TRUE( z1 != z5 );
ASSERT_TRUE( z2 != z3 );
ASSERT_TRUE( z2 != z4 );
ASSERT_TRUE( z2 != z5 );
ASSERT_TRUE( z3 != z4 );
ASSERT_TRUE( z3 != z5 );
complex_type z6 (-1.0, -2.0);
ASSERT_TRUE( z5 == z6 );
// Make sure that complex has value semantics, in particular, that
// equality tests use values and not pointers, so that
// reassignment actually changes the value.
z1 = complex_type (-3.0, -4.0);
ASSERT_TRUE( z1.real () == -3.0 );
ASSERT_TRUE( z1.imag () == -4.0 );
ASSERT_TRUE( z1 != z2 );
complex_type z7 (1.0);
ASSERT_TRUE( z3 == z7 );
ASSERT_TRUE( z7 == 1.0 );
ASSERT_TRUE( z7 != -1.0 );
z7 = complex_type (5.0);
ASSERT_TRUE( z7.real () == 5.0 );
ASSERT_TRUE( z7.imag () == 0.0 );
}
template <typename RealType>
void testPlus () {
typedef Kokkos::complex<RealType> complex_type;
complex_type z1 (1.0, -1.0);
complex_type z2 (-1.0, 1.0);
complex_type z3 = z1 + z2;
ASSERT_TRUE( z3 == complex_type (0.0, 0.0) );
}
template <typename RealType>
void testMinus () {
typedef Kokkos::complex<RealType> complex_type;
// Test binary minus.
complex_type z1 (1.0, -1.0);
complex_type z2 (-1.0, 1.0);
complex_type z3 = z1 - z2;
ASSERT_TRUE( z3 == complex_type (2.0, -2.0) );
// Test unary minus.
complex_type z4 (3.0, -4.0);
ASSERT_TRUE( -z4 == complex_type (-3.0, 4.0) );
}
template <typename RealType>
void testTimes () {
typedef Kokkos::complex<RealType> complex_type;
complex_type z1 (1.0, -1.0);
complex_type z2 (-1.0, 1.0);
complex_type z3 = z1 * z2;
ASSERT_TRUE( z3 == complex_type (0.0, 2.0) );
// Make sure that std::complex * Kokkos::complex works too.
std::complex<RealType> z4 (-1.0, 1.0);
complex_type z5 = z4 * z1;
ASSERT_TRUE( z5 == complex_type (0.0, 2.0) );
}
template <typename RealType>
void testDivide () {
typedef Kokkos::complex<RealType> complex_type;
// Test division of a complex number by a real number.
complex_type z1 (1.0, -1.0);
complex_type z2 (1.0 / 2.0, -1.0 / 2.0);
ASSERT_TRUE( z1 / 2.0 == z2 );
// (-1+2i)/(1-i) == ((-1+2i)(1+i)) / ((1-i)(1+i))
// (-1+2i)(1+i) == -3 + i
complex_type z3 (-1.0, 2.0);
complex_type z4 (1.0, -1.0);
complex_type z5 (-3.0, 1.0);
ASSERT_TRUE(z3 * Kokkos::conj (z4) == z5 );
// Test division of a complex number by a complex number.
// This assumes that RealType is a floating-point type.
complex_type z6 (Kokkos::real (z5) / 2.0,
Kokkos::imag (z5) / 2.0);
complex_type z7 = z3 / z4;
ASSERT_TRUE( z7 == z6 );
}
template <typename RealType>
void testOutsideKernel () {
testComplexConstructors<RealType> ();
testPlus<RealType> ();
testTimes<RealType> ();
testDivide<RealType> ();
}
template<typename RealType, typename Device>
void testCreateView () {
typedef Kokkos::complex<RealType> complex_type;
Kokkos::View<complex_type*, Device> x ("x", 10);
ASSERT_TRUE( x.dimension_0 () == 10 );
// Test that View assignment works.
Kokkos::View<complex_type*, Device> x_nonconst = x;
Kokkos::View<const complex_type*, Device> x_const = x;
}
template<typename RealType, typename Device>
class Fill {
public:
typedef typename Device::execution_space execution_space;
typedef Kokkos::View<Kokkos::complex<RealType>*, Device> view_type;
typedef typename view_type::size_type size_type;
KOKKOS_INLINE_FUNCTION
void operator () (const size_type i) const {
x_(i) = val_;
}
Fill (const view_type& x, const Kokkos::complex<RealType>& val) :
x_ (x), val_ (val)
{}
private:
view_type x_;
const Kokkos::complex<RealType> val_;
};
template<typename RealType, typename Device>
class Sum {
public:
typedef typename Device::execution_space execution_space;
typedef Kokkos::View<const Kokkos::complex<RealType>*, Device> view_type;
typedef typename view_type::size_type size_type;
typedef Kokkos::complex<RealType> value_type;
KOKKOS_INLINE_FUNCTION
void operator () (const size_type i, Kokkos::complex<RealType>& sum) const {
sum += x_(i);
}
Sum (const view_type& x) : x_ (x) {}
private:
view_type x_;
};
template<typename RealType, typename Device>
void testInsideKernel () {
typedef Kokkos::complex<RealType> complex_type;
typedef Kokkos::View<complex_type*, Device> view_type;
typedef typename view_type::size_type size_type;
const size_type N = 1000;
view_type x ("x", N);
ASSERT_TRUE( x.dimension_0 () == N );
// Kokkos::parallel_reduce (N, [=] (const size_type i, complex_type& result) {
// result += x[i];
// });
Kokkos::parallel_for (N, Fill<RealType, Device> (x, complex_type (1.0, -1.0)));
complex_type sum;
Kokkos::parallel_reduce (N, Sum<RealType, Device> (x), sum);
ASSERT_TRUE( sum.real () == 1000.0 && sum.imag () == -1000.0 );
}
} // namespace Impl
template <typename Device>
void testComplex ()
{
Impl::testOutsideKernel<float> ();
Impl::testOutsideKernel<double> ();
Impl::testCreateView<float, Device> ();
Impl::testCreateView<double, Device> ();
Impl::testInsideKernel<float, Device> ();
Impl::testInsideKernel<double, Device> ();
}
} // namespace Test
#endif // KOKKOS_TEST_COMPLEX_HPP
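
The arithmetic in the comments of `testDivide` above (multiply numerator and denominator by the conjugate of the denominator) can be checked independently of Kokkos. The sketch below uses `std::complex` from the standard library rather than `Kokkos::complex`; `divide_via_conjugate` is a hypothetical helper name introduced only for this illustration.

```cpp
#include <cassert>
#include <complex>

// Hypothetical helper: computes num / den as (num * conj(den)) / |den|^2.
inline std::complex<double> divide_via_conjugate(std::complex<double> num,
                                                 std::complex<double> den) {
  const std::complex<double> scaled = num * std::conj(den);  // (-1+2i)(1+i) = -3+i in the test's case
  const double norm2 = std::norm(den);                       // |1-i|^2 = 2 in the test's case
  return std::complex<double>(scaled.real() / norm2, scaled.imag() / norm2);
}
```

With the values from the test, `(-1+2i)/(1-i)` evaluates to `(-3+i)/2`, matching the expected `z6`. Likewise, negating `(3, -4)` gives `(-3, 4)`.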

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,14 +36,17 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_CUDA
#include <iostream>
#include <iomanip>
#include <stdint.h>
#include <cstdint>
#include <gtest/gtest.h>
@ -69,7 +72,6 @@
//----------------------------------------------------------------------------
#ifdef KOKKOS_ENABLE_CUDA
namespace Test {
@ -237,5 +239,7 @@ TEST_F(cuda, ErrorReporter)
}
#else
void KOKKOS_CONTAINERS_UNIT_TESTS_TESTCUDA_PREVENT_EMPTY_LINK_ERROR() {}
#endif /* #ifdef KOKKOS_ENABLE_CUDA */

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -119,3 +119,4 @@ void test_dualview_combinations(unsigned int size)
} // namespace Test
#endif //KOKKOS_TEST_UNORDERED_MAP_HPP

View File

@ -61,7 +61,7 @@ struct TestDynamicView
typedef typename Space::execution_space execution_space ;
typedef typename Space::memory_space memory_space ;
typedef Kokkos::Experimental::MemoryPool<typename Space::device_type> memory_pool_type;
typedef Kokkos::MemoryPool<typename Space::device_type> memory_pool_type;
typedef Kokkos::Experimental::DynamicView<Scalar*,Space> view_type;
typedef typename view_type::const_type const_view_type ;
@ -131,7 +131,12 @@ struct TestDynamicView
// printf("TestDynamicView::run(%d) construct memory pool\n",arg_total_size);
memory_pool_type pool( memory_space() , arg_total_size * sizeof(Scalar) * 1.2 );
memory_pool_type pool( memory_space()
, arg_total_size * sizeof(Scalar) * 1.2
, 500 /* min block size in bytes */
, 30000 /* max block size in bytes */
, 1000000 /* min superblock size in bytes */
);
// printf("TestDynamicView::run(%d) construct dynamic view\n",arg_total_size);

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -225,3 +225,4 @@ struct ErrorReporterDriverNativeOpenMP : public ErrorReporterDriverBase<Kokkos::
} // namespace Test
#endif // #ifndef KOKKOS_TEST_ERROR_REPORTING_HPP

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,11 +36,14 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_OPENMP
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
@ -56,7 +59,6 @@
#include <TestVector.hpp>
#include <TestDualView.hpp>
#include <TestDynamicView.hpp>
#include <TestComplex.hpp>
#include <Kokkos_DynRankView.hpp>
#include <TestDynViewAPI.hpp>
@ -68,7 +70,6 @@
namespace Test {
#ifdef KOKKOS_ENABLE_OPENMP
class openmp : public ::testing::Test {
protected:
static void SetUpTestCase()
@ -91,11 +92,6 @@ protected:
}
};
TEST_F( openmp, complex )
{
testComplex<Kokkos::OpenMP> ();
}
TEST_F( openmp, dyn_view_api) {
TestDynViewAPI< double , Kokkos::OpenMP >();
}
@ -172,7 +168,6 @@ OPENMP_DUALVIEW_COMBINE_TEST( 10 )
#undef OPENMP_DEEP_COPY
#undef OPENMP_VECTOR_COMBINE_TEST
#undef OPENMP_DUALVIEW_COMBINE_TEST
#endif
TEST_F( openmp , dynamic_view )
@ -204,3 +199,7 @@ TEST_F(openmp, ErrorReporterNativeOpenMP)
} // namespace Test
#else
void KOKKOS_CONTAINERS_UNIT_TESTS_TESTOPENMP_PREVENT_EMPTY_LINK_ERROR() {}
#endif

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,19 +36,18 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_SERIAL
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if ! defined(KOKKOS_ENABLE_SERIAL)
# error "It doesn't make sense to build this file unless the Kokkos::Serial device is enabled. If you see this message, it probably means that there is an error in Kokkos' CMake build infrastructure."
#else
#include <Kokkos_Bitset.hpp>
#include <Kokkos_UnorderedMap.hpp>
#include <Kokkos_Vector.hpp>
@ -59,7 +58,6 @@
#include <TestVector.hpp>
#include <TestDualView.hpp>
#include <TestDynamicView.hpp>
#include <TestComplex.hpp>
#include <iomanip>
@ -105,11 +103,6 @@ TEST_F( serial , staticcrsgraph )
TestStaticCrsGraph::run_test_graph3< Kokkos::Serial >(75, 100000);
}
TEST_F( serial, complex )
{
testComplex<Kokkos::Serial> ();
}
TEST_F( serial, bitset )
{
test_bitset<Kokkos::Serial> ();
@ -190,6 +183,7 @@ TEST_F(serial, ErrorReporter)
} // namespace Test
#else
void KOKKOS_CONTAINERS_UNIT_TESTS_TESTSERIAL_PREVENT_EMPTY_LINK_ERROR() {}
#endif // KOKKOS_ENABLE_SERIAL

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -184,4 +184,3 @@ void run_test_graph3(size_t B, size_t N)
}
} /* namespace TestStaticCrsGraph */

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,17 +36,18 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_THREADS
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_PTHREAD )
#include <Kokkos_Bitset.hpp>
#include <Kokkos_UnorderedMap.hpp>
@ -201,6 +202,7 @@ TEST_F(threads, ErrorReporter)
} // namespace Test
#endif /* #if defined( KOKKOS_ENABLE_PTHREAD ) */
#else
void KOKKOS_CONTAINERS_UNIT_TESTS_TESTTHREADS_PREVENT_EMPTY_LINK_ERROR() {}
#endif /* #if defined( KOKKOS_ENABLE_THREADS ) */

View File

@ -1,12 +1,12 @@
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -35,7 +35,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
@ -311,3 +311,4 @@ void test_deep_copy( uint32_t num_nodes )
} // namespace Test
#endif //KOKKOS_TEST_UNORDERED_MAP_HPP

View File

@ -1,12 +1,12 @@
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -35,7 +35,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
@ -129,3 +129,4 @@ void test_vector_combinations(unsigned int size)
} // namespace Test
#endif //KOKKOS_TEST_UNORDERED_MAP_HPP

View File

@ -1,15 +1,38 @@
#ifndef KOKKOS_CORE_CONFIG_H
#if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)
#error "Don't include KokkosCore_config.h directly; include Kokkos_Macros.hpp instead."
#else
#define KOKKOS_CORE_CONFIG_H
#endif
/* The trivial 'src/build_common.sh' creates a config
* that must stay in sync with this file.
*/
#cmakedefine KOKKOS_FOR_SIERRA
#if !defined( KOKKOS_FOR_SIERRA )
#ifndef KOKKOS_FOR_SIERRA
#cmakedefine KOKKOS_HAVE_MPI
#cmakedefine KOKKOS_HAVE_CUDA
#cmakedefine KOKKOS_HAVE_OPENMP
#cmakedefine KOKKOS_HAVE_PTHREAD
#cmakedefine KOKKOS_HAVE_QTHREADS
#cmakedefine KOKKOS_HAVE_SERIAL
#cmakedefine KOKKOS_HAVE_Winthread
#cmakedefine KOKKOS_HAVE_HWLOC
#cmakedefine KOKKOS_ENABLE_HBWSPACE
#cmakedefine KOKKOS_ENABLE_LIBRT
#cmakedefine KOKKOS_HAVE_DEBUG
#cmakedefine KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
#cmakedefine KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
#cmakedefine KOKKOS_ENABLE_PROFILING
#cmakedefine KOKKOS_ENABLE_PROFILING_LOAD_PRINT
#cmakedefine KOKKOS_ENABLE_AGGRESSIVE_VECTORIZATION
#ifdef KOKKOS_HAVE_CUDA
#cmakedefine KOKKOS_ENABLE_CUDA_LDG_INTRINSIC
// mfh 16 Sep 2014: If passed in on the command line, that overrides
// any value of KOKKOS_USE_CUDA_UVM here. Doing this should prevent build
@ -23,26 +46,8 @@
// hesitate to do that now, because I'm not sure if all the files are
// including KokkosCore_config.h (or a header file that includes it) like
// they should.
#if ! defined(KOKKOS_USE_CUDA_UVM)
#ifndef KOKKOS_USE_CUDA_UVM
#cmakedefine KOKKOS_USE_CUDA_UVM
#endif // ! defined(KOKKOS_USE_CUDA_UVM)
#cmakedefine KOKKOS_HAVE_PTHREAD
#cmakedefine KOKKOS_HAVE_SERIAL
#cmakedefine KOKKOS_HAVE_QTHREADS
#cmakedefine KOKKOS_HAVE_Winthread
#cmakedefine KOKKOS_HAVE_OPENMP
#cmakedefine KOKKOS_HAVE_HWLOC
#cmakedefine KOKKOS_HAVE_DEBUG
#cmakedefine KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
#cmakedefine KOKKOS_HAVE_CXX11
#cmakedefine KOKKOS_HAVE_CUSPARSE
#cmakedefine KOKKOS_ENABLE_PROFILING_INTERNAL
#ifdef KOKKOS_ENABLE_PROFILING_INTERNAL
#define KOKKOS_ENABLE_PROFILING 1
#else
#define KOKKOS_ENABLE_PROFILING 0
#endif
#cmakedefine KOKKOS_HAVE_CUDA_RDC
@ -55,13 +60,51 @@
#define KOKKOS_CUDA_USE_LAMBDA 1
#endif
#endif
#cmakedefine KOKKOS_CUDA_CLANG_WORKAROUND
#ifndef __CUDA_ARCH__
#cmakedefine KOKKOS_ENABLE_ISA_X86_64
#cmakedefine KOKKOS_ENABLE_ISA_KNC
#cmakedefine KOKKOS_ENABLE_ISA_POWERPCLE
#endif
#cmakedefine KOKKOS_ARCH_ARMV80 1
#cmakedefine KOKKOS_ARCH_ARMV81 1
#cmakedefine KOKKOS_ARCH_ARMV8_THUNDERX 1
#cmakedefine KOKKOS_ARCH_AVX 1
#cmakedefine KOKKOS_ARCH_AVX2 1
#cmakedefine KOKKOS_ARCH_AVX512MIC 1
#cmakedefine KOKKOS_ARCH_AVX512XEON 1
#cmakedefine KOKKOS_ARCH_KNC 1
#cmakedefine KOKKOS_ARCH_POWER8 1
#cmakedefine KOKKOS_ARCH_POWER9 1
#cmakedefine KOKKOS_ARCH_KEPLER 1
#cmakedefine KOKKOS_ARCH_KEPLER30 1
#cmakedefine KOKKOS_ARCH_KEPLER32 1
#cmakedefine KOKKOS_ARCH_KEPLER35 1
#cmakedefine KOKKOS_ARCH_KEPLER37 1
#cmakedefine KOKKOS_ARCH_MAXWELL 1
#cmakedefine KOKKOS_ARCH_MAXWELL50 1
#cmakedefine KOKKOS_ARCH_MAXWELL52 1
#cmakedefine KOKKOS_ARCH_MAXWELL53 1
#cmakedefine KOKKOS_ARCH_PASCAL 1
#cmakedefine KOKKOS_ARCH_PASCAL60 1
#cmakedefine KOKKOS_ARCH_PASCAL61 1
// Don't forbid users from defining this macro on the command line,
// but still make sure that CMake logic can control its definition.
#if ! defined(KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA)
#ifndef KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#cmakedefine KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA 1
#endif // KOKKOS_HAVE_CXX11_DISPATCH_LAMBDA
#endif
// TODO: These are currently not used in Kokkos. Should they be removed?
#cmakedefine KOKKOS_HAVE_MPI
#cmakedefine KOKKOS_HAVE_CUSPARSE
// TODO: No longer options in Kokkos. Need to be removed.
#cmakedefine KOKKOS_USING_DEPRECATED_VIEW
#cmakedefine KOKKOS_HAVE_CXX11
#endif // KOKKOS_FOR_SIERRA
#endif // KOKKOS_CORE_CONFIG_H
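
The new guard at the top of this header rejects direct inclusion: it compiles only if the umbrella header's macro is already defined and the config has not been seen before. A single-file sketch of the same idea follows, with hypothetical `SKETCH_*` macros standing in for `KOKKOS_MACROS_HPP` and `KOKKOS_CORE_CONFIG_H`. The two "headers" are inlined here only so the example is self-contained.

```cpp
// --- stand-in for the umbrella header (hypothetical name) ---------------
// It defines its own guard macro before pulling in the generated config.
#define SKETCH_MACROS_HPP

// --- stand-in for the generated config header ---------------------------
// Fails if reached without the umbrella header, or if reached twice.
#if !defined(SKETCH_MACROS_HPP) || defined(SKETCH_CORE_CONFIG_H)
#error "Don't include the config header directly; include the umbrella header instead."
#else
#define SKETCH_CORE_CONFIG_H
#endif

// Example of an option the build system might emit (assumption).
#define SKETCH_ENABLE_SERIAL

namespace sketch {
// Reports whether the config section above was accepted by the guard.
constexpr bool config_was_included() {
#ifdef SKETCH_CORE_CONFIG_H
  return true;
#else
  return false;
#endif
}
} // namespace sketch
```

Because a second inclusion also triggers the `#error`, this style relies on the umbrella header's own include guard to ensure the config is included exactly once.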

View File

@ -1,11 +1,18 @@
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINRARY_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
# warning: PerfTest_CustomReduction.cpp uses
# ../../algorithms/src/Kokkos_Random.hpp
# we'll just allow it to be included, but note
# that in TriBITS KokkosAlgorithms can be disabled...
INCLUDE_DIRECTORIES("${CMAKE_CURRENT_SOURCE_DIR}/../../algorithms/src")
SET(SOURCES
PerfTestMain.cpp
PerfTestHost.cpp
PerfTestCuda.cpp
PerfTestGramSchmidt.cpp
PerfTestHexGrad.cpp
PerfTest_CustomReduction.cpp
)
# Per #374, we always want to build this test, but we only want to run

View File

@ -8,12 +8,14 @@ default: build_all
echo "End Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/config/nvcc_wrapper
CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
KOKKOS_CUDA_OPTIONS=enable_lambda
else
CXX = g++
endif
CXXFLAGS = -O3
CXXFLAGS = -O3
#CXXFLAGS += -DGENERIC_REDUCER
LINK ?= $(CXX)
LDFLAGS ?= -lpthread
@ -21,23 +23,49 @@ include $(KOKKOS_PATH)/Makefile.kokkos
KOKKOS_CXXFLAGS += -I$(GTEST_PATH) -I${KOKKOS_PATH}/core/perf_test
TEST_TARGETS =
TARGETS =
TEST_TARGETS =
TARGETS =
OBJ_PERF = PerfTestHost.o PerfTestCuda.o PerfTestMain.o gtest-all.o
#
OBJ_PERF = PerfTestMain.o gtest-all.o
OBJ_PERF += PerfTestGramSchmidt.o
OBJ_PERF += PerfTestHexGrad.o
OBJ_PERF += PerfTest_CustomReduction.o
TARGETS += KokkosCore_PerformanceTest
TEST_TARGETS += test-performance
#
OBJ_ATOMICS = test_atomic.o
TARGETS += KokkosCore_PerformanceTest_Atomics
TEST_TARGETS += test-atomic
#
OBJ_MEMPOOL = test_mempool.o
TARGETS += KokkosCore_PerformanceTest_Mempool
TEST_TARGETS += test-mempool
#
OBJ_TASKDAG = test_taskdag.o
TARGETS += KokkosCore_PerformanceTest_TaskDAG
TEST_TARGETS += test-taskdag
#
KokkosCore_PerformanceTest: $(OBJ_PERF) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_PERF) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest
$(LINK) $(EXTRA_PATH) $(OBJ_PERF) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosCore_PerformanceTest
KokkosCore_PerformanceTest_Atomics: $(OBJ_ATOMICS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ATOMICS) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_Atomics
$(LINK) $(EXTRA_PATH) $(OBJ_ATOMICS) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosCore_PerformanceTest_Atomics
KokkosCore_PerformanceTest_Mempool: $(OBJ_MEMPOOL) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_MEMPOOL) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_Mempool
KokkosCore_PerformanceTest_TaskDAG: $(OBJ_TASKDAG) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_TASKDAG) $(KOKKOS_LIBS) $(LIB) -o KokkosCore_PerformanceTest_TaskDAG
test-performance: KokkosCore_PerformanceTest
./KokkosCore_PerformanceTest
@ -45,12 +73,18 @@ test-performance: KokkosCore_PerformanceTest
test-atomic: KokkosCore_PerformanceTest_Atomics
./KokkosCore_PerformanceTest_Atomics
test-mempool: KokkosCore_PerformanceTest_Mempool
./KokkosCore_PerformanceTest_Mempool
test-taskdag: KokkosCore_PerformanceTest_TaskDAG
./KokkosCore_PerformanceTest_TaskDAG
build_all: $(TARGETS)
test: $(TEST_TARGETS)
clean: kokkos-clean
clean: kokkos-clean
rm -f *.o $(TARGETS)
# Compilation rules
@ -58,5 +92,6 @@ clean: kokkos-clean
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
gtest-all.o:$(GTEST_PATH)/gtest/gtest-all.cc
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $(GTEST_PATH)/gtest/gtest-all.cc

View File

@ -1,199 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <iostream>
#include <iomanip>
#include <algorithm>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_CUDA )
#include <impl/Kokkos_Timer.hpp>
#include <PerfTestMDRange.hpp>
#include <PerfTestHexGrad.hpp>
#include <PerfTestBlasKernels.hpp>
#include <PerfTestGramSchmidt.hpp>
#include <PerfTestDriver.hpp>
namespace Test {
class cuda : public ::testing::Test {
protected:
static void SetUpTestCase() {
Kokkos::HostSpace::execution_space::initialize();
Kokkos::Cuda::initialize( Kokkos::Cuda::SelectDevice(0) );
}
static void TearDownTestCase() {
Kokkos::Cuda::finalize();
Kokkos::HostSpace::execution_space::finalize();
}
};
//TEST_F( cuda, mdrange_lr ) {
// EXPECT_NO_THROW( (run_test_mdrange<Kokkos::Cuda , Kokkos::LayoutRight>( 5, 8, "Kokkos::Cuda" )) );
//}
//TEST_F( cuda, mdrange_ll ) {
// EXPECT_NO_THROW( (run_test_mdrange<Kokkos::Cuda , Kokkos::LayoutLeft>( 5, 8, "Kokkos::Cuda" )) );
//}
TEST_F( cuda, hexgrad )
{
EXPECT_NO_THROW( run_test_hexgrad< Kokkos::Cuda >( 10 , 20, "Kokkos::Cuda" ) );
}
TEST_F( cuda, gramschmidt )
{
EXPECT_NO_THROW( run_test_gramschmidt< Kokkos::Cuda >( 10 , 20, "Kokkos::Cuda" ) );
}
namespace {
template <typename T>
struct TextureFetch
{
typedef Kokkos::View< T *, Kokkos::CudaSpace> array_type;
typedef Kokkos::View< const T *, Kokkos::CudaSpace, Kokkos::MemoryRandomAccess> const_array_type;
typedef Kokkos::View< int *, Kokkos::CudaSpace> index_array_type;
typedef Kokkos::View< const int *, Kokkos::CudaSpace> const_index_array_type;
struct FillArray
{
array_type m_array;
FillArray( const array_type & array )
: m_array(array)
{}
void apply() const
{
Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::Cuda,int>(0,m_array.dimension_0()), *this);
}
KOKKOS_INLINE_FUNCTION
void operator()(int i) const { m_array(i) = i; }
};
struct RandomIndexes
{
index_array_type m_indexes;
typename index_array_type::HostMirror m_host_indexes;
RandomIndexes( const index_array_type & indexes)
: m_indexes(indexes)
, m_host_indexes(Kokkos::create_mirror(m_indexes))
{}
void apply() const
{
Kokkos::parallel_for( Kokkos::RangePolicy<Kokkos::HostSpace::execution_space,int>(0,m_host_indexes.dimension_0()), *this);
//random shuffle
Kokkos::HostSpace::execution_space::fence();
std::random_shuffle(m_host_indexes.ptr_on_device(), m_host_indexes.ptr_on_device() + m_host_indexes.dimension_0());
Kokkos::deep_copy(m_indexes,m_host_indexes);
}
KOKKOS_INLINE_FUNCTION
void operator()(int i) const { m_host_indexes(i) = i; }
};
struct RandomReduce
{
const_array_type m_array;
const_index_array_type m_indexes;
RandomReduce( const const_array_type & array, const const_index_array_type & indexes)
: m_array(array)
, m_indexes(indexes)
{}
void apply(T & reduce) const
{
Kokkos::parallel_reduce( Kokkos::RangePolicy<Kokkos::Cuda,int>(0,m_array.dimension_0()), *this, reduce);
}
KOKKOS_INLINE_FUNCTION
void operator()(int i, T & reduce) const
{ reduce += m_array(m_indexes(i)); }
};
static void run(int size, double & reduce_time, T &reduce)
{
array_type array("array",size);
index_array_type indexes("indexes",size);
{ FillArray f(array); f.apply(); }
{ RandomIndexes f(indexes); f.apply(); }
Kokkos::Cuda::fence();
Kokkos::Timer timer;
for (int j=0; j<10; ++j) {
RandomReduce f(array,indexes);
f.apply(reduce);
}
Kokkos::Cuda::fence();
reduce_time = timer.seconds();
}
};
} // unnamed namespace
TEST_F( cuda, texture_double )
{
printf("Random reduce of double through texture fetch\n");
for (int i=1; i<=26; ++i) {
int size = 1<<i;
double time = 0;
double reduce = 0;
TextureFetch<double>::run(size,time,reduce);
printf(" time = %1.3e size = 2^%d\n", time, i);
}
}
} // namespace Test
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */

View File

@ -398,91 +398,5 @@ void run_test_mdrange( int exp_beg , int exp_end, const char deviceTypeName[], i
}
template< class DeviceType >
void run_test_hexgrad( int exp_beg , int exp_end, const char deviceTypeName[] )
{
std::string label_hexgrad ;
label_hexgrad.append( "\"HexGrad< double , " );
// mfh 06 Jun 2013: This only appends "DeviceType" (literally) to
// the string, not the actual name of the device type. Thus, I've
// modified the function to take the name of the device type.
//
//label_hexgrad.append( KOKKOS_MACRO_TO_STRING( DeviceType ) );
label_hexgrad.append( deviceTypeName );
label_hexgrad.append( " >\"" );
for (int i = exp_beg ; i < exp_end ; ++i) {
double min_seconds = 0.0 ;
double max_seconds = 0.0 ;
double avg_seconds = 0.0 ;
const int parallel_work_length = 1<<i;
for ( int j = 0 ; j < NUMBER_OF_TRIALS ; ++j ) {
const double seconds = HexGrad< DeviceType >::test(parallel_work_length) ;
if ( 0 == j ) {
min_seconds = seconds ;
max_seconds = seconds ;
}
else {
if ( seconds < min_seconds ) min_seconds = seconds ;
if ( seconds > max_seconds ) max_seconds = seconds ;
}
avg_seconds += seconds ;
}
avg_seconds /= NUMBER_OF_TRIALS ;
std::cout << label_hexgrad
<< " , " << parallel_work_length
<< " , " << min_seconds
<< " , " << ( min_seconds / parallel_work_length )
<< std::endl ;
}
}
template< class DeviceType >
void run_test_gramschmidt( int exp_beg , int exp_end, const char deviceTypeName[] )
{
std::string label_gramschmidt ;
label_gramschmidt.append( "\"GramSchmidt< double , " );
// mfh 06 Jun 2013: This only appends "DeviceType" (literally) to
// the string, not the actual name of the device type. Thus, I've
// modified the function to take the name of the device type.
//
//label_gramschmidt.append( KOKKOS_MACRO_TO_STRING( DeviceType ) );
label_gramschmidt.append( deviceTypeName );
label_gramschmidt.append( " >\"" );
for (int i = exp_beg ; i < exp_end ; ++i) {
double min_seconds = 0.0 ;
double max_seconds = 0.0 ;
double avg_seconds = 0.0 ;
const int parallel_work_length = 1<<i;
for ( int j = 0 ; j < NUMBER_OF_TRIALS ; ++j ) {
const double seconds = ModifiedGramSchmidt< double , DeviceType >::test(parallel_work_length, 32 ) ;
if ( 0 == j ) {
min_seconds = seconds ;
max_seconds = seconds ;
}
else {
if ( seconds < min_seconds ) min_seconds = seconds ;
if ( seconds > max_seconds ) max_seconds = seconds ;
}
avg_seconds += seconds ;
}
avg_seconds /= NUMBER_OF_TRIALS ;
std::cout << label_gramschmidt
<< " , " << parallel_work_length
<< " , " << min_seconds
<< " , " << ( min_seconds / parallel_work_length )
<< std::endl ;
}
}
}

View File

@ -41,6 +41,10 @@
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <gtest/gtest.h>
#include <PerfTest_Category.hpp>
#include <cmath>
#include <PerfTestBlasKernels.hpp>
@ -70,7 +74,7 @@ struct InvNorm2 : public Kokkos::DotSingle< VectorView > {
KOKKOS_INLINE_FUNCTION
void final( value_type & result ) const
{
result = sqrt( result );
result = std::sqrt( result );
Rjj() = result ;
inv() = ( 0 < result ) ? 1.0 / result : 0 ;
}
@ -157,7 +161,7 @@ struct ModifiedGramSchmidt
for ( size_type j = 0 ; j < count ; ++j ) {
// Reduction : tmp = dot( Q(:,j) , Q(:,j) );
// PostProcess : tmp = sqrt( tmp ); R(j,j) = tmp ; tmp = 1 / tmp ;
// PostProcess : tmp = std::sqrt( tmp ); R(j,j) = tmp ; tmp = 1 / tmp ;
const vector_type Qj = Kokkos::subview( Q_ , Kokkos::ALL() , j );
const value_view Rjj = Kokkos::subview( R_ , j , j );
@ -222,5 +226,58 @@ struct ModifiedGramSchmidt
}
};
template< class DeviceType >
void run_test_gramschmidt( int exp_beg , int exp_end, int num_trials, const char deviceTypeName[] )
{
std::string label_gramschmidt ;
label_gramschmidt.append( "\"GramSchmidt< double , " );
label_gramschmidt.append( deviceTypeName );
label_gramschmidt.append( " >\"" );
for (int i = exp_beg ; i < exp_end ; ++i) {
double min_seconds = 0.0 ;
double max_seconds = 0.0 ;
double avg_seconds = 0.0 ;
const int parallel_work_length = 1<<i;
for ( int j = 0 ; j < num_trials ; ++j ) {
const double seconds = ModifiedGramSchmidt< double , DeviceType >::test(parallel_work_length, 32 ) ;
if ( 0 == j ) {
min_seconds = seconds ;
max_seconds = seconds ;
}
else {
if ( seconds < min_seconds ) min_seconds = seconds ;
if ( seconds > max_seconds ) max_seconds = seconds ;
}
avg_seconds += seconds ;
}
avg_seconds /= num_trials ;
std::cout << label_gramschmidt
<< " , " << parallel_work_length
<< " , " << min_seconds
<< " , " << ( min_seconds / parallel_work_length )
<< std::endl ;
}
}
TEST_F( default_exec, gramschmidt ) {
int exp_beg = 10;
int exp_end = 20;
int num_trials = 5;
if(command_line_num_args()>1)
exp_beg = atoi(command_line_arg(1));
if(command_line_num_args()>2)
exp_end = atoi(command_line_arg(2));
if(command_line_num_args()>3)
num_trials = atoi(command_line_arg(3));
EXPECT_NO_THROW(run_test_gramschmidt< Kokkos::DefaultExecutionSpace>( exp_beg, exp_end, num_trials, Kokkos::DefaultExecutionSpace::name() ));
}
}
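
The driver above repeats a kernel `num_trials` times and tracks the minimum, maximum, and mean wall time. A standalone sketch of that bookkeeping follows, using `std::chrono` rather than the Kokkos timer. `TrialStats` and `time_trials` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>

namespace sketch {

struct TrialStats {
  double min_seconds;
  double max_seconds;
  double avg_seconds;
};

// Runs `kernel` num_trials times and returns min / max / mean wall time.
// Returns all zeros when num_trials <= 0, which avoids a division by zero.
inline TrialStats time_trials(int num_trials,
                              const std::function<void()>& kernel) {
  TrialStats s{0.0, 0.0, 0.0};
  if (num_trials <= 0) return s;

  for (int j = 0; j < num_trials; ++j) {
    const auto t0 = std::chrono::steady_clock::now();
    kernel();
    const auto t1 = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(t1 - t0).count();

    if (j == 0) {
      // First trial seeds both extremes.
      s.min_seconds = seconds;
      s.max_seconds = seconds;
    } else {
      s.min_seconds = std::min(s.min_seconds, seconds);
      s.max_seconds = std::max(s.max_seconds, seconds);
    }
    s.avg_seconds += seconds;
  }
  s.avg_seconds /= num_trials;
  return s;
}

} // namespace sketch
```

Unlike the driver in the diff, this sketch guards against `num_trials <= 0`, which matters once the trial count comes from the command line.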

View File

@ -41,6 +41,10 @@
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <gtest/gtest.h>
#include <PerfTest_Category.hpp>
namespace Test {
template< class DeviceType ,
@ -264,5 +268,58 @@ struct HexGrad
}
};
template< class DeviceType >
void run_test_hexgrad( int exp_beg , int exp_end, int num_trials, const char deviceTypeName[] )
{
std::string label_hexgrad ;
label_hexgrad.append( "\"HexGrad< double , " );
label_hexgrad.append( deviceTypeName );
label_hexgrad.append( " >\"" );
for (int i = exp_beg ; i < exp_end ; ++i) {
double min_seconds = 0.0 ;
double max_seconds = 0.0 ;
double avg_seconds = 0.0 ;
const int parallel_work_length = 1<<i;
for ( int j = 0 ; j < num_trials ; ++j ) {
const double seconds = HexGrad< DeviceType >::test(parallel_work_length) ;
if ( 0 == j ) {
min_seconds = seconds ;
max_seconds = seconds ;
}
else {
if ( seconds < min_seconds ) min_seconds = seconds ;
if ( seconds > max_seconds ) max_seconds = seconds ;
}
avg_seconds += seconds ;
}
avg_seconds /= num_trials ;
std::cout << label_hexgrad
<< " , " << parallel_work_length
<< " , " << min_seconds
<< " , " << ( min_seconds / parallel_work_length )
<< std::endl ;
}
}
TEST_F( default_exec, hexgrad ) {
int exp_beg = 10;
int exp_end = 20;
int num_trials = 5;
if(command_line_num_args()>1)
exp_beg = atoi(command_line_arg(1));
if(command_line_num_args()>2)
exp_end = atoi(command_line_arg(2));
if(command_line_num_args()>3)
num_trials = atoi(command_line_arg(3));
EXPECT_NO_THROW(run_test_hexgrad< Kokkos::DefaultExecutionSpace >( exp_beg, exp_end, num_trials, Kokkos::DefaultExecutionSpace::name() ));
}
}

View File

@ -1,125 +0,0 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#if defined( KOKKOS_ENABLE_OPENMP )
typedef Kokkos::OpenMP TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::OpenMP" ;
#elif defined( KOKKOS_ENABLE_PTHREAD )
typedef Kokkos::Threads TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Threads" ;
#elif defined( KOKKOS_ENABLE_SERIAL )
typedef Kokkos::Serial TestHostDevice ;
const char TestHostDeviceName[] = "Kokkos::Serial" ;
#else
# error "You must enable at least one of the following execution spaces in order to build this test: Kokkos::Threads, Kokkos::OpenMP, or Kokkos::Serial."
#endif
#include <impl/Kokkos_Timer.hpp>
#include <PerfTestMDRange.hpp>
#include <PerfTestHexGrad.hpp>
#include <PerfTestBlasKernels.hpp>
#include <PerfTestGramSchmidt.hpp>
#include <PerfTestDriver.hpp>
//------------------------------------------------------------------------
namespace Test {
class host : public ::testing::Test {
protected:
static void SetUpTestCase()
{
if(Kokkos::hwloc::available()) {
const unsigned numa_count = Kokkos::hwloc::get_available_numa_count();
const unsigned cores_per_numa = Kokkos::hwloc::get_available_cores_per_numa();
const unsigned threads_per_core = Kokkos::hwloc::get_available_threads_per_core();
unsigned threads_count = 0 ;
threads_count = std::max( 1u , numa_count )
* std::max( 2u , cores_per_numa * threads_per_core );
TestHostDevice::initialize( threads_count );
} else {
const unsigned thread_count = 4 ;
TestHostDevice::initialize( thread_count );
}
}
static void TearDownTestCase()
{
TestHostDevice::finalize();
}
};
//TEST_F( host, mdrange_lr ) {
// EXPECT_NO_THROW( (run_test_mdrange<TestHostDevice , Kokkos::LayoutRight> (5, 8, TestHostDeviceName) ) );
//}
//TEST_F( host, mdrange_ll ) {
// EXPECT_NO_THROW( (run_test_mdrange<TestHostDevice , Kokkos::LayoutLeft> (5, 8, TestHostDeviceName) ) );
//}
TEST_F( host, hexgrad ) {
EXPECT_NO_THROW(run_test_hexgrad< TestHostDevice>( 10, 20, TestHostDeviceName ));
}
TEST_F( host, gramschmidt ) {
EXPECT_NO_THROW(run_test_gramschmidt< TestHostDevice>( 10, 20, TestHostDeviceName ));
}
} // namespace Test

View File

@ -42,8 +42,37 @@
*/
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
namespace Test {
int command_line_num_args(int n = 0) {
static int n_args = 0;
if(n>0)
n_args = n;
return n_args;
}
const char* command_line_arg(int k, char** input_args = NULL) {
static char** args;
if(input_args != NULL)
args = input_args;
if(command_line_num_args() > k)
return args[k];
else
return NULL;
}
}
int main(int argc, char *argv[]) {
::testing::InitGoogleTest(&argc,argv);
return RUN_ALL_TESTS();
Kokkos::initialize(argc,argv);
(void) Test::command_line_num_args(argc);
(void) Test::command_line_arg(0,argv);
int result = RUN_ALL_TESTS();
Kokkos::finalize();
return result;
}
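
The two helpers added in this hunk stash `argc`/`argv` in function-local statics so that test bodies can read extra command line arguments later. A minimal standalone sketch of that pattern follows; the names `sketch::stored_argc` and `sketch::stored_arg` are hypothetical, and the extra null and range checks are additions for illustration, not part of the Kokkos test code.

```cpp
#include <cassert>

namespace sketch {

// Call with n > 0 to record the argument count; call with no argument
// to read back whatever was recorded last (0 if nothing was recorded).
inline int stored_argc(int n = 0) {
  assert(n >= 0 && "argument count cannot be negative");
  static int n_args = 0;  // initialized once, persists across calls
  if (n > 0) n_args = n;
  return n_args;
}

// Call with a non-null array to record it; every call returns argument k,
// or nullptr when nothing was recorded or k is out of range.
inline const char* stored_arg(int k, char** input_args = nullptr) {
  static char** args = nullptr;
  if (input_args != nullptr) args = input_args;
  if (args != nullptr && k >= 0 && k < stored_argc()) return args[k];
  return nullptr;
}

}  // namespace sketch
```

A driver would call `sketch::stored_argc(argc)` and `sketch::stored_arg(0, argv)` once before running the tests; the count has to be recorded first, because the lookup consults it.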

View File

@ -41,16 +41,28 @@
//@HEADER
*/
#include <cuda/TestCuda.hpp>
#ifndef KOKKOS_TEST_THREADS_HPP
#define KOKKOS_TEST_THREADS_HPP
#include <gtest/gtest.h>
namespace Test {
TEST_F( cuda, view_api_a )
{
typedef Kokkos::View< const int *, Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess> > view_texture_managed;
typedef Kokkos::View< const int *, Kokkos::Cuda, Kokkos::MemoryTraits<Kokkos::RandomAccess | Kokkos::Unmanaged> > view_texture_unmanaged;
extern int command_line_num_args(int n = 0);
extern const char* command_line_arg(int k, char** input_args = NULL);
TestViewAPI< double, Kokkos::Cuda >();
}
class default_exec : public ::testing::Test {
protected:
static void SetUpTestCase() {
}
static void TearDownTestCase() {
}
};
} // namespace Test
#define TEST_CATEGORY default_exec
#define TEST_EXECSPACE Kokkos::DefaultExecutionSpace
#endif

View File

@ -0,0 +1,115 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <gtest/gtest.h>
#include <PerfTest_Category.hpp>
#include <Kokkos_Random.hpp>
#include <cstdio>
#include <cstdlib>
#ifdef KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA
namespace Test {
template<class Scalar>
void custom_reduction_test(int N, int R, int num_trials) {
Kokkos::Random_XorShift64_Pool<> rand_pool(183291);
Kokkos::View<Scalar*> a("A",N);
Kokkos::fill_random(a,rand_pool,1.0);
Scalar max;
// Warm up
Kokkos::parallel_reduce(Kokkos::TeamPolicy<>(N/1024,32), KOKKOS_LAMBDA( const Kokkos::TeamPolicy<>::member_type& team, Scalar& lmax) {
Scalar team_max = Scalar(0);
for(int rr = 0; rr<R; rr++) {
int i = team.league_rank();
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,32), [&] (const int& j, Scalar& thread_max) {
Scalar t_max = Scalar(0);
Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,32), [&] (const int& k, Scalar& max_) {
const Scalar val = a((i*32 + j)*32 + k);
if(val>max_) max_ = val;
if((k == 11) && (j==17) && (i==2)) max_ = 11.5;
},Kokkos::Experimental::Max<Scalar>(t_max));
if(t_max>thread_max) thread_max = t_max;
},Kokkos::Experimental::Max<Scalar>(team_max));
}
if(team_max>lmax) lmax = team_max;
},Kokkos::Experimental::Max<Scalar>(max));
// Timing
Kokkos::Timer timer;
for(int r = 0; r<num_trials; r++) {
Kokkos::parallel_reduce(Kokkos::TeamPolicy<>(N/1024,32), KOKKOS_LAMBDA( const Kokkos::TeamPolicy<>::member_type& team, Scalar& lmax) {
Scalar team_max = Scalar(0);
for(int rr = 0; rr<R; rr++) {
int i = team.league_rank();
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,32), [&] (const int& j, Scalar& thread_max) {
Scalar t_max = Scalar(0);
Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,32), [&] (const int& k, Scalar& max_) {
const Scalar val = a((i*32 + j)*32 + k);
if(val>max_) max_ = val;
if((k == 11) && (j==17) && (i==2)) max_ = 11.5;
},Kokkos::Experimental::Max<Scalar>(t_max));
if(t_max>thread_max) thread_max = t_max;
},Kokkos::Experimental::Max<Scalar>(team_max));
}
if(team_max>lmax) lmax = team_max;
},Kokkos::Experimental::Max<Scalar>(max));
}
double time = timer.seconds();
printf("%e %e %e\n",time,1.0*N*R*num_trials*sizeof(Scalar)/time/1024/1024/1024,max);
}
TEST_F( default_exec, custom_reduction ) {
int N = 100000;
int R = 1000;
int num_trials = 1;
if(command_line_num_args()>1)
N = atoi(command_line_arg(1));
if(command_line_num_args()>2)
R = atoi(command_line_arg(2));
if(command_line_num_args()>3)
num_trials = atoi(command_line_arg(3));
custom_reduction_test<double>(N,R,num_trials);
}
}
#endif
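
The benchmark in this file nests three max reductions (league, team thread, vector lane) over the index `(i*32 + j)*32 + k`. A serial sketch of that shape is given here, with plain loops standing in for the Kokkos dispatch. It assumes non-negative data, so that 0 can serve as the identity, as the `Scalar(0)` initializers do; `nested_max` and `kInner` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kInner = 32;  // team size and vector length used by the benchmark

// Three-level max: each level folds only into its own accumulator and
// hands the result up to the enclosing level.
inline double nested_max(const std::vector<double>& a, int league_size) {
  assert(league_size >= 0);
  assert(a.size() >= static_cast<std::size_t>(league_size) * kInner * kInner);
  double league_max = 0.0;                  // result of the outermost reduction
  for (int i = 0; i < league_size; ++i) {
    double team_max = 0.0;                  // one per league member
    for (int j = 0; j < kInner; ++j) {
      double lane_max = 0.0;                // one per team thread
      for (int k = 0; k < kInner; ++k) {
        const std::size_t idx =
            (static_cast<std::size_t>(i) * kInner + j) * kInner + k;
        lane_max = std::max(lane_max, a[idx]);
      }
      team_max = std::max(team_max, lane_max);
    }
    league_max = std::max(league_max, team_max);
  }
  return league_max;
}
```

Planting a sentinel such as 11.5 at `(i, j, k) = (2, 17, 11)`, as the benchmark does, gives a quick check that all three levels propagate a value upward.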

View File

@ -0,0 +1,25 @@
#!/bin/bash -e
NT=$1
PROG="./KokkosCore_PerformanceTest_Mempool"
COMMON_ARGS="--kokkos-threads=$NT --fill_stride=1 --fill_level=70 --chunk_span=5 --repeat_inner=100"
postproc() {
cat log | head -n 1 | rev | cut -d ' ' -f 1 | rev >> xvals
cat log | tail -n 1 | rev | cut -d ' ' -f 1 | rev >> yvals
}
for yset in 1 2 3
do
rm -f xvals yvals
for x in 1 2 4 8 16 32
do
echo "yset $yset x factor $x"
$PROG $COMMON_ARGS --alloc_size=`expr $x \* 1000000` --super_size=`expr $x \* 100000` > log
postproc
done
rm -f yvals$yset
mv yvals yvals$yset
done
rm -f datapoints
paste -d',' xvals yvals1 yvals2 yvals3 > datapoints

View File

@ -0,0 +1,21 @@
#!/bin/bash -e
NT=$1
PROG="./KokkosCore_PerformanceTest_Mempool"
COMMON_ARGS="--kokkos-threads=$NT --fill_stride=1 --alloc_size=10027008 --super_size=65536 --repeat_inner=100 --chunk_span=4 --repeat_outer=10"
postproc() {
cat log | grep "fill ops per second" | rev | cut -d ' ' -f 2 | rev >> yvals_fill
cat log | grep "cycle ops per second" | rev | cut -d ' ' -f 2 | rev >> yvals_cycle
}
rm -f xvals yvals_fill yvals_cycle
for x in 75 95
do
echo "test fill level $x"
echo $x >> xvals
$PROG $COMMON_ARGS --fill_level=$x 2>&1 | tee log
postproc
done
rm -f datapoints.txt
paste xvals yvals_fill yvals_cycle > datapoints.txt

View File

@ -0,0 +1,21 @@
#!/bin/bash -e
NT=$1
PROG="./KokkosCore_PerformanceTest_TaskDAG"
COMMON_ARGS="--kokkos-threads=$NT --alloc_size=10027008 --super_size=65536 --repeat_outer=10"
postproc() {
cat log | grep "tasks per second" | rev | cut -d ' ' -f 2 | rev >> yvals
}
rm -f xvals yvals
for x in 21 23
do
echo "test input $x"
echo $x >> xvals
$PROG $COMMON_ARGS --input=$x 2>&1 | tee log
postproc
done
rm -f datapoints.txt
paste xvals yvals > datapoints.txt

View File

@ -0,0 +1,357 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <limits>
#include <iostream>
#include <algorithm>
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_Timer.hpp>
using ExecSpace = Kokkos::DefaultExecutionSpace ;
using MemorySpace = Kokkos::DefaultExecutionSpace::memory_space ;
using MemoryPool = Kokkos::MemoryPool< ExecSpace > ;
struct TestFunctor {
typedef Kokkos::View< uintptr_t * , ExecSpace > ptrs_type ;
enum : unsigned { chunk = 32 };
MemoryPool pool ;
ptrs_type ptrs ;
unsigned chunk_span ;
unsigned fill_stride ;
unsigned range_iter ;
unsigned repeat_inner ;
TestFunctor( size_t total_alloc_size
, unsigned min_superblock_size
, unsigned number_alloc
, unsigned arg_stride_alloc
, unsigned arg_chunk_span
, unsigned arg_repeat )
: pool()
, ptrs()
, chunk_span(0)
, fill_stride(0)
, repeat_inner(0)
{
MemorySpace m ;
const unsigned min_block_size = chunk ;
const unsigned max_block_size = chunk * arg_chunk_span ;
pool = MemoryPool( m , total_alloc_size
, min_block_size
, max_block_size
, min_superblock_size );
ptrs = ptrs_type( Kokkos::view_alloc( m , "ptrs") , number_alloc );
fill_stride = arg_stride_alloc ;
chunk_span = arg_chunk_span ;
range_iter = fill_stride * number_alloc ;
repeat_inner = arg_repeat ;
}
//----------------------------------------
typedef long value_type ;
//----------------------------------------
struct TagFill {};
KOKKOS_INLINE_FUNCTION
void operator()( TagFill , int i , value_type & update ) const noexcept
{
if ( 0 == i % fill_stride ) {
const int j = i / fill_stride ;
const unsigned size_alloc = chunk * ( 1 + ( j % chunk_span ) );
ptrs(j) = (uintptr_t) pool.allocate(size_alloc);
if ( ptrs(j) ) ++update ;
}
}
bool test_fill()
{
typedef Kokkos::RangePolicy< ExecSpace , TagFill > policy ;
long result = 0 ;
Kokkos::parallel_reduce( policy(0,range_iter), *this , result );
if ( result == long(ptrs.extent(0)) ) return true;
pool.print_state( std::cerr );
return false;
}
//----------------------------------------
struct TagDel {};
KOKKOS_INLINE_FUNCTION
void operator()( TagDel , int i ) const noexcept
{
if ( 0 == i % fill_stride ) {
const int j = i / fill_stride ;
const unsigned size_alloc = chunk * ( 1 + ( j % chunk_span ) );
pool.deallocate( (void*) ptrs(j) , size_alloc );
}
}
void test_del()
{
typedef Kokkos::RangePolicy< ExecSpace , TagDel > policy ;
Kokkos::parallel_for( policy(0,range_iter), *this );
}
//----------------------------------------
struct TagAllocDealloc {};
KOKKOS_INLINE_FUNCTION
void operator()( TagAllocDealloc , int i , long & update ) const noexcept
{
if ( 0 == i % fill_stride ) {
const int j = i / fill_stride ;
if ( 0 == j % 3 ) {
for ( unsigned k = 0 ; k < repeat_inner ; ++k ) {
const unsigned size_alloc = chunk * ( 1 + ( j % chunk_span ) );
pool.deallocate( (void*) ptrs(j) , size_alloc );
ptrs(j) = (uintptr_t) pool.allocate(size_alloc);
if ( 0 == ptrs(j) ) update++ ;
}
}
}
}
bool test_alloc_dealloc()
{
typedef Kokkos::RangePolicy< ExecSpace , TagAllocDealloc > policy ;
long error_count = 0 ;
Kokkos::parallel_reduce( policy(0,range_iter), *this , error_count );
return 0 == error_count ;
}
};
int main( int argc , char* argv[] )
{
static const char help_flag[] = "--help" ;
static const char alloc_size_flag[] = "--alloc_size=" ;
static const char super_size_flag[] = "--super_size=" ;
static const char chunk_span_flag[] = "--chunk_span=" ;
static const char fill_stride_flag[] = "--fill_stride=" ;
static const char fill_level_flag[] = "--fill_level=" ;
static const char repeat_outer_flag[] = "--repeat_outer=" ;
static const char repeat_inner_flag[] = "--repeat_inner=" ;
long total_alloc_size = 1000000 ;
int min_superblock_size = 10000 ;
int chunk_span = 5 ;
int fill_stride = 1 ;
int fill_level = 70 ;
int repeat_outer = 1 ;
int repeat_inner = 1 ;
int ask_help = 0 ;
for(int i=1;i<argc;i++)
{
const char * const a = argv[i];
if ( ! strncmp(a,help_flag,strlen(help_flag) ) ) ask_help = 1 ;
if ( ! strncmp(a,alloc_size_flag,strlen(alloc_size_flag) ) )
total_alloc_size = atol( a + strlen(alloc_size_flag) );
if ( ! strncmp(a,super_size_flag,strlen(super_size_flag) ) )
min_superblock_size = atoi( a + strlen(super_size_flag) );
if ( ! strncmp(a,fill_stride_flag,strlen(fill_stride_flag) ) )
fill_stride = atoi( a + strlen(fill_stride_flag) );
if ( ! strncmp(a,fill_level_flag,strlen(fill_level_flag) ) )
fill_level = atoi( a + strlen(fill_level_flag) );
if ( ! strncmp(a,chunk_span_flag,strlen(chunk_span_flag) ) )
chunk_span = atoi( a + strlen(chunk_span_flag) );
if ( ! strncmp(a,repeat_outer_flag,strlen(repeat_outer_flag) ) )
repeat_outer = atoi( a + strlen(repeat_outer_flag) );
if ( ! strncmp(a,repeat_inner_flag,strlen(repeat_inner_flag) ) )
repeat_inner = atoi( a + strlen(repeat_inner_flag) );
}
int chunk_span_bytes = 0;
for (int i = 0; i < chunk_span; ++i) {
auto chunk_bytes = TestFunctor::chunk * ( 1 + i );
if (chunk_bytes < 64) chunk_bytes = 64;
auto block_bytes_lg2 = Kokkos::Impl::integral_power_of_two_that_contains( chunk_bytes );
auto block_bytes = (1 << block_bytes_lg2);
chunk_span_bytes += block_bytes;
}
auto actual_superblock_bytes_lg2 = Kokkos::Impl::integral_power_of_two_that_contains( min_superblock_size );
auto actual_superblock_bytes = (1 << actual_superblock_bytes_lg2);
auto superblock_mask = actual_superblock_bytes - 1;
auto nsuperblocks = (total_alloc_size + superblock_mask) >> actual_superblock_bytes_lg2;
auto actual_total_bytes = nsuperblocks * actual_superblock_bytes;
auto bytes_wanted = (actual_total_bytes * fill_level) / 100;
auto chunk_spans = bytes_wanted / chunk_span_bytes;
auto number_alloc = int( chunk_spans * chunk_span );
if ( ask_help ) {
std::cout << "command line options:"
<< " " << help_flag
<< " " << alloc_size_flag << "##"
<< " " << super_size_flag << "##"
<< " " << fill_stride_flag << "##"
<< " " << fill_level_flag << "##"
<< " " << chunk_span_flag << "##"
<< " " << repeat_outer_flag << "##"
<< " " << repeat_inner_flag << "##"
<< std::endl ;
return 0;
}
Kokkos::initialize(argc,argv);
double sum_fill_time = 0;
double sum_cycle_time = 0;
double sum_both_time = 0;
double min_fill_time = std::numeric_limits<double>::max();
double min_cycle_time = std::numeric_limits<double>::max();
double min_both_time = std::numeric_limits<double>::max();
//one alloc in fill, alloc/dealloc pair in repeat_inner
for ( int i = 0 ; i < repeat_outer ; ++i ) {
TestFunctor functor( total_alloc_size
, min_superblock_size
, number_alloc
, fill_stride
, chunk_span
, repeat_inner );
Kokkos::Impl::Timer timer ;
if ( ! functor.test_fill() ) {
Kokkos::abort("fill ");
}
auto t0 = timer.seconds();
if ( ! functor.test_alloc_dealloc() ) {
Kokkos::abort("alloc/dealloc ");
}
auto t1 = timer.seconds();
auto this_fill_time = t0;
auto this_cycle_time = t1 - t0;
auto this_both_time = t1;
sum_fill_time += this_fill_time;
sum_cycle_time += this_cycle_time;
sum_both_time += this_both_time;
min_fill_time = std::min(min_fill_time, this_fill_time);
min_cycle_time = std::min(min_cycle_time, this_cycle_time);
min_both_time = std::min(min_both_time, this_both_time);
}
Kokkos::finalize();
printf( "\"mempool: alloc super stride level span inner outer number\" %ld %d %d %d %d %d %d %d\n"
, total_alloc_size
, min_superblock_size
, fill_stride
, fill_level
, chunk_span
, repeat_inner
, repeat_outer
, number_alloc );
auto avg_fill_time = sum_fill_time / repeat_outer;
auto avg_cycle_time = sum_cycle_time / repeat_outer;
auto avg_both_time = sum_both_time / repeat_outer;
printf( "\"mempool: fill time (min, avg)\" %.8f %.8f\n"
, min_fill_time
, avg_fill_time );
printf( "\"mempool: cycle time (min, avg)\" %.8f %.8f\n"
, min_cycle_time
, avg_cycle_time );
printf( "\"mempool: test time (min, avg)\" %.8f %.8f\n"
, min_both_time
, avg_both_time );
printf( "\"mempool: fill ops per second (max, avg)\" %g %g\n"
, number_alloc / min_fill_time
, number_alloc / avg_fill_time );
printf( "\"mempool: cycle ops per second (max, avg)\" %g %g\n"
, (2 * number_alloc * repeat_inner) / min_cycle_time
, (2 * number_alloc * repeat_inner) / avg_cycle_time );
}
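
The sizing arithmetic in `main` above (round block sizes and the superblock size up to powers of two, then fill a percentage of the resulting capacity) can be checked in isolation. The sketch below assumes that `Kokkos::Impl::integral_power_of_two_that_contains(n)` returns the smallest exponent `e` with `n <= 2^e`; `ceil_log2` is a hypothetical stand-in for it, and `planned_allocations` is a hypothetical name for the computation of `number_alloc`.

```cpp
#include <cassert>

// Stand-in (assumption): smallest e such that n <= 2^e.
inline unsigned ceil_log2(unsigned long long n) {
  unsigned e = 0;
  while ((1ull << e) < n) ++e;
  return e;
}

inline long long planned_allocations(long long total_alloc_size,
                                     long long min_superblock_size,
                                     int chunk_span, int fill_level,
                                     unsigned chunk = 32) {
  assert(total_alloc_size > 0 && min_superblock_size > 0 && chunk_span > 0);
  // Bytes used by one pass over the chunk_span request sizes, each raised
  // to at least 64 bytes and then rounded up to a power of two.
  long long span_bytes = 0;
  for (int i = 0; i < chunk_span; ++i) {
    unsigned long long bytes = static_cast<unsigned long long>(chunk) * (1 + i);
    if (bytes < 64) bytes = 64;
    span_bytes += 1ll << ceil_log2(bytes);
  }
  // Pool capacity: a whole number of power-of-two superblocks.
  const unsigned sb_lg2 =
      ceil_log2(static_cast<unsigned long long>(min_superblock_size));
  const long long sb_bytes = 1ll << sb_lg2;
  const long long nsuperblocks = (total_alloc_size + sb_bytes - 1) >> sb_lg2;
  const long long capacity = nsuperblocks * sb_bytes;
  // Fill to the requested percentage, counted in whole spans.
  const long long bytes_wanted = capacity * fill_level / 100;
  return (bytes_wanted / span_bytes) * chunk_span;
}
```

With the defaults from `main` (1000000 bytes, superblock 10000, span 5, fill level 70) the block sizes are 64, 64, 128, 128 and 256 bytes (640 per span), the capacity is 62 superblocks of 16384 bytes, and the result is 1111 spans, i.e. 5555 allocations.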

View File

@ -0,0 +1,284 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#if ! defined( KOKKOS_ENABLE_TASKDAG ) || \
defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
int main()
{
return 0 ;
}
#else
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <limits>
#include <iostream>
#include <algorithm>
#include <impl/Kokkos_Timer.hpp>
using ExecSpace = Kokkos::DefaultExecutionSpace ;
inline
long eval_fib( long n )
{
constexpr long mask = 0x03;
long fib[4] = { 0, 1, 0, 0 };
for ( long i = 2; i <= n; ++i ) {
fib[ i & mask ] = fib[ ( i - 1 ) & mask ] + fib[ ( i - 2 ) & mask ];
}
return fib[ n & mask ];
}
inline
long fib_alloc_count( long n )
{
constexpr long mask = 0x03;
long count[4] = { 1, 1, 0, 0 };
for ( long i = 2; i <= n; ++i ) {
count[ i & mask ] = 2 // this task plus the 'when_all' task
+ count[ ( i - 1 ) & mask ]
+ count[ ( i - 2 ) & mask ];
}
return count[ n & mask ];
}
template< class Space >
struct TestFib {
using Scheduler = Kokkos::TaskScheduler< Space > ;
using MemorySpace = typename Scheduler::memory_space ;
using MemberType = typename Scheduler::member_type ;
using FutureType = Kokkos::Future< long , Space > ;
typedef long value_type ;
Scheduler sched ;
FutureType dep[2] ;
const value_type n ;
KOKKOS_INLINE_FUNCTION
TestFib( const Scheduler & arg_sched , const value_type arg_n )
: sched( arg_sched ), dep{} , n( arg_n ) {}
KOKKOS_INLINE_FUNCTION
void operator()( const MemberType & , value_type & result ) noexcept
{
if ( n < 2 ) {
result = n ;
}
else if ( ! dep[0].is_null() && ! dep[1].is_null() ) {
result = dep[0].get() + dep[1].get();
}
else {
// Spawn new children and respawn myself to sum their results.
// Spawn lower value at higher priority as it has a shorter
// path to completion.
dep[1] = Kokkos::task_spawn
( Kokkos::TaskSingle( sched, Kokkos::TaskPriority::High )
, TestFib( sched, n - 2 ) );
dep[0] = Kokkos::task_spawn
( Kokkos::TaskSingle( sched )
, TestFib( sched, n - 1 ) );
Kokkos::Future< ExecSpace > fib_all = Kokkos::when_all( dep, 2 );
if ( ! dep[0].is_null() && ! dep[1].is_null() && ! fib_all.is_null() ) {
// High priority to retire this branch.
Kokkos::respawn( this, fib_all, Kokkos::TaskPriority::High );
}
else {
Kokkos::abort("Failed nested task spawn (allocation)");
}
}
}
};
int main( int argc , char* argv[] )
{
static const char help[] = "--help" ;
static const char alloc_size[] = "--alloc_size=" ;
static const char super_size[] = "--super_size=" ;
static const char repeat_outer[] = "--repeat_outer=" ;
static const char input_value[] = "--input=" ;
long total_alloc_size = 1000000 ;
int min_superblock_size = 10000 ;
int test_repeat_outer = 1 ;
int fib_input = 4 ;
int ask_help = 0 ;
for(int i=1;i<argc;i++)
{
const char * const a = argv[i];
if ( ! strncmp(a,help,strlen(help) ) ) ask_help = 1 ;
if ( ! strncmp(a,alloc_size,strlen(alloc_size) ) )
total_alloc_size = atol( a + strlen(alloc_size) );
if ( ! strncmp(a,super_size,strlen(super_size) ) )
min_superblock_size = atoi( a + strlen(super_size) );
if ( ! strncmp(a,repeat_outer,strlen(repeat_outer) ) )
test_repeat_outer = atoi( a + strlen(repeat_outer) );
if ( ! strncmp(a,input_value,strlen(input_value) ) )
fib_input = atoi( a + strlen(input_value) );
}
const long fib_output = eval_fib( fib_input );
const long number_alloc = fib_alloc_count( fib_input );
const unsigned min_block_size = 32 ;
const unsigned max_block_size = 128 ;
long task_count_max = 0 ;
long task_count_accum = 0 ;
long test_result = 0 ;
if ( ask_help ) {
std::cout << "command line options:"
<< " " << help
<< " " << alloc_size << "##"
<< " " << super_size << "##"
<< " " << input_value << "##"
<< " " << repeat_outer << "##"
<< std::endl ;
return -1;
}
typedef TestFib< ExecSpace > Functor ;
Kokkos::initialize(argc,argv);
Functor::Scheduler sched( Functor::MemorySpace()
, total_alloc_size
, min_block_size
, max_block_size
, min_superblock_size
);
Functor::FutureType f =
Kokkos::host_spawn( Kokkos::TaskSingle( sched )
, Functor( sched , fib_input )
);
Kokkos::wait( sched );
test_result = f.get();
task_count_max = sched.allocated_task_count_max();
task_count_accum = sched.allocated_task_count_accum();
if ( number_alloc != task_count_accum ) {
std::cout << " number_alloc( " << number_alloc << " )"
<< " != task_count_accum( " << task_count_accum << " )"
<< std::endl ;
}
if ( fib_output != test_result ) {
std::cout << " answer( " << fib_output << " )"
<< " != result( " << test_result << " )"
<< std::endl ;
}
if ( fib_output != test_result || number_alloc != task_count_accum ) {
printf(" TEST FAILED\n");
return -1;
}
double min_time = std::numeric_limits<double>::max();
double time_sum = 0;
for ( int i = 0 ; i < test_repeat_outer ; ++i ) {
Kokkos::Impl::Timer timer ;
Functor::FutureType ftmp =
Kokkos::host_spawn( Kokkos::TaskSingle( sched )
, Functor( sched , fib_input )
);
Kokkos::wait( sched );
auto this_time = timer.seconds();
min_time = std::min(min_time, this_time);
time_sum += this_time;
}
auto avg_time = time_sum / test_repeat_outer;
Kokkos::finalize();
printf( "\"taskdag: alloc super repeat input output task-accum task-max\" %ld %d %d %d %ld %ld %ld\n"
, total_alloc_size
, min_superblock_size
, test_repeat_outer
, fib_input
, fib_output
, task_count_accum
, task_count_max );
printf( "\"taskdag: time (min, avg)\" %g %g\n", min_time, avg_time);
printf( "\"taskdag: tasks per second (max, avg)\" %g %g\n"
, number_alloc / min_time
, number_alloc / avg_time );
return 0 ;
}
#endif
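
The expected allocation count used by this benchmark follows the recurrence c(n) = 2 + c(n-1) + c(n-2) with c(0) = c(1) = 1. Adding 2 to both sides shows that c(n) + 2 is itself a Fibonacci-type sequence starting at 3, 3, which gives the closed form c(n) = 3 F(n+1) - 2. The sketch below restates the rolling four-slot window from `eval_fib` and `fib_alloc_count` under hypothetical names (`fib_window`, `alloc_window`) and adds `alloc_closed_form` as a cross-check; the closed form is a derivation for illustration, not something the benchmark uses.

```cpp
#include <cassert>

// Fibonacci via a four-slot circular buffer indexed with (i & 3).
inline long fib_window(long n) {
  assert(n >= 0);
  constexpr long mask = 0x03;
  long fib[4] = {0, 1, 0, 0};
  for (long i = 2; i <= n; ++i)
    fib[i & mask] = fib[(i - 1) & mask] + fib[(i - 2) & mask];
  return fib[n & mask];
}

// Allocations for input n: each non-leaf task adds itself plus its
// 'when_all' task on top of the allocations of its two children.
inline long alloc_window(long n) {
  assert(n >= 0);
  constexpr long mask = 0x03;
  long count[4] = {1, 1, 0, 0};
  for (long i = 2; i <= n; ++i)
    count[i & mask] = 2 + count[(i - 1) & mask] + count[(i - 2) & mask];
  return count[n & mask];
}

// Closed form derived from the recurrence: c(n) = 3 * F(n + 1) - 2.
inline long alloc_closed_form(long n) {
  return 3 * fib_window(n + 1) - 2;
}
```

For the default input of 4 both forms give 13 allocations, and for the script's input of 21 they give 53131.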

View File

@ -44,14 +44,12 @@
#ifndef KOKKOS_CUDA_EXP_ITERATE_TILE_HPP
#define KOKKOS_CUDA_EXP_ITERATE_TILE_HPP
#include <Kokkos_Macros.hpp>
#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
#include <iostream>
#include <algorithm>
#include <stdio.h>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#if defined( __CUDACC__ ) && defined( KOKKOS_HAVE_CUDA )
#include <cstdio>
#include <utility>
@ -1298,3 +1296,4 @@ protected:
#endif
#endif

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -45,11 +45,10 @@
#define KOKKOS_CUDAEXEC_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <string>
#include <cstdint>
#include <Kokkos_Parallel.hpp>
#include <impl/Kokkos_Error.hpp>
#include <Cuda/Kokkos_Cuda_abort.hpp>
@ -99,6 +98,8 @@ CudaSpace::size_type cuda_internal_maximum_warp_count();
CudaSpace::size_type cuda_internal_maximum_grid_count();
CudaSpace::size_type cuda_internal_maximum_shared_words();
CudaSpace::size_type cuda_internal_maximum_concurrent_block_count();
CudaSpace::size_type * cuda_internal_scratch_flags( const CudaSpace::size_type size );
CudaSpace::size_type * cuda_internal_scratch_space( const CudaSpace::size_type size );
CudaSpace::size_type * cuda_internal_scratch_unified( const CudaSpace::size_type size );
@ -146,7 +147,7 @@ Kokkos::Impl::CudaLockArraysStruct kokkos_impl_cuda_lock_arrays ;
namespace Kokkos {
namespace Impl {
void* cuda_resize_scratch_space(size_t bytes, bool force_shrink = false);
void* cuda_resize_scratch_space(std::int64_t bytes, bool force_shrink = false);
}
}
@ -319,3 +320,4 @@ struct CudaParallelLaunch< DriverType , false > {
#endif /* defined( __CUDACC__ ) */
#endif /* defined( KOKKOS_ENABLE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDAEXEC_HPP */

View File

@ -41,16 +41,15 @@
//@HEADER
*/
#include <stdlib.h>
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_CUDA
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <algorithm>
#include <atomic>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <Kokkos_Core.hpp>
#include <Kokkos_Cuda.hpp>
@ -890,9 +889,9 @@ void init_lock_arrays_cuda_space() {
}
}
void* cuda_resize_scratch_space(size_t bytes, bool force_shrink) {
void* cuda_resize_scratch_space(std::int64_t bytes, bool force_shrink) {
static void* ptr = NULL;
static size_t current_size = 0;
static std::int64_t current_size = 0;
if(current_size == 0) {
current_size = bytes;
ptr = Kokkos::kokkos_malloc<Kokkos::CudaSpace>("CudaSpace::ScratchMemory",current_size);
@ -911,5 +910,7 @@ void* cuda_resize_scratch_space(size_t bytes, bool force_shrink) {
}
}
#else
void KOKKOS_CORE_SRC_CUDA_CUDASPACE_PREVENT_LINK_ERROR() {}
#endif // KOKKOS_ENABLE_CUDA

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -45,8 +45,6 @@
#define KOKKOS_CUDA_ALLOCATION_TRACKING_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <impl/Kokkos_Traits.hpp>
@ -75,7 +73,7 @@ shared_allocation_record( Kokkos::CudaSpace const & arg_space
new( functor ) DestructFunctor( arg_destruct );
record->m_destruct_functor = & shared_allocation_destroy< DestructFunctor > ;
return record ;
}

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -45,8 +45,6 @@
#define KOKKOS_CUDA_ERROR_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
namespace Kokkos { namespace Impl {
@ -67,3 +65,4 @@ inline void cuda_internal_safe_call( cudaError e , const char * name, const char
#endif //KOKKOS_ENABLE_CUDA
#endif //KOKKOS_CUDA_ERROR_HPP

View File

@ -44,11 +44,11 @@
/*--------------------------------------------------------------------------*/
/* Kokkos interfaces */
#include <Kokkos_Core.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_CUDA
#include <Kokkos_Core.hpp>
#include <Cuda/Kokkos_Cuda_Error.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <impl/Kokkos_Error.hpp>
@ -56,7 +56,7 @@
/*--------------------------------------------------------------------------*/
/* Standard 'C' libraries */
#include <stdlib.h>
#include <cstdlib>
/* Standard 'C++' libraries */
#include <vector>
@ -404,9 +404,23 @@ void CudaInternal::initialize( int cuda_device_id , int stream_count )
// Query what compute capability architecture a kernel executes:
m_cudaArch = cuda_kernel_arch();
if ( m_cudaArch != cudaProp.major * 100 + cudaProp.minor * 10 ) {
int compiled_major = m_cudaArch / 100;
int compiled_minor = ( m_cudaArch % 100 ) / 10;
if ( compiled_major < 5 && cudaProp.major >= 5 ) {
std::stringstream ss;
ss << "Kokkos::Cuda::initialize ERROR: running kernels compiled for compute capability "
<< compiled_major << "." << compiled_minor
<< " (< 5.0) on device with compute capability "
<< cudaProp.major << "." << cudaProp.minor
<< " (>=5.0), this would give incorrect results!"
<< std::endl ;
std::string msg = ss.str();
Kokkos::abort( msg.c_str() );
}
if ( compiled_major != cudaProp.major || compiled_minor != cudaProp.minor ) {
std::cerr << "Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability "
<< ( m_cudaArch / 100 ) << "." << ( ( m_cudaArch % 100 ) / 10 )
<< compiled_major << "." << compiled_minor
<< " on device with compute capability "
<< cudaProp.major << "." << cudaProp.minor
<< " , this will likely reduce potential performance."
@ -661,6 +675,15 @@ void CudaInternal::finalize()
Cuda::size_type cuda_internal_multiprocessor_count()
{ return CudaInternal::singleton().m_multiProcCount ; }
CudaSpace::size_type cuda_internal_maximum_concurrent_block_count()
{
// Compute capability 5.0 through 6.2
enum : int { max_resident_blocks_per_multiprocessor = 32 };
return CudaInternal::singleton().m_multiProcCount
* max_resident_blocks_per_multiprocessor ;
}
Cuda::size_type cuda_internal_maximum_warp_count()
{ return CudaInternal::singleton().m_maxWarpCount ; }
@ -772,8 +795,10 @@ void Cuda::fence()
Kokkos::Impl::cuda_device_synchronize();
}
const char* Cuda::name() { return "Cuda"; }
} // namespace Kokkos
#else
void KOKKOS_CORE_SRC_CUDA_IMPL_PREVENT_LINK_ERROR() {}
#endif // KOKKOS_ENABLE_CUDA
//----------------------------------------------------------------------------
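
The compute-capability check added to `CudaInternal::initialize` above can be sketched on the host as a pure function. This is a minimal sketch, not the Kokkos implementation: `ArchCheck` and `check_cuda_arch` are hypothetical names, and the only things taken from the diff are the `major * 100 + minor * 10` encoding and the abort/warn conditions.

```cpp
#include <cassert>

// Hypothetical result type for illustration; not part of Kokkos.
enum class ArchCheck { Match, WarnPerformance, AbortIncorrect };

// compiled_arch uses the encoding seen in the diff: major * 100 + minor * 10,
// so 350 means compute capability 3.5 and 600 means 6.0.
inline ArchCheck check_cuda_arch(int compiled_arch, int device_major, int device_minor) {
  assert(compiled_arch >= 0);
  const int compiled_major = compiled_arch / 100;
  const int compiled_minor = (compiled_arch % 100) / 10;

  // Kernels built for CC < 5.0 running on a CC >= 5.0 device: the diff aborts here.
  if (compiled_major < 5 && device_major >= 5) return ArchCheck::AbortIncorrect;

  // Any other mismatch only triggers the performance warning.
  if (compiled_major != device_major || compiled_minor != device_minor) {
    return ArchCheck::WarnPerformance;
  }
  return ArchCheck::Match;
}
```

Under these assumptions, `check_cuda_arch(350, 6, 0)` falls in the abort case, while `check_cuda_arch(520, 6, 1)` only warns.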

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,19 +36,18 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_INTERNAL_HPP
#define KOKKOS_CUDA_INTERNAL_HPP
#include<iostream>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_CUDA
#include<iostream>
#include <Cuda/Kokkos_Cuda_Error.hpp>
namespace Kokkos { namespace Impl {

File diff suppressed because it is too large

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -45,8 +45,6 @@
#define KOKKOS_CUDA_REDUCESCAN_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
#include <utility>
@ -63,6 +61,7 @@ namespace Kokkos {
namespace Impl {
//----------------------------------------------------------------------------
// Shuffle operations require input to be a register (stack) variable
template< typename T >
__device__ inline
@ -140,77 +139,6 @@ void cuda_shfl_up( T & out , T const & in , int delta ,
}
}
//----------------------------------------------------------------------------
/** \brief Reduce within a warp over blockDim.x, the "vector" dimension.
*
* This will be called within a nested, intra-team parallel operation.
* Use shuffle operations to avoid conflicts with shared memory usage.
*
* Requires:
* blockDim.x is power of 2
* blockDim.x <= 32 (one warp)
*
* Cannot use "butterfly" pattern because floating point
* addition is non-associative. Therefore, must broadcast
* the final result.
*/
template< class Reducer >
__device__ inline
void cuda_intra_warp_vector_reduce( Reducer const & reducer )
{
static_assert(
std::is_reference< typename Reducer::reference_type >::value , "" );
if ( 1 < blockDim.x ) {
typename Reducer::value_type tmp ;
for ( int i = blockDim.x ; ( i >>= 1 ) ; ) {
cuda_shfl_down( tmp , reducer.reference() , i , blockDim.x );
if ( threadIdx.x < i ) { reducer.join( reducer.data() , & tmp ); }
}
// Broadcast from root "lane" to all other "lanes"
cuda_shfl( reducer.reference() , reducer.reference() , 0 , blockDim.x );
}
}
/** \brief Inclusive scan over blockDim.x, the "vector" dimension.
*
* This will be called within a nested, intra-team parallel operation.
* Use shuffle operations to avoid conflicts with shared memory usage.
*
* Algorithm is concurrent bottom-up reductions in triangular pattern
* where each CUDA thread is the root of a reduction tree from the
* zeroth CUDA thread to itself.
*
* Requires:
* blockDim.x is power of 2
* blockDim.x <= 32 (one warp)
*/
template< typename ValueType >
__device__ inline
void cuda_intra_warp_vector_inclusive_scan( ValueType & local )
{
ValueType tmp ;
// Bottom up:
// [t] += [t-1] if t >= 1
// [t] += [t-2] if t >= 2
// [t] += [t-4] if t >= 4
// ...
for ( int i = 1 ; i < blockDim.x ; i <<= 1 ) {
cuda_shfl_up( tmp , local , i , blockDim.x );
if ( i <= threadIdx.x ) { local += tmp ; }
}
}
//----------------------------------------------------------------------------
/*
* Algorithmic constraints:
@ -247,12 +175,12 @@ inline void cuda_inter_warp_reduction( ValueType& value,
#define STEP_WIDTH 4
// Depending on the ValueType, __shared__ memory must be aligned up to 8-byte boundaries
// The reason not to use ValueType directly is that for types with constructors it
// The reason not to use ValueType directly is that for types with constructors it
// could lead to race conditions
__shared__ double sh_result[(sizeof(ValueType)+7)/8*STEP_WIDTH];
ValueType* result = (ValueType*) & sh_result;
const unsigned step = 32 / blockDim.x;
unsigned shift = STEP_WIDTH;
const int step = 32 / blockDim.x;
int shift = STEP_WIDTH;
const int id = threadIdx.y%step==0?threadIdx.y/step:65000;
if(id < STEP_WIDTH ) {
result[id] = value;
@ -297,7 +225,7 @@ bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgT
//Do the intra-block reduction with shfl operations and static shared memory
cuda_intra_block_reduction(value,join,max_active_thread);
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
const int id = threadIdx.y*blockDim.x + threadIdx.x;
//One thread in the block writes block result to global scratch_memory
if(id == 0 ) {
@ -329,35 +257,35 @@ bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgT
//Reduce all global values with splitting work over threads in one warp
const int step_size = blockDim.x*blockDim.y < 32 ? blockDim.x*blockDim.y : 32;
for(int i=id; i<gridDim.x; i+=step_size) {
for(int i=id; i<(int)gridDim.x; i+=step_size) {
value_type tmp = global[i];
join(value, tmp);
}
//Perform shfl reductions within the warp; only join if contribution is valid (allows gridDim.x to be a non-power of two and <32)
if (blockDim.x*blockDim.y > 1) {
if (int(blockDim.x*blockDim.y) > 1) {
value_type tmp = Kokkos::shfl_down(value, 1,32);
if( id + 1 < gridDim.x )
if( id + 1 < int(gridDim.x) )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 2) {
if (int(blockDim.x*blockDim.y) > 2) {
value_type tmp = Kokkos::shfl_down(value, 2,32);
if( id + 2 < gridDim.x )
if( id + 2 < int(gridDim.x) )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 4) {
if (int(blockDim.x*blockDim.y) > 4) {
value_type tmp = Kokkos::shfl_down(value, 4,32);
if( id + 4 < gridDim.x )
if( id + 4 < int(gridDim.x) )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 8) {
if (int(blockDim.x*blockDim.y) > 8) {
value_type tmp = Kokkos::shfl_down(value, 8,32);
if( id + 8 < gridDim.x )
if( id + 8 < int(gridDim.x) )
join(value, tmp);
}
if (blockDim.x*blockDim.y > 16) {
if (int(blockDim.x*blockDim.y) > 16) {
value_type tmp = Kokkos::shfl_down(value, 16,32);
if( id + 16 < gridDim.x )
if( id + 16 < int(gridDim.x) )
join(value, tmp);
}
}
@ -370,6 +298,166 @@ bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgT
#endif
}
template< class ReducerType >
__device__ inline
typename std::enable_if< Kokkos::is_reducer<ReducerType>::value >::type
cuda_intra_warp_reduction( const ReducerType& reducer,
const int max_active_thread = blockDim.y) {
typedef typename ReducerType::value_type ValueType;
unsigned int shift = 1;
ValueType result = reducer.reference();
//Reduce over values from threads with different threadIdx.y
while(blockDim.x * shift < 32 ) {
const ValueType tmp = shfl_down(result, blockDim.x*shift,32u);
//Only join if upper thread is active (this allows non power of two for blockDim.y)
if(threadIdx.y + shift < max_active_thread)
reducer.join(result , tmp);
shift*=2;
}
result = shfl(result,0,32);
reducer.reference() = result;
}
template< class ReducerType >
__device__ inline
typename std::enable_if< Kokkos::is_reducer<ReducerType>::value >::type
cuda_inter_warp_reduction( const ReducerType& reducer,
const int max_active_thread = blockDim.y) {
typedef typename ReducerType::value_type ValueType;
#define STEP_WIDTH 4
// Depending on the ValueType, __shared__ memory must be aligned up to 8-byte boundaries
// The reason not to use ValueType directly is that for types with constructors it
// could lead to race conditions
__shared__ double sh_result[(sizeof(ValueType)+7)/8*STEP_WIDTH];
ValueType* result = (ValueType*) & sh_result;
ValueType value = reducer.reference();
const int step = 32 / blockDim.x;
int shift = STEP_WIDTH;
const int id = threadIdx.y%step==0?threadIdx.y/step:65000;
if(id < STEP_WIDTH ) {
result[id] = value;
}
__syncthreads();
while (shift<=max_active_thread/step) {
if(shift<=id && shift+STEP_WIDTH>id && threadIdx.x==0) {
reducer.join(result[id%STEP_WIDTH],value);
}
__syncthreads();
shift+=STEP_WIDTH;
}
value = result[0];
for(int i = 1; (i*step<max_active_thread) && i<STEP_WIDTH; i++)
reducer.join(value,result[i]);
reducer.reference() = value;
}
template< class ReducerType >
__device__ inline
typename std::enable_if< Kokkos::is_reducer<ReducerType>::value >::type
cuda_intra_block_reduction( const ReducerType& reducer,
const int max_active_thread = blockDim.y) {
cuda_intra_warp_reduction(reducer,max_active_thread);
cuda_inter_warp_reduction(reducer,max_active_thread);
}
template< class ReducerType>
__device__ inline
typename std::enable_if< Kokkos::is_reducer<ReducerType>::value , bool >::type
cuda_inter_block_reduction( const ReducerType& reducer,
Cuda::size_type * const m_scratch_space,
Cuda::size_type * const m_scratch_flags,
const int max_active_thread = blockDim.y) {
#ifdef __CUDA_ARCH__
typedef typename ReducerType::value_type* pointer_type;
typedef typename ReducerType::value_type value_type;
//Do the intra-block reduction with shfl operations and static shared memory
cuda_intra_block_reduction(reducer,max_active_thread);
value_type value = reducer.reference();
const int id = threadIdx.y*blockDim.x + threadIdx.x;
//One thread in the block writes block result to global scratch_memory
if(id == 0 ) {
pointer_type global = ((pointer_type) m_scratch_space) + blockIdx.x;
*global = value;
}
//One warp of last block performs inter block reduction through loading the block values from global scratch_memory
bool last_block = false;
__syncthreads();
if ( id < 32 ) {
Cuda::size_type count;
//Figure out whether this is the last block
if(id == 0)
count = Kokkos::atomic_fetch_add(m_scratch_flags,1);
count = Kokkos::shfl(count,0,32);
//Last block does the inter block reduction
if( count == gridDim.x - 1) {
//set flag back to zero
if(id == 0)
*m_scratch_flags = 0;
last_block = true;
reducer.init(value);
pointer_type const volatile global = (pointer_type) m_scratch_space ;
//Reduce all global values with splitting work over threads in one warp
const int step_size = blockDim.x*blockDim.y < 32 ? blockDim.x*blockDim.y : 32;
for(int i=id; i<(int)gridDim.x; i+=step_size) {
value_type tmp = global[i];
reducer.join(value, tmp);
}
//Perform shfl reductions within the warp; only join if contribution is valid (allows gridDim.x to be a non-power of two and <32)
if (int(blockDim.x*blockDim.y) > 1) {
value_type tmp = Kokkos::shfl_down(value, 1,32);
if( id + 1 < int(gridDim.x) )
reducer.join(value, tmp);
}
if (int(blockDim.x*blockDim.y) > 2) {
value_type tmp = Kokkos::shfl_down(value, 2,32);
if( id + 2 < int(gridDim.x) )
reducer.join(value, tmp);
}
if (int(blockDim.x*blockDim.y) > 4) {
value_type tmp = Kokkos::shfl_down(value, 4,32);
if( id + 4 < int(gridDim.x) )
reducer.join(value, tmp);
}
if (int(blockDim.x*blockDim.y) > 8) {
value_type tmp = Kokkos::shfl_down(value, 8,32);
if( id + 8 < int(gridDim.x) )
reducer.join(value, tmp);
}
if (int(blockDim.x*blockDim.y) > 16) {
value_type tmp = Kokkos::shfl_down(value, 16,32);
if( id + 16 < int(gridDim.x) )
reducer.join(value, tmp);
}
}
}
//The last block has in its thread=0 the global reduction value through "value"
return last_block;
#else
return true;
#endif
}
//----------------------------------------------------------------------------
// See section B.17 of Cuda C Programming Guide Version 3.2
// for discussion of
@ -529,11 +617,11 @@ bool cuda_single_inter_block_reduce_scan( const FunctorType & functor ,
size_type * const shared = shared_data + word_count.value * BlockSizeMask ;
size_type * const global = global_data + word_count.value * block_id ;
#if (__CUDA_ARCH__ < 500)
for ( size_type i = threadIdx.y ; i < word_count.value ; i += blockDim.y ) { global[i] = shared[i] ; }
#else
for ( size_type i = 0 ; i < word_count.value ; i += 1 ) { global[i] = shared[i] ; }
#endif
//#if (__CUDA_ARCH__ < 500)
for ( int i = int(threadIdx.y) ; i < int(word_count.value) ; i += int(blockDim.y) ) { global[i] = shared[i] ; }
//#else
// for ( size_type i = 0 ; i < word_count.value ; i += 1 ) { global[i] = shared[i] ; }
//#endif
}
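
The last-block step of `cuda_inter_block_reduction` above (strided loads from global scratch, then guarded `shfl_down` steps with offsets 1, 2, 4, 8, 16) can be modeled on the host. The following is a minimal sketch under stated assumptions: a full 32-lane warp, integer addition as the join, and an out-of-range shuffle that returns the lane's own value. `shfl_down_sim` and `last_block_sum` are hypothetical helpers, not Kokkos or CUDA functions.

```cpp
#include <cassert>
#include <vector>

constexpr int kWarpSize = 32;

// Host stand-in for a 32-lane shfl_down: every lane reads from a snapshot of
// the warp, and a lane whose source is outside the warp gets its own value.
inline std::vector<long> shfl_down_sim(const std::vector<long>& v, int delta) {
  std::vector<long> out(kWarpSize);
  for (int lane = 0; lane < kWarpSize; ++lane) {
    const int src = lane + delta;
    out[lane] = (src < kWarpSize) ? v[src] : v[lane];
  }
  return out;
}

// block_results plays the role of the per-block values in global scratch;
// its size plays the role of gridDim.x.
inline long last_block_sum(const std::vector<long>& block_results) {
  const int grid = static_cast<int>(block_results.size());
  assert(grid > 0);

  // Each lane starts from the identity (0 for a sum) and accumulates a
  // strided subset of the block results.
  std::vector<long> value(kWarpSize, 0);
  for (int lane = 0; lane < kWarpSize; ++lane) {
    for (int i = lane; i < grid; i += kWarpSize) value[lane] += block_results[i];
  }

  // Tree reduction: join only when the partner index is below grid, which is
  // what allows grid sizes that are not a power of two or are below 32.
  for (int delta = 1; delta < kWarpSize; delta <<= 1) {
    const std::vector<long> tmp = shfl_down_sim(value, delta);
    for (int lane = 0; lane < kWarpSize; ++lane) {
      if (lane + delta < grid) value[lane] += tmp[lane];
    }
  }
  return value[0];  // lane 0 holds the full reduction
}
```

Only lane 0 is guaranteed to hold the complete result in this model, which matches the comment in the diff that thread 0 of the last block carries the global reduction value.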

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,15 +36,16 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Core.hpp>
#include <Kokkos_Macros.hpp>
#if defined( KOKKOS_ENABLE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG )
#include <Kokkos_Core.hpp>
#include <impl/Kokkos_TaskQueue_impl.hpp>
//----------------------------------------------------------------------------
@ -120,7 +121,7 @@ printf("TaskQueue<Cuda>::driver(%d,%d) task(%lx)\n",threadIdx.z,blockIdx.x
}
if ( 0 == warp_lane ) {
queue->complete( task.ptr );
queue->complete( task.ptr );
}
}
} while(1);
@ -157,7 +158,7 @@ printf("cuda_task_queue_execute before\n");
// If not large enough then set the stack size, in bytes:
//
// CUDA_SAFE_CALL( cudaDeviceSetLimit( cudaLimitStackSize , stack_size ) );
cuda_task_queue_execute<<< grid , block , shared , stream >>>( queue );
CUDA_SAFE_CALL( cudaGetLastError() );
@ -173,7 +174,7 @@ printf("cuda_task_queue_execute after\n");
}} /* namespace Kokkos::Impl */
//----------------------------------------------------------------------------
#else
void KOKKOS_CORE_SRC_CUDA_KOKKOS_CUDA_TASK_PREVENT_LINK_ERROR() {}
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) && defined( KOKKOS_ENABLE_TASKDAG ) */

View File

@ -44,6 +44,7 @@
#ifndef KOKKOS_IMPL_CUDA_TASK_HPP
#define KOKKOS_IMPL_CUDA_TASK_HPP
#include <Kokkos_Macros.hpp>
#if defined( KOKKOS_ENABLE_TASKDAG )
//----------------------------------------------------------------------------
@ -325,7 +326,7 @@ ValueType shfl_warp_broadcast
return Kokkos::shfl(val, src_lane, width);
}
// all-reduce across corresponding vector lanes between team members within warp
/*// all-reduce across corresponding vector lanes between team members within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
@ -351,7 +352,7 @@ void parallel_reduce
loop_boundaries.thread.team_size(),
blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
}
}*/
// all-reduce across corresponding vector lanes between team members within warp
// if no join() provided, use sum
@ -382,13 +383,36 @@ void parallel_reduce
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, threadIdx.x, Impl::CudaTraits::WarpSize );
}
template< typename iType, class Lambda, typename ReducerType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
const ReducerType& reducer) {
typedef typename ReducerType::value_type ValueType;
//TODO what is the point of creating this temporary?
ValueType result = ValueType();
reducer.init(result);
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
strided_shfl_warp_reduction(
[&] (ValueType& val1, const ValueType& val2) { reducer.join(val1,val2); },
result,
loop_boundaries.thread.team_size(),
blockDim.x);
reducer.reference() = shfl_warp_broadcast<ValueType>( result, threadIdx.x, Impl::CudaTraits::WarpSize );
}
// all-reduce within team members within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride
// blockDim.y == team_size
// threadIdx.x == position in vec
// threadIdx.y == member number
template< typename iType, class Lambda, typename ValueType, class JoinType >
/*template< typename iType, class Lambda, typename ValueType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
@ -404,7 +428,7 @@ void parallel_reduce
multi_shfl_warp_reduction<ValueType, JoinType>(join, initialized_result, blockDim.x);
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
}
}*/
// all-reduce within team members within warp
// if no join() provided, use sum
@ -436,6 +460,28 @@ void parallel_reduce
initialized_result = shfl_warp_broadcast<ValueType>( initialized_result, 0, blockDim.x );
}
template< typename iType, class Lambda, typename ReducerType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce
(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::TaskExec< Kokkos::Cuda > >& loop_boundaries,
const Lambda & lambda,
const ReducerType& reducer) {
typedef typename ReducerType::value_type ValueType;
ValueType result = ValueType();
reducer.init(result);
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,result);
}
multi_shfl_warp_reduction(
[&] (ValueType& val1, const ValueType& val2) { reducer.join(val1, val2); },
result,
blockDim.x);
reducer.reference() = shfl_warp_broadcast<ValueType>( result, 0, blockDim.x );
}
// scan across corresponding vector lanes between team members within warp
// assume vec_length*team_size == warp_size
// blockDim.x == vec_length == stride

View File

@ -0,0 +1,982 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_CUDA_TEAM_HPP
#define KOKKOS_CUDA_TEAM_HPP
#include <iostream>
#include <algorithm>
#include <stdio.h>
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
#include <utility>
#include <Kokkos_Parallel.hpp>
#include <Cuda/Kokkos_CudaExec.hpp>
#include <Cuda/Kokkos_Cuda_ReduceScan.hpp>
#include <Cuda/Kokkos_Cuda_Internal.hpp>
#include <Kokkos_Vectorization.hpp>
#if defined(KOKKOS_ENABLE_PROFILING)
#include <impl/Kokkos_Profiling_Interface.hpp>
#include <typeinfo>
#endif
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template< typename Type >
struct CudaJoinFunctor {
typedef Type value_type ;
KOKKOS_INLINE_FUNCTION
static void join( volatile value_type & update ,
volatile const value_type & input )
{ update += input ; }
};
/**\brief Team member_type passed to TeamPolicy or TeamTask closures.
*
* Cuda thread blocks for team closures are dimensioned as:
* blockDim.x == number of "vector lanes" per "thread"
* blockDim.y == number of "threads" per team
* blockDim.z == number of teams in a block
* where
* A set of teams exactly fill a warp OR a team is the whole block
* ( 0 == WarpSize % ( blockDim.x * blockDim.y ) )
* OR
* ( 1 == blockDim.z )
*
* Thus when 1 < blockDim.z the team is warp-synchronous
* and __syncthreads should not be called in team collectives.
*
* When multiple teams are mapped onto a single block then the
* total available shared memory must be partitioned among teams.
*/
class CudaTeamMember {
private:
typedef Kokkos::Cuda execution_space ;
typedef execution_space::scratch_memory_space scratch_memory_space ;
void * m_team_reduce ;
scratch_memory_space m_team_shared ;
int m_team_reduce_size ;
int m_league_rank ;
int m_league_size ;
public:
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space & team_shmem() const
{ return m_team_shared.set_team_thread_mode(0,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space &
team_scratch(const int& level) const
{ return m_team_shared.set_team_thread_mode(level,1,0) ; }
KOKKOS_INLINE_FUNCTION
const execution_space::scratch_memory_space &
thread_scratch(const int& level) const
{ return m_team_shared.set_team_thread_mode(level,team_size(),team_rank()) ; }
KOKKOS_INLINE_FUNCTION int league_rank() const { return m_league_rank ; }
KOKKOS_INLINE_FUNCTION int league_size() const { return m_league_size ; }
KOKKOS_INLINE_FUNCTION int team_rank() const
{
#ifdef __CUDA_ARCH__
return threadIdx.y ;
#else
return 0;
#endif
}
KOKKOS_INLINE_FUNCTION int team_size() const
{
#ifdef __CUDA_ARCH__
return blockDim.y ;
#else
return 1;
#endif
}
KOKKOS_INLINE_FUNCTION void team_barrier() const
{
#ifdef __CUDA_ARCH__
if ( 1 == blockDim.z ) __syncthreads(); // team == block
else __threadfence_block(); // team <= warp
#endif
}
//--------------------------------------------------------------------------
template<class ValueType>
KOKKOS_INLINE_FUNCTION
void team_broadcast( ValueType & val, const int& thread_id) const
{
#ifdef __CUDA_ARCH__
if ( 1 == blockDim.z ) { // team == block
__syncthreads();
// Wait for shared data write until all threads arrive here
if ( threadIdx.x == 0 && threadIdx.y == thread_id ) {
*((ValueType*) m_team_reduce) = val ;
}
__syncthreads(); // Wait for shared data read until root thread writes
val = *((ValueType*) m_team_reduce);
}
else { // team <= warp
ValueType tmp( val ); // input might not be a register variable
cuda_shfl( val, tmp, blockDim.x * thread_id, blockDim.x * blockDim.y );
}
#endif
}
//--------------------------------------------------------------------------
/**\brief Reduction across a team
*
* Mapping of teams onto blocks:
* blockDim.x is "vector lanes"
* blockDim.y is team "threads"
* blockDim.z is number of teams per block
*
* Requires:
* blockDim.x is power of two
* blockDim.x <= CudaTraits::WarpSize
* ( 0 == CudaTraits::WarpSize % ( blockDim.x * blockDim.y ) )
* OR
* ( 1 == blockDim.z )
*/
template< typename ReducerType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< is_reducer< ReducerType >::value >::type
team_reduce( ReducerType const & reducer ) const noexcept
{
#ifdef __CUDA_ARCH__
typedef typename ReducerType::value_type value_type ;
value_type tmp( reducer.reference() );
// reduce within the warp using shuffle
const int wx =
( threadIdx.x + blockDim.x * threadIdx.y ) & CudaTraits::WarpIndexMask ;
for ( int i = CudaTraits::WarpSize ; blockDim.x <= ( i >>= 1 ) ; ) {
cuda_shfl_down( reducer.reference() , tmp , i , CudaTraits::WarpSize );
// Root of each vector lane reduces:
if ( 0 == threadIdx.x && wx < i ) {
reducer.join( tmp , reducer.reference() );
}
}
if ( 1 < blockDim.z ) { // team <= warp
// broadcast result from root vector lane of root thread
cuda_shfl( reducer.reference() , tmp
, blockDim.x * threadIdx.y , CudaTraits::WarpSize );
}
else { // team == block
// Reduce across warps using shared memory
// Broadcast result within block
// Number of warps, blockDim.y may not be power of two:
const int nw = ( blockDim.x * blockDim.y + CudaTraits::WarpIndexMask ) >> CudaTraits::WarpIndexShift ;
// Warp index:
const int wy = ( blockDim.x * threadIdx.y ) >> CudaTraits::WarpIndexShift ;
// Number of shared memory entries for the reduction:
int nsh = m_team_reduce_size / sizeof(value_type);
// Using at most one entry per warp:
if ( nw < nsh ) nsh = nw ;
__syncthreads(); // Wait before shared data write
if ( 0 == wx && wy < nsh ) {
((value_type*) m_team_reduce)[wy] = tmp ;
}
// When more warps than shared entries:
for ( int i = nsh ; i < nw ; i += nsh ) {
__syncthreads();
if ( 0 == wx && i <= wy ) {
const int k = wy - i ;
if ( k < nsh ) {
reducer.join( *((value_type*) m_team_reduce + k) , tmp );
}
}
}
__syncthreads();
// One warp performs the inter-warp reduction:
if ( 0 == wy ) {
// Start at power of two covering nsh
for ( int i = 1 << ( 32 - __clz(nsh-1) ) ; ( i >>= 1 ) ; ) {
const int k = wx + i ;
if ( wx < i && k < nsh ) {
reducer.join( ((value_type*)m_team_reduce)[wx]
, ((value_type*)m_team_reduce)[k] );
__threadfence_block();
}
}
}
__syncthreads(); // Wait for reduction
// Broadcast result to all threads
reducer.reference() = *((value_type*)m_team_reduce);
}
#endif /* #ifdef __CUDA_ARCH__ */
}
//--------------------------------------------------------------------------
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
* with intra-team non-deterministic ordering accumulation.
*
* The global inter-team accumulation value will, at the end of the
* league's parallel execution, be the scan's total.
* Parallel execution ordering of the league's teams is non-deterministic.
* As such the base value for each team's scan operation is similarly
* non-deterministic.
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION
Type team_scan( const Type & value , Type * const global_accum ) const
{
#ifdef __CUDA_ARCH__
Type * const base_data = (Type *) m_team_reduce ;
__syncthreads(); // Don't write into shared data until all threads have entered this function
if ( 0 == threadIdx.y ) { base_data[0] = 0 ; }
base_data[ threadIdx.y + 1 ] = value ;
Impl::cuda_intra_block_reduce_scan<true,Impl::CudaJoinFunctor<Type>,void>( Impl::CudaJoinFunctor<Type>() , base_data + 1 );
if ( global_accum ) {
if ( blockDim.y == threadIdx.y + 1 ) {
base_data[ blockDim.y ] = atomic_fetch_add( global_accum , base_data[ blockDim.y ] );
}
__syncthreads(); // Wait for atomic
base_data[ threadIdx.y ] += base_data[ blockDim.y ] ;
}
return base_data[ threadIdx.y ];
#else
return Type();
#endif
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering.
*
* The highest rank thread can compute the reduction total as
* reduction_total = dev.team_scan( value ) + value ;
*/
template< typename Type >
KOKKOS_INLINE_FUNCTION Type team_scan( const Type & value ) const {
return this->template team_scan<Type>( value , 0 );
}
//----------------------------------------
template< typename ReducerType >
KOKKOS_INLINE_FUNCTION static
typename std::enable_if< is_reducer< ReducerType >::value >::type
vector_reduce( ReducerType const & reducer )
{
#ifdef __CUDA_ARCH__
if(blockDim.x == 1) return;
// Intra vector lane shuffle reduction:
typename ReducerType::value_type tmp ( reducer.reference() );
for ( int i = blockDim.x ; ( i >>= 1 ) ; ) {
cuda_shfl_down( reducer.reference() , tmp , i , blockDim.x );
if ( threadIdx.x < i ) { reducer.join( tmp , reducer.reference() ); }
}
// Broadcast from root lane to all other lanes.
// Cannot use "butterfly" algorithm to avoid the broadcast
// because floating point summation is not associative
// and thus different threads could have different results.
cuda_shfl( reducer.reference() , tmp , 0 , blockDim.x );
#endif
}
//--------------------------------------------------------------------------
/**\brief Global reduction across all blocks
*
* Return !0 if reducer contains the final value
*/
template< typename ReducerType >
KOKKOS_INLINE_FUNCTION static
typename std::enable_if< is_reducer< ReducerType >::value , int >::type
global_reduce( ReducerType const & reducer
, int * const global_scratch_flags
, void * const global_scratch_space
, void * const shmem
, int const shmem_size
)
{
#ifdef __CUDA_ARCH__
typedef typename ReducerType::value_type value_type ;
typedef value_type volatile * pointer_type ;
// Number of shared memory entries for the reduction:
const int nsh = shmem_size / sizeof(value_type);
// Number of CUDA threads in the block, rank within the block
const int nid = blockDim.x * blockDim.y * blockDim.z ;
const int tid = threadIdx.x + blockDim.x * (
threadIdx.y + blockDim.y * threadIdx.z );
// Reduces within block using all available shared memory
// Contributes if it is the root "vector lane"
// wn == number of warps in the block
// wx == which lane within the warp
// wy == which warp within the block
const int wn = ( nid + CudaTraits::WarpIndexMask ) >> CudaTraits::WarpIndexShift ;
const int wx = tid & CudaTraits::WarpIndexMask ;
const int wy = tid >> CudaTraits::WarpIndexShift ;
//------------------------
{ // Intra warp shuffle reduction from contributing CUDA threads
value_type tmp( reducer.reference() );
for ( int i = CudaTraits::WarpSize ; blockDim.x <= ( i >>= 1 ) ; ) {
cuda_shfl_down( reducer.reference(), tmp, i, CudaTraits::WarpSize );
// Root of each vector lane reduces "thread" contribution
if ( 0 == threadIdx.x && wx < i ) {
reducer.join( & tmp , reducer.data() );
}
}
// Reduce across warps using shared memory.
// Number of warps may not be power of two.
__syncthreads(); // Wait before shared data write
// Number of shared memory entries for the reduction
// is at most one per warp
const int nentry = wn < nsh ? wn : nsh ;
if ( 0 == wx && wy < nentry ) {
// Root thread of warp 'wy' has warp's value to contribute
((value_type*) shmem)[wy] = tmp ;
}
__syncthreads(); // Wait for write to be visible to block
// When more warps than shared entries
// then warps must take turns joining their contribution
// to the designated shared memory entry.
for ( int i = nentry ; i < wn ; i += nentry ) {
const int k = wy - i ;
if ( 0 == wx && i <= wy && k < nentry ) {
// Root thread of warp 'wy' has warp's value to contribute
reducer.join( ((value_type*) shmem) + k , & tmp );
}
__syncthreads(); // Wait for write to be visible to block
}
// One warp performs the inter-warp reduction:
if ( 0 == wy ) {
// Start fan-in at power of two covering nentry
for ( int i = ( 1 << ( 32 - __clz(nentry-1) ) ) ; ( i >>= 1 ) ; ) {
const int k = wx + i ;
if ( wx < i && k < nentry ) {
reducer.join( ((pointer_type)shmem) + wx
, ((pointer_type)shmem) + k );
__threadfence_block(); // Wait for write to be visible to warp
}
}
}
}
//------------------------
{ // Write block's value to global_scratch_memory
int last_block = 0 ;
if ( 0 == wx ) {
reducer.copy( ((pointer_type)global_scratch_space)
+ blockIdx.x * reducer.length()
, reducer.data() );
__threadfence(); // Wait until global write is visible.
last_block = gridDim.x ==
1 + Kokkos::atomic_fetch_add(global_scratch_flags,1);
// If last block then reset count
if ( last_block ) *global_scratch_flags = 0 ;
}
last_block = __syncthreads_or( last_block );
if ( ! last_block ) return 0 ;
}
//------------------------
// Last block reads global_scratch_memory into shared memory.
const int nentry = nid < gridDim.x ?
( nid < nsh ? nid : nsh ) :
( gridDim.x < nsh ? gridDim.x : nsh ) ;
// nentry = min( nid , nsh , gridDim.x )
// whole block reads global memory into shared memory:
if ( tid < nentry ) {
const int offset = tid * reducer.length();
reducer.copy( ((pointer_type)shmem) + offset
, ((pointer_type)global_scratch_space) + offset );
for ( int i = nentry + tid ; i < gridDim.x ; i += nentry ) {
reducer.join( ((pointer_type)shmem) + offset
, ((pointer_type)global_scratch_space)
+ i * reducer.length() );
}
}
__syncthreads(); // Wait for writes to be visible to block
if ( 0 == wy ) {
// Iterate to reduce shared memory to single warp fan-in size
const int nreduce = CudaTraits::WarpSize < nentry
? CudaTraits::WarpSize : nentry ;
// nreduce = min( CudaTraits::WarpSize , nsh , gridDim.x )
if ( wx < nreduce && nreduce < nentry ) {
for ( int i = nreduce + wx ; i < nentry ; i += nreduce ) {
reducer.join( ((pointer_type)shmem) + wx
, ((pointer_type)shmem) + i );
}
__threadfence_block(); // Wait for writes to be visible to warp
}
// Start fan-in at power of two covering nreduce
for ( int i = ( 1 << ( 32 - __clz(nreduce-1) ) ) ; ( i >>= 1 ) ; ) {
const int k = wx + i ;
if ( wx < i && k < nreduce ) {
reducer.join( ((pointer_type)shmem) + wx
, ((pointer_type)shmem) + k );
__threadfence_block(); // Wait for writes to be visible to warp
}
}
if ( 0 == wx ) {
reducer.copy( reducer.data() , (pointer_type)shmem );
return 1 ;
}
}
return 0 ;
#else
return 0 ;
#endif
}
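Note on the fan-in loops in `global_reduce` above: the start value `1 << ( 32 - __clz(n-1) )` is the smallest power of two that covers `n`, and the loop halves it before the first join. What follows is a minimal host-side sketch of that index pattern, assuming a serial loop stands in for the lanes of one warp. `clz32`, `covering_pow2` and `fan_in_sum` are hypothetical names used only for illustration; they are not part of Kokkos or CUDA.

```cpp
#include <cassert>
#include <vector>

// Portable count-leading-zeros for 32-bit values (returns 32 for zero),
// standing in for the CUDA __clz intrinsic used in the hunk above.
int clz32(unsigned v) {
  int n = 0;
  for (unsigned bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1) ++n;
  return n;
}

// Smallest power of two >= n, valid for 1 <= n <= 2^30.
int covering_pow2(int n) {
  return 1 << (32 - clz32(static_cast<unsigned>(n - 1)));
}

// Serial emulation of the fan-in: entry wx absorbs entry wx+i while i halves.
// The entry count does not have to be a power of two.
int fan_in_sum(std::vector<int> v) {
  assert(!v.empty());
  const int nentry = static_cast<int>(v.size());
  for (int i = covering_pow2(nentry); (i >>= 1);) {
    for (int wx = 0; wx < i; ++wx) {
      const int k = wx + i;
      if (k < nentry) v[wx] += v[k];  // skip partners past the end
    }
  }
  return v[0];
}
```

With five entries the covering power of two is 8, so the joins happen at strides 4, 2 and 1, and the `k < nentry` test skips the missing partners.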
//----------------------------------------
// Private for the driver
KOKKOS_INLINE_FUNCTION
CudaTeamMember( void * shared
, const int shared_begin
, const int shared_size
, void* scratch_level_1_ptr
, const int scratch_level_1_size
, const int arg_league_rank
, const int arg_league_size )
: m_team_reduce( shared )
, m_team_shared( ((char *)shared) + shared_begin , shared_size, scratch_level_1_ptr, scratch_level_1_size)
, m_team_reduce_size( shared_begin )
, m_league_rank( arg_league_rank )
, m_league_size( arg_league_size )
{}
};
} // namespace Impl
} // namespace Kokkos
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
namespace Kokkos {
namespace Impl {
template<typename iType>
struct TeamThreadRangeBoundariesStruct<iType,CudaTeamMember> {
typedef iType index_type;
const CudaTeamMember& member;
const iType start;
const iType end;
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& count)
: member(thread_)
, start( 0 )
, end( count ) {}
KOKKOS_INLINE_FUNCTION
TeamThreadRangeBoundariesStruct (const CudaTeamMember& thread_, const iType& begin_, const iType& end_)
: member(thread_)
, start( begin_ )
, end( end_ ) {}
};
template<typename iType>
struct ThreadVectorRangeBoundariesStruct<iType,CudaTeamMember> {
typedef iType index_type;
const iType start;
const iType end;
KOKKOS_INLINE_FUNCTION
ThreadVectorRangeBoundariesStruct (const CudaTeamMember, const iType& count)
: start( 0 ), end( count ) {}
KOKKOS_INLINE_FUNCTION
ThreadVectorRangeBoundariesStruct (const iType& count)
: start( 0 ), end( count ) {}
};
} // namespace Impl
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType & count ) {
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, count );
}
template< typename iType1, typename iType2 >
KOKKOS_INLINE_FUNCTION
Impl::TeamThreadRangeBoundariesStruct< typename std::common_type< iType1, iType2 >::type,
Impl::CudaTeamMember >
TeamThreadRange( const Impl::CudaTeamMember & thread, const iType1 & begin, const iType2 & end ) {
typedef typename std::common_type< iType1, iType2 >::type iType;
return Impl::TeamThreadRangeBoundariesStruct< iType, Impl::CudaTeamMember >( thread, iType(begin), iType(end) );
}
template<typename iType>
KOKKOS_INLINE_FUNCTION
Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >
ThreadVectorRange(const Impl::CudaTeamMember& thread, const iType& count) {
return Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >(thread,count);
}
KOKKOS_INLINE_FUNCTION
Impl::ThreadSingleStruct<Impl::CudaTeamMember> PerTeam(const Impl::CudaTeamMember& thread) {
return Impl::ThreadSingleStruct<Impl::CudaTeamMember>(thread);
}
KOKKOS_INLINE_FUNCTION
Impl::VectorSingleStruct<Impl::CudaTeamMember> PerThread(const Impl::CudaTeamMember& thread) {
return Impl::VectorSingleStruct<Impl::CudaTeamMember>(thread);
}
//----------------------------------------------------------------------------
/** \brief Inter-thread parallel_for.
*
* Executes closure(iType i) for each i=[0..N).
*
* The range [0..N) is mapped to all threads of the calling thread team.
*/
template<typename iType, class Closure >
KOKKOS_INLINE_FUNCTION
void parallel_for
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember>&
loop_boundaries
, const Closure & closure
)
{
#ifdef __CUDA_ARCH__
for( iType i = loop_boundaries.start + threadIdx.y
; i < loop_boundaries.end
; i += blockDim.y )
closure(i);
#endif
}
//----------------------------------------------------------------------------
/** \brief Inter-thread parallel_reduce with a reducer.
*
* Executes closure(iType i, ValueType & val) for each i=[0..N)
*
* The range [0..N) is mapped to all threads of the
* calling thread team; the per-thread values of val are
* combined with the reducer's join and left in reducer.reference().
*/
template< typename iType, class Closure, class ReducerType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< Kokkos::is_reducer< ReducerType >::value >::type
parallel_reduce
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember> &
loop_boundaries
, const Closure & closure
, const ReducerType & reducer
)
{
#ifdef __CUDA_ARCH__
reducer.init( reducer.reference() );
for( iType i = loop_boundaries.start + threadIdx.y
; i < loop_boundaries.end
; i += blockDim.y ) {
closure(i,reducer.reference());
}
loop_boundaries.member.team_reduce( reducer );
#endif
}
/** \brief Inter-thread parallel_reduce assuming summation.
*
* Executes closure(iType i, ValueType & val) for each i=[0..N)
*
* The range [0..N) is mapped to all threads of the
* calling thread team and a summation of val is
* performed and put into result.
*/
template< typename iType, class Closure, typename ValueType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< ! Kokkos::is_reducer< ValueType >::value >::type
parallel_reduce
( const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::CudaTeamMember> &
loop_boundaries
, const Closure & closure
, ValueType & result
)
{
#ifdef __CUDA_ARCH__
Kokkos::Experimental::Sum<ValueType> reducer(result);
reducer.init( reducer.reference() );
for( iType i = loop_boundaries.start + threadIdx.y
; i < loop_boundaries.end
; i += blockDim.y ) {
closure(i,result);
}
loop_boundaries.member.team_reduce( reducer );
#endif
}
//----------------------------------------------------------------------------
/** \brief Intra-thread vector parallel_for.
*
* Executes closure(iType i) for each i=[0..N)
*
* The range [0..N) is mapped to all vector lanes of the calling thread.
*/
template<typename iType, class Closure >
KOKKOS_INLINE_FUNCTION
void parallel_for
( const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember>&
loop_boundaries
, const Closure & closure
)
{
#ifdef __CUDA_ARCH__
for ( iType i = loop_boundaries.start + threadIdx.x
; i < loop_boundaries.end
; i += blockDim.x ) {
closure(i);
}
#endif
}
//----------------------------------------------------------------------------
/** \brief Intra-thread vector parallel_reduce.
*
* Calls closure(iType i, ValueType & val) for each i=[0..N).
*
* The range [0..N) is mapped to all vector lanes of
* the calling thread and a reduction of val is performed using
* the reducer's join; the result is left in reducer.reference().
*
* The initial value is set by reducer.init() rather than assumed
* to be the default constructed value.
*/
template< typename iType, class Closure, class ReducerType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< is_reducer< ReducerType >::value >::type
parallel_reduce
( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember>
const & loop_boundaries
, Closure const & closure
, ReducerType const & reducer )
{
#ifdef __CUDA_ARCH__
reducer.init( reducer.reference() );
for ( iType i = loop_boundaries.start + threadIdx.x
; i < loop_boundaries.end
; i += blockDim.x ) {
closure(i,reducer.reference());
}
Impl::CudaTeamMember::vector_reduce( reducer );
#endif
}
/** \brief Intra-thread vector parallel_reduce.
*
* Calls closure(iType i, ValueType & val) for each i=[0..N).
*
* The range [0..N) is mapped to all vector lanes of
* the calling thread and a reduction of val is performed using +=
* and output into result.
*
* The identity value for the += operator is assumed to be the default
* constructed value.
*/
template< typename iType, class Closure, typename ValueType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< ! is_reducer< ValueType >::value >::type
parallel_reduce
( Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember>
const & loop_boundaries
, Closure const & closure
, ValueType & result )
{
#ifdef __CUDA_ARCH__
result = ValueType();
for ( iType i = loop_boundaries.start + threadIdx.x
; i < loop_boundaries.end
; i += blockDim.x ) {
closure(i,result);
}
Impl::CudaTeamMember::vector_reduce(
Kokkos::Experimental::Sum<ValueType>(result ) );
#endif
}
//----------------------------------------------------------------------------
/** \brief Intra-thread vector parallel exclusive prefix sum.
*
* Executes closure(iType i, ValueType & val, bool final) for each i=[0..N)
*
* The range [0..N) is mapped to all vector lanes in the
* thread and a scan operation is performed.
* The last call to closure has final == true.
*/
template< typename iType, class Closure >
KOKKOS_INLINE_FUNCTION
void parallel_scan
( const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::CudaTeamMember >&
loop_boundaries
, const Closure & closure
)
{
#ifdef __CUDA_ARCH__
// Extract value_type from closure
using value_type =
typename Kokkos::Impl::FunctorAnalysis
< Kokkos::Impl::FunctorPatternInterface::SCAN
, void
, Closure >::value_type ;
// Loop through boundaries by vector-length chunks
// must scan at each iteration
value_type accum = 0 ;
// All thread "lanes" must loop the same number of times.
// Determine a loop end for all thread "lanes."
// Requires:
// blockDim.x is power of two and thus
// ( end % blockDim.x ) == ( end & ( blockDim.x - 1 ) )
// 1 <= blockDim.x <= CudaTraits::WarpSize
const int mask = blockDim.x - 1 ;
const int rem = loop_boundaries.end & mask ; // == end % blockDim.x
const int end = loop_boundaries.end + ( rem ? blockDim.x - rem : 0 );
for ( int i = threadIdx.x ; i < end ; i += blockDim.x ) {
value_type val = 0 ;
// First acquire per-lane contributions:
if ( i < loop_boundaries.end ) closure( i , val , false );
value_type sval = val ;
// Bottom up inclusive scan in triangular pattern
// where each CUDA thread is the root of a reduction tree
// from the zeroth "lane" to itself.
// [t] += [t-1] if t >= 1
// [t] += [t-2] if t >= 2
// [t] += [t-4] if t >= 4
// ...
for ( int j = 1 ; j < blockDim.x ; j <<= 1 ) {
value_type tmp = 0 ;
Impl::cuda_shfl_up( tmp , sval , j , blockDim.x );
if ( j <= threadIdx.x ) { sval += tmp ; }
}
// Include accumulation and remove value for exclusive scan:
val = accum + sval - val ;
// Provide exclusive scan value:
if ( i < loop_boundaries.end ) closure( i , val , true );
// Accumulate the last value in the inclusive scan:
Impl::cuda_shfl( sval , sval , mask , blockDim.x );
accum += sval ;
}
#endif
}
}
namespace Kokkos {
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0) lambda();
#endif
}
template<class FunctorType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0 && threadIdx.y == 0) lambda();
#endif
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::VectorSingleStruct<Impl::CudaTeamMember>& , const FunctorType& lambda, ValueType& val) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0) lambda(val);
val = shfl(val,0,blockDim.x);
#endif
}
template<class FunctorType, class ValueType>
KOKKOS_INLINE_FUNCTION
void single(const Impl::ThreadSingleStruct<Impl::CudaTeamMember>& single_struct, const FunctorType& lambda, ValueType& val) {
#ifdef __CUDA_ARCH__
if(threadIdx.x == 0 && threadIdx.y == 0) {
lambda(val);
}
single_struct.team_member.team_broadcast(val,0);
#endif
}
} // namespace Kokkos
#endif /* defined( __CUDACC__ ) */
#endif /* #ifndef KOKKOS_CUDA_TEAM_HPP */
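Note on the `parallel_scan` over `ThreadVectorRange` in the file above: it processes the range in chunks of `blockDim.x`, builds an inclusive scan inside each chunk with shuffle-up steps of 1, 2, 4, ..., and carries a running total between chunks. The following is a minimal host-side sketch of the same arithmetic, assuming a power-of-two lane count and using a plain array in place of the warp shuffles. `emulate_vector_scan` and `lanes` are hypothetical names, not Kokkos API.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation of a chunked, lane-wise exclusive prefix sum.
// 'lanes' plays the role of blockDim.x and is assumed to be a power of two.
std::vector<int> emulate_vector_scan(const std::vector<int>& in, int lanes) {
  assert(lanes > 0 && (lanes & (lanes - 1)) == 0);
  const int n = static_cast<int>(in.size());
  std::vector<int> out(in.size());
  int accum = 0;  // running total of all previous chunks
  for (int base = 0; base < n; base += lanes) {
    // Per-lane contributions; lanes past the end contribute zero.
    std::vector<int> val(lanes, 0);
    for (int t = 0; t < lanes; ++t)
      if (base + t < n) val[t] = in[base + t];
    // Inclusive scan: [t] += [t-j] for j = 1, 2, 4, ...
    std::vector<int> sval(val);
    for (int j = 1; j < lanes; j <<= 1) {
      std::vector<int> prev(sval);  // snapshot stands in for the shuffle
      for (int t = j; t < lanes; ++t) sval[t] += prev[t - j];
    }
    // Exclusive value = previous chunks + inclusive scan - own contribution.
    for (int t = 0; t < lanes; ++t)
      if (base + t < n) out[base + t] = accum + sval[t] - val[t];
    accum += sval[lanes - 1];  // last lane holds the chunk total
  }
  return out;
}
```

Padding the last chunk with zero contributions mirrors the rounded-up `end` in the original loop, which keeps every lane in the shuffle for the same number of iterations.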

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -44,8 +44,6 @@
#define KOKKOS_CUDA_VECTORIZATION_HPP
#include <Kokkos_Macros.hpp>
/* only compile this file if CUDA is enabled for Kokkos */
#ifdef KOKKOS_ENABLE_CUDA
#include <Kokkos_Cuda.hpp>
@ -296,3 +294,4 @@ namespace Impl {
#endif // KOKKOS_ENABLE_CUDA
#endif

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -44,7 +44,7 @@
#ifndef KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
#define KOKKOS_EXPERIMENTAL_CUDA_VIEW_HPP
/* only compile this file if CUDA is enabled for Kokkos */
#include <Kokkos_Macros.hpp>
#if defined( KOKKOS_ENABLE_CUDA )
//----------------------------------------------------------------------------

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -46,7 +46,7 @@
//----------------------------------------------------------------------------
//----------------------------------------------------------------------------
#include "Kokkos_Macros.hpp"
#include <Kokkos_Macros.hpp>
#if defined( __CUDACC__ ) && defined( KOKKOS_ENABLE_CUDA )
#include <cuda.h>
@ -82,6 +82,8 @@ void cuda_abort( const char * const message )
} // namespace Impl
} // namespace Kokkos
#else
void KOKKOS_CORE_SRC_CUDA_ABORT_PREVENT_LINK_ERROR() {}
#endif /* #if defined(__CUDACC__) && defined( KOKKOS_ENABLE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDA_ABORT_HPP */

View File

@ -236,13 +236,13 @@ struct MDRangePolicy
MDRangePolicy( lower_tmp, upper_tmp, tile_tmp );
#else
if(m_lower.size()!=rank || m_upper.size() != rank)
if(static_cast<int>(m_lower.size()) != rank || static_cast<int>(m_upper.size()) != rank)
Kokkos::abort("MDRangePolicy: Constructor initializer lists have wrong size");
for ( auto i = 0; i < rank; ++i ) {
m_lower[i] = static_cast<array_index_type>(lower.begin()[i]);
m_upper[i] = static_cast<array_index_type>(upper.begin()[i]);
if(tile.size()==rank)
if(static_cast<int>(tile.size())==rank)
m_tile[i] = static_cast<array_index_type>(tile.begin()[i]);
else
m_tile[i] = 0;

View File

@ -44,6 +44,7 @@
#define KOKKOS_COMPLEX_HPP
#include <Kokkos_Atomic.hpp>
#include <Kokkos_NumericTraits.hpp>
#include <complex>
#include <iostream>
@ -324,9 +325,30 @@ public:
im_ /= src;
return *this;
}
KOKKOS_INLINE_FUNCTION
bool operator == (const complex<RealType>& src) {
return (re_ == src.re_) && (im_ == src.im_);
}
KOKKOS_INLINE_FUNCTION
bool operator == (const RealType src) {
return (re_ == src) && (im_ == RealType(0));
}
KOKKOS_INLINE_FUNCTION
bool operator != (const complex<RealType>& src) {
return (re_ != src.re_) || (im_ != src.im_);
}
KOKKOS_INLINE_FUNCTION
bool operator != (const RealType src) {
return (re_ != src) || (im_ != RealType(0));
}
};
//! Binary + operator for complex.
//! Binary + operator for complex complex.
template<class RealType>
KOKKOS_INLINE_FUNCTION
complex<RealType>
@ -334,6 +356,22 @@ operator + (const complex<RealType>& x, const complex<RealType>& y) {
return complex<RealType> (x.real () + y.real (), x.imag () + y.imag ());
}
//! Binary + operator for complex scalar.
template<class RealType>
KOKKOS_INLINE_FUNCTION
complex<RealType>
operator + (const complex<RealType>& x, const RealType& y) {
return complex<RealType> (x.real () + y , x.imag ());
}
//! Binary + operator for scalar complex.
template<class RealType>
KOKKOS_INLINE_FUNCTION
complex<RealType>
operator + (const RealType& x, const complex<RealType>& y) {
return complex<RealType> (x + y.real (), y.imag ());
}
//! Unary + operator for complex.
template<class RealType>
KOKKOS_INLINE_FUNCTION
@ -350,6 +388,22 @@ operator - (const complex<RealType>& x, const complex<RealType>& y) {
return complex<RealType> (x.real () - y.real (), x.imag () - y.imag ());
}
//! Binary - operator for complex scalar.
template<class RealType>
KOKKOS_INLINE_FUNCTION
complex<RealType>
operator - (const complex<RealType>& x, const RealType& y) {
return complex<RealType> (x.real () - y , x.imag ());
}
//! Binary - operator for scalar complex.
template<class RealType>
KOKKOS_INLINE_FUNCTION
complex<RealType>
operator - (const RealType& x, const complex<RealType>& y) {
return complex<RealType> (x - y.real (), - y.imag ());
}
//! Unary - operator for complex.
template<class RealType>
KOKKOS_INLINE_FUNCTION
@ -395,6 +449,16 @@ operator * (const RealType& x, const complex<RealType>& y) {
return complex<RealType> (x * y.real (), x * y.imag ());
}
/// \brief Binary * operator for complex times RealType.
///
/// This function exists because the compiler doesn't know that
/// RealType and complex<RealType> commute with respect to operator*.
template<class RealType>
KOKKOS_INLINE_FUNCTION
complex<RealType>
operator * (const complex<RealType>& y, const RealType& x) {
return complex<RealType> (x * y.real (), x * y.imag ());
}
//! Imaginary part of a complex number.
template<class RealType>
@ -415,7 +479,25 @@ template<class RealType>
KOKKOS_INLINE_FUNCTION
RealType abs (const complex<RealType>& x) {
// FIXME (mfh 31 Oct 2014) Scale to avoid unwarranted overflow.
return ::sqrt (real (x) * real (x) + imag (x) * imag (x));
return std::sqrt (real (x) * real (x) + imag (x) * imag (x));
}
//! Power of a complex number
template<class RealType>
KOKKOS_INLINE_FUNCTION
Kokkos::complex<RealType> pow (const complex<RealType>& x, const RealType& e) {
RealType r = abs(x);
RealType phi = std::atan(x.imag()/x.real());
return std::pow(r,e) * Kokkos::complex<RealType>(std::cos(phi*e),std::sin(phi*e));
}
//! Square root of a complex number.
template<class RealType>
KOKKOS_INLINE_FUNCTION
Kokkos::complex<RealType> sqrt (const complex<RealType>& x) {
RealType r = abs(x);
RealType phi = std::atan(x.imag()/x.real());
return std::sqrt(r) * Kokkos::complex<RealType>(std::cos(phi*0.5),std::sin(phi*0.5));
}
//! Conjugate of a complex number.
@ -425,6 +507,19 @@ complex<RealType> conj (const complex<RealType>& x) {
return complex<RealType> (real (x), -imag (x));
}
//! Exponential of a complex number.
template<class RealType>
KOKKOS_INLINE_FUNCTION
complex<RealType> exp (const complex<RealType>& x) {
return std::exp(x.real()) * complex<RealType> (std::cos (x.imag()), std::sin(x.imag()));
}
//! Exponential of a complex number.
template<class RealType>
KOKKOS_INLINE_FUNCTION
complex<RealType> pow (const complex<RealType>& x) {
return std::exp(x.real()) * complex<RealType> (std::cos (x.imag()), std::sin(x.imag()));
}
//! Binary operator / for complex and real numbers
template<class RealType1, class RealType2>
@ -461,6 +556,14 @@ operator / (const complex<RealType>& x, const complex<RealType>& y) {
}
}
//! Binary operator / for a real numerator and a complex denominator
template<class RealType1, class RealType2>
KOKKOS_INLINE_FUNCTION
complex<RealType1>
operator / (const RealType1& x, const complex<RealType2>& y) {
return complex<RealType1> (x)/y;
}
//! Equality operator for two complex numbers.
template<class RealType>
KOKKOS_INLINE_FUNCTION
@ -468,9 +571,13 @@ bool operator == (const complex<RealType>& x, const complex<RealType>& y) {
return real (x) == real (y) && imag (x) == imag (y);
}
//! Equality operator for std::complex and Kokkos::complex.
/// \brief Equality operator for std::complex and Kokkos::complex.
///
/// This cannot be a device function, since std::real is not.
/// Otherwise, CUDA builds will give compiler warnings ("warning:
/// calling a constexpr __host__ function("real") from a __host__
/// __device__ function("operator==") is not allowed").
template<class RealType>
KOKKOS_INLINE_FUNCTION
bool operator == (const std::complex<RealType>& x, const complex<RealType>& y) {
return std::real (x) == real (y) && std::imag (x) == imag (y);
}
@ -533,6 +640,15 @@ std::ostream& operator >> (std::ostream& os, complex<RealType>& x) {
}
template<class T>
struct reduction_identity<Kokkos::complex<T> > {
typedef reduction_identity<T> t_red_ident;
KOKKOS_FORCEINLINE_FUNCTION constexpr static Kokkos::complex<T> sum()
{return Kokkos::complex<T>(t_red_ident::sum(),t_red_ident::sum());}
KOKKOS_FORCEINLINE_FUNCTION constexpr static Kokkos::complex<T> prod()
{return Kokkos::complex<T>(t_red_ident::prod(),t_red_ident::sum());}
};
} // namespace Kokkos
#endif // KOKKOS_COMPLEX_HPP
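Note on the new `pow` and `sqrt` overloads in the file above: both take the phase from `std::atan(x.imag()/x.real())`, whose result lies in [-pi/2, pi/2] and therefore cannot distinguish inputs with a negative real part from their negated counterparts. The sketch below reproduces that formula on the host with `std::complex<double>` and compares it against the same formula using `std::atan2`. It is an illustration only; `sqrt_via_atan` and `sqrt_via_atan2` are hypothetical helper names, and whether the difference matters depends on the inputs a caller passes.

```cpp
#include <cmath>
#include <complex>

// Polar-form square root with the phase from atan(im/re), as in the hunk above.
std::complex<double> sqrt_via_atan(const std::complex<double>& x) {
  const double r = std::abs(x);
  const double phi = std::atan(x.imag() / x.real());
  return std::sqrt(r) *
         std::complex<double>(std::cos(phi * 0.5), std::sin(phi * 0.5));
}

// Same formula with the phase from atan2, which keeps the quadrant.
std::complex<double> sqrt_via_atan2(const std::complex<double>& x) {
  const double r = std::abs(x);
  const double phi = std::atan2(x.imag(), x.real());
  return std::sqrt(r) *
         std::complex<double>(std::cos(phi * 0.5), std::sin(phi * 0.5));
}
```

For an input such as 3 + 4i the two helpers agree. For -4 + 0i the `atan` variant yields a value close to 2, while the `atan2` variant is close to 2i, matching `std::sqrt` on `std::complex`.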

View File

@ -200,7 +200,14 @@ public:
, Kokkos::DefaultHostExecutionSpace , execution_space
>::type host_execution_space ;
#else
typedef execution_space host_execution_space ;
#if defined( KOKKOS_ENABLE_OPENMPTARGET )
typedef typename std::conditional
< std::is_same< execution_space , Kokkos::Experimental::OpenMPTarget >::value
, Kokkos::DefaultHostExecutionSpace , execution_space
>::type host_execution_space ;
#else
typedef execution_space host_execution_space ;
#endif
#endif
typedef typename std::conditional

View File

@ -57,11 +57,16 @@
#include <Kokkos_OpenMP.hpp>
#endif
//#if defined( KOKKOS_ENABLE_OPENMPTARGET )
#include <Kokkos_OpenMPTarget.hpp>
#include <Kokkos_OpenMPTargetSpace.hpp>
//#endif
#if defined( KOKKOS_ENABLE_QTHREADS )
#include <Kokkos_Qthreads.hpp>
#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
#if defined( KOKKOS_ENABLE_THREADS )
#include <Kokkos_Threads.hpp>
#endif
@ -69,8 +74,8 @@
#include <Kokkos_Cuda.hpp>
#endif
#include <Kokkos_MemoryPool.hpp>
#include <Kokkos_Pair.hpp>
#include <Kokkos_MemoryPool.hpp>
#include <Kokkos_Array.hpp>
#include <Kokkos_View.hpp>
#include <Kokkos_Vectorization.hpp>
@ -167,3 +172,4 @@ void * kokkos_realloc( void * arg_alloc , const size_t arg_alloc_size )
//----------------------------------------------------------------------------
#endif

View File

@ -96,7 +96,7 @@ class Serial; ///< Execution space main process on CPU.
class Qthreads; ///< Execution space with Qthreads back-end.
#endif
#if defined( KOKKOS_ENABLE_PTHREAD )
#if defined( KOKKOS_ENABLE_THREADS )
class Threads; ///< Execution space with pthreads back-end.
#endif
@ -104,6 +104,14 @@ class Threads; ///< Execution space with pthreads back-end.
class OpenMP; ///< OpenMP execution space.
#endif
#if defined( KOKKOS_ENABLE_OPENMPTARGET )
namespace Experimental {
class OpenMPTarget; ///< OpenMPTarget execution space.
class OpenMPTargetSpace;
}
#endif
#if defined( KOKKOS_ENABLE_CUDA )
class CudaSpace; ///< Memory space on Cuda GPU
class CudaUVMSpace; ///< Memory space on Cuda GPU with UVM
@ -121,12 +129,14 @@ struct Device;
/// Define Kokkos::DefaultExecutionSpace as per configuration option
/// or chosen from the enabled execution spaces in the following order:
/// Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Serial
/// Kokkos::Cuda, Kokkos::Experimental::OpenMPTarget, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Serial
namespace Kokkos {
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_CUDA )
typedef Cuda DefaultExecutionSpace;
#elif defined ( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMPTARGET )
typedef Experimental::OpenMPTarget DefaultExecutionSpace ;
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
typedef OpenMP DefaultExecutionSpace;
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_THREADS )
@ -136,7 +146,7 @@ namespace Kokkos {
#elif defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_SERIAL )
typedef Serial DefaultExecutionSpace;
#else
# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::Cuda, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial."
# error "At least one of the following execution spaces must be defined in order to use Kokkos: Kokkos::Cuda, Kokkos::Experimental::OpenMPTarget, Kokkos::OpenMP, Kokkos::Threads, Kokkos::Qthreads, or Kokkos::Serial."
#endif
#if defined( KOKKOS_ENABLE_DEFAULT_DEVICE_TYPE_OPENMP )
@ -149,7 +159,7 @@ namespace Kokkos {
typedef Serial DefaultHostExecutionSpace;
#elif defined( KOKKOS_ENABLE_OPENMP )
typedef OpenMP DefaultHostExecutionSpace;
#elif defined( KOKKOS_ENABLE_PTHREAD )
#elif defined( KOKKOS_ENABLE_THREADS )
typedef Threads DefaultHostExecutionSpace;
//#elif defined( KOKKOS_ENABLE_QTHREADS )
// typedef Qthreads DefaultHostExecutionSpace;
@ -254,6 +264,21 @@ template< class FunctorType, class ExecPolicy, class ExecutionSapce =
} // namespace Impl
namespace Experimental {
template<class ScalarType , class Space = HostSpace> struct Sum;
template<class ScalarType , class Space = HostSpace> struct Prod;
template<class ScalarType , class Space = HostSpace> struct Min;
template<class ScalarType , class Space = HostSpace> struct Max;
template<class ScalarType , class Space = HostSpace> struct MinMax;
template<class ScalarType , class Index, class Space = HostSpace> struct MinLoc;
template<class ScalarType , class Index, class Space = HostSpace> struct MaxLoc;
template<class ScalarType , class Index, class Space = HostSpace> struct MinMaxLoc;
template<class ScalarType , class Space = HostSpace> struct BAnd;
template<class ScalarType , class Space = HostSpace> struct BOr;
template<class ScalarType , class Space = HostSpace> struct LAnd;
template<class ScalarType , class Space = HostSpace> struct LOr;
}
} // namespace Kokkos
#endif /* #ifndef KOKKOS_CORE_FWD_HPP */
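Note on the reducer forward declarations above (`Sum`, `Prod`, `Min`, ...): other hunks in this diff call `reducer.init(...)`, `reducer.join(...)` and `reducer.reference()` on such objects. Below is a minimal host-only sketch of a type with that shape, assuming those three members are what a reducer needs to provide; this diff does not show the real definitions. `SketchSum` and `reduce_range` are hypothetical names and not the Kokkos implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a sum reducer; not the Kokkos implementation.
template <class T>
struct SketchSum {
  typedef T value_type;
  explicit SketchSum(T& result) : m_result(result) {}
  void init(T& v) const { v = T(); }                     // identity for +
  void join(T& dst, const T& src) const { dst += src; }  // combine two partials
  T& reference() const { return m_result; }              // where the result lives
 private:
  T& m_result;
};

// Serial emulation of "each worker accumulates, then partials are joined".
template <class Reducer, class Closure>
void reduce_range(int begin, int end, int workers, const Closure& closure,
                  const Reducer& reducer) {
  assert(workers > 0);
  std::vector<typename Reducer::value_type> partial(workers);
  for (int w = 0; w < workers; ++w) {
    reducer.init(partial[w]);
    // Strided loop, mirroring 'i += blockDim.y' in the team-level loops.
    for (int i = begin + w; i < end; i += workers) closure(i, partial[w]);
  }
  reducer.init(reducer.reference());
  for (int w = 0; w < workers; ++w)
    reducer.join(reducer.reference(), partial[w]);
}
```

A call such as `reduce_range(0, 10, 4, [](int i, int& v) { v += i; }, SketchSum<int>(total))` leaves the sum of 0..9 in `total`. Swapping in a different `init`/`join` pair would give a product or a minimum without touching the loop.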

View File

@ -44,12 +44,11 @@
#ifndef KOKKOS_CUDA_HPP
#define KOKKOS_CUDA_HPP
#include <Kokkos_Core_fwd.hpp>
// If CUDA execution space is enabled then use this header file.
#include <Kokkos_Macros.hpp>
#if defined( KOKKOS_ENABLE_CUDA )
#include <Kokkos_Core_fwd.hpp>
#include <iosfwd>
#include <vector>
@ -214,6 +213,8 @@ public:
//@}
//--------------------------------------------------------------------------
static const char* name();
private:
cudaStream_t m_stream ;
@ -291,6 +292,7 @@ struct VerifyExecutionCanAccessMemorySpace
#include <Cuda/Kokkos_CudaExec.hpp>
#include <Cuda/Kokkos_Cuda_View.hpp>
#include <Cuda/Kokkos_Cuda_Team.hpp>
#include <Cuda/Kokkos_Cuda_Parallel.hpp>
#include <Cuda/Kokkos_Cuda_Task.hpp>
@ -300,5 +302,3 @@ struct VerifyExecutionCanAccessMemorySpace
#endif /* #if defined( KOKKOS_ENABLE_CUDA ) */
#endif /* #ifndef KOKKOS_CUDA_HPP */

View File

@ -1,13 +1,13 @@
/*
//@HEADER
// ************************************************************************
//
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
@ -36,7 +36,7 @@
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
//
// ************************************************************************
//@HEADER
*/
@ -44,10 +44,11 @@
#ifndef KOKKOS_CUDASPACE_HPP
#define KOKKOS_CUDASPACE_HPP
#include <Kokkos_Core_fwd.hpp>
#include <Kokkos_Macros.hpp>
#if defined( KOKKOS_ENABLE_CUDA )
#include <Kokkos_Core_fwd.hpp>
#include <iosfwd>
#include <typeinfo>
#include <string>
@ -717,7 +718,7 @@ private:
static ::cudaTextureObject_t
attach_texture_object( const unsigned sizeof_alias
, void * const alloc_ptr
, const size_t alloc_size );
, const size_t alloc_size );
static RecordBase s_root_record ;

Some files were not shown because too many files have changed in this diff