Merge pull request #725 from stanmoore1/kk_update

Update the Kokkos library in LAMMPS to v2.5.00
This commit is contained in:
Steve Plimpton 2018-01-08 09:12:51 -07:00 committed by GitHub
commit 450c689ae9
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
366 changed files with 11645 additions and 3763 deletions


@@ -1,5 +1,75 @@
# Change Log

## [2.5.00](https://github.com/kokkos/kokkos/tree/2.5.00) (2017-12-15)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.04.11...2.5.00)

**Part of the Kokkos C++ Performance Portability Programming EcoSystem 2.5**

**Implemented enhancements:**
- Provide Makefile.kokkos logic for CMake and TriBITS [\#878](https://github.com/kokkos/kokkos/issues/878)
- Add Scatter View [\#825](https://github.com/kokkos/kokkos/issues/825)
- Drop gcc 4.7 and intel 14 from supported compiler list [\#603](https://github.com/kokkos/kokkos/issues/603)
- Enable construction of unmanaged view using common\_view\_alloc\_prop [\#1170](https://github.com/kokkos/kokkos/issues/1170)
- Unused Function Warning with XL [\#1267](https://github.com/kokkos/kokkos/issues/1267)
- Add memory pool parameter check [\#1218](https://github.com/kokkos/kokkos/issues/1218)
- CUDA9: Fix warning for unsupported long double [\#1189](https://github.com/kokkos/kokkos/issues/1189)
- CUDA9: fix warning on defaulted function marking [\#1188](https://github.com/kokkos/kokkos/issues/1188)
- CUDA9: fix warnings for deprecated warp level functions [\#1187](https://github.com/kokkos/kokkos/issues/1187)
- Add CUDA 9.0 nightly testing [\#1174](https://github.com/kokkos/kokkos/issues/1174)
- {OMPI,MPICH}\_CXX hack breaks nvcc\_wrapper use case [\#1166](https://github.com/kokkos/kokkos/issues/1166)
- KOKKOS\_HAVE\_CUDA\_LAMBDA became KOKKOS\_CUDA\_USE\_LAMBDA [\#1274](https://github.com/kokkos/kokkos/issues/1274)

**Fixed bugs:**
- MinMax Reducer with tagged operator doesn't compile [\#1251](https://github.com/kokkos/kokkos/issues/1251)
- Reducers for Tagged operators give wrong answer [\#1250](https://github.com/kokkos/kokkos/issues/1250)
- Kokkos not Compatible with Big Endian Machines? [\#1235](https://github.com/kokkos/kokkos/issues/1235)
- Parallel Scan hangs forever on BG/Q [\#1234](https://github.com/kokkos/kokkos/issues/1234)
- Threads backend doesn't compile with Clang on OS X [\#1232](https://github.com/kokkos/kokkos/issues/1232)
- $\(shell date\) needs quote [\#1264](https://github.com/kokkos/kokkos/issues/1264)
- Unqualified parallel\_for call conflicts with user-defined parallel\_for [\#1219](https://github.com/kokkos/kokkos/issues/1219)
- KokkosAlgorithms: CMake issue in unit tests [\#1212](https://github.com/kokkos/kokkos/issues/1212)
- Intel 18 Error: "simd pragma has been deprecated" [\#1210](https://github.com/kokkos/kokkos/issues/1210)
- Memory leak in Kokkos::initialize [\#1194](https://github.com/kokkos/kokkos/issues/1194)
- CUDA9: compiler error with static assert template arguments [\#1190](https://github.com/kokkos/kokkos/issues/1190)
- Kokkos::Serial::is\_initialized returns always true [\#1184](https://github.com/kokkos/kokkos/issues/1184)
- Triple nested parallelism still fails on bowman [\#1093](https://github.com/kokkos/kokkos/issues/1093)
- OpenMP openmp.range on Develop Runs Forever on POWER7+ with RHEL7 and GCC4.8.5 [\#995](https://github.com/kokkos/kokkos/issues/995)
- Rendezvous performance at global scope [\#985](https://github.com/kokkos/kokkos/issues/985)

## [2.04.11](https://github.com/kokkos/kokkos/tree/2.04.11) (2017-10-28)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.04.04...2.04.11)

**Implemented enhancements:**
- Add Subview pattern. [\#648](https://github.com/kokkos/kokkos/issues/648)
- Add Kokkos "global" is\_initialized [\#1060](https://github.com/kokkos/kokkos/issues/1060)
- Add create\_mirror\_view\_and\_copy [\#1161](https://github.com/kokkos/kokkos/issues/1161)
- Add KokkosConcepts SpaceAccessibility function [\#1092](https://github.com/kokkos/kokkos/issues/1092)
- Option to Disable Initialize Warnings [\#1142](https://github.com/kokkos/kokkos/issues/1142)
- Mature task-DAG capability [\#320](https://github.com/kokkos/kokkos/issues/320)
- Promote Work DAG from experimental [\#1126](https://github.com/kokkos/kokkos/issues/1126)
- Implement new WorkGraph push/pop [\#1108](https://github.com/kokkos/kokkos/issues/1108)
- Kokkos\_ENABLE\_Cuda\_Lambda should default ON [\#1101](https://github.com/kokkos/kokkos/issues/1101)
- Add multidimensional parallel for example and improve unit test [\#1064](https://github.com/kokkos/kokkos/issues/1064)
- Fix ROCm: Performance tests not building [\#1038](https://github.com/kokkos/kokkos/issues/1038)
- Make KOKKOS\_ALIGN\_SIZE a configure-time option [\#1004](https://github.com/kokkos/kokkos/issues/1004)
- Make alignment consistent [\#809](https://github.com/kokkos/kokkos/issues/809)
- Improve subview construction on Cuda backend [\#615](https://github.com/kokkos/kokkos/issues/615)

**Fixed bugs:**
- Kokkos::vector fixes for application [\#1134](https://github.com/kokkos/kokkos/issues/1134)
- DynamicView non-power of two value\_type [\#1177](https://github.com/kokkos/kokkos/issues/1177)
- Memory pool bug [\#1154](https://github.com/kokkos/kokkos/issues/1154)
- Cuda launch bounds performance regression bug [\#1140](https://github.com/kokkos/kokkos/issues/1140)
- Significant performance regression in LAMMPS after updating Kokkos [\#1139](https://github.com/kokkos/kokkos/issues/1139)
- CUDA compile error [\#1128](https://github.com/kokkos/kokkos/issues/1128)
- MDRangePolicy neg idx test failure in debug mode [\#1113](https://github.com/kokkos/kokkos/issues/1113)
- subview construction on Cuda backend [\#615](https://github.com/kokkos/kokkos/issues/615)

## [2.04.04](https://github.com/kokkos/kokkos/tree/2.04.04) (2017-09-11)
[Full Changelog](https://github.com/kokkos/kokkos/compare/2.04.00...2.04.04)


@@ -1,3 +1,5 @@
# Is this a build as part of Trilinos?
IF(COMMAND TRIBITS_PACKAGE_DECL)
SET(KOKKOS_HAS_TRILINOS ON CACHE BOOL "")
ELSE()
@@ -6,13 +8,57 @@ ENDIF()
IF(NOT KOKKOS_HAS_TRILINOS)
cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
project(Kokkos CXX)
INCLUDE(cmake/kokkos.cmake)
# Define Project Name if this is a standalone build
IF(NOT DEFINED ${PROJECT_NAME})
project(Kokkos CXX)
ENDIF()
# Basic initialization (Used in KOKKOS_SETTINGS)
set(KOKKOS_SRC_PATH ${Kokkos_SOURCE_DIR})
set(KOKKOS_PATH ${KOKKOS_SRC_PATH})
#------------ COMPILER AND FEATURE CHECKS ------------------------------------
include(${KOKKOS_SRC_PATH}/cmake/kokkos_functions.cmake)
set_kokkos_cxx_compiler()
set_kokkos_cxx_standard()
#------------ GET OPTIONS AND KOKKOS_SETTINGS --------------------------------
# Add Kokkos' modules to CMake's module path.
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${Kokkos_SOURCE_DIR}/cmake/Modules/")
set(KOKKOS_CMAKE_VERBOSE True)
include(${KOKKOS_SRC_PATH}/cmake/kokkos_options.cmake)
include(${KOKKOS_SRC_PATH}/cmake/kokkos_settings.cmake)
#------------ GENERATE HEADER AND SOURCE FILES -------------------------------
execute_process(
COMMAND ${KOKKOS_SETTINGS} make -f ${KOKKOS_SRC_PATH}/cmake/Makefile.generate_cmake_settings CXX=${CMAKE_CXX_COMPILER} generate_build_settings
WORKING_DIRECTORY "${Kokkos_BINARY_DIR}"
OUTPUT_FILE ${Kokkos_BINARY_DIR}/core_src_make.out
RESULT_VARIABLE res
)
include(${Kokkos_BINARY_DIR}/kokkos_generated_settings.cmake)
set_kokkos_srcs(KOKKOS_SRC ${KOKKOS_SRC})
#------------ NOW BUILD ------------------------------------------------------
include(${KOKKOS_SRC_PATH}/cmake/kokkos_build.cmake)
#------------ Add in Fake Tribits Handling to allow unit test builds- --------
include(${KOKKOS_SRC_PATH}/cmake/tribits.cmake)
TRIBITS_PACKAGE_DECL(Kokkos)
ADD_SUBDIRECTORY(core)
ADD_SUBDIRECTORY(containers)
ADD_SUBDIRECTORY(algorithms)
ELSE()
#------------------------------------------------------------------------------
#
# A) Forward delcare the package so that certain options are also defined for
# A) Forward declare the package so that certain options are also defined for
# subpackages
#
@@ -21,178 +67,28 @@ TRIBITS_PACKAGE_DECL(Kokkos) # ENABLE_SHADOWING_WARNINGS)
#------------------------------------------------------------------------------
#
# B) Define the common options for Kokkos first so they can be used by
# subpackages as well.
# B) Install Kokkos' build files
#
# If using the Makefile-generated files, then need to set things up.
# Here, assume that TriBITS has been run from ProjectCompilerPostConfig.cmake
# and already generated KokkosCore_config.h and kokkos_generated_settings.cmake
# in the previously defined Kokkos_GEN_DIR
# We need to copy them over to the correct place and source the cmake file
# mfh 01 Aug 2016: See Issue #61:
#
# https://github.com/kokkos/kokkos/issues/61
#
# Don't use TRIBITS_ADD_DEBUG_OPTION() here, because that defines
# HAVE_KOKKOS_DEBUG. We define KOKKOS_HAVE_DEBUG here instead,
# for compatibility with Kokkos' Makefile build system.
if(NOT KOKKOS_LEGACY_TRIBITS)
set(Kokkos_GEN_DIR ${CMAKE_BINARY_DIR})
file(COPY "${Kokkos_GEN_DIR}/KokkosCore_config.h"
DESTINATION "${CMAKE_CURRENT_BINARY_DIR}" USE_SOURCE_PERMISSIONS)
install(FILES "${Kokkos_GEN_DIR}/KokkosCore_config.h"
DESTINATION include)
file(COPY "${Kokkos_GEN_DIR}/kokkos_generated_settings.cmake"
DESTINATION "${CMAKE_CURRENT_BINARY_DIR}" USE_SOURCE_PERMISSIONS)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_DEBUG
KOKKOS_HAVE_DEBUG
"Enable run-time debug checks. These checks may be expensive, so they are disabled by default in a release build."
${${PROJECT_NAME}_ENABLE_DEBUG}
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_SIERRA_BUILD
KOKKOS_FOR_SIERRA
"Configure Kokkos for building within the Sierra build system."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda
KOKKOS_HAVE_CUDA
"Enable CUDA support in Kokkos."
"${TPL_ENABLE_CUDA}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_UVM
KOKKOS_USE_CUDA_UVM
"Enable CUDA Unified Virtual Memory as the default in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_RDC
KOKKOS_HAVE_CUDA_RDC
"Enable CUDA Relocatable Device Code support in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Cuda_Lambda
KOKKOS_HAVE_CUDA_LAMBDA
"Enable CUDA LAMBDA support in Kokkos."
OFF
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Pthread
KOKKOS_HAVE_PTHREAD
"Enable Pthread support in Kokkos."
OFF
)
ASSERT_DEFINED(TPL_ENABLE_Pthread)
IF(Kokkos_ENABLE_Pthread AND NOT TPL_ENABLE_Pthread)
MESSAGE(FATAL_ERROR "You set Kokkos_ENABLE_Pthread=ON, but Trilinos' support for Pthread(s) is not enabled (TPL_ENABLE_Pthread=OFF). This is not allowed. Please enable Pthreads in Trilinos before attempting to enable Kokkos' support for Pthreads.")
ENDIF()
IF(NOT TPL_ENABLE_Pthread)
ADD_DEFINITIONS(-DGTEST_HAS_PTHREAD=0)
ENDIF()
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_OpenMP
KOKKOS_HAVE_OPENMP
"Enable OpenMP support in Kokkos."
"${${PROJECT_NAME}_ENABLE_OpenMP}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_QTHREAD
KOKKOS_HAVE_QTHREADS
"Enable Qthreads support in Kokkos."
"${TPL_ENABLE_QTHREAD}"
)
# TODO: No longer an option in Kokkos. Needs to be removed.
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_CXX11
KOKKOS_HAVE_CXX11
"Enable C++11 support in Kokkos."
"${${PROJECT_NAME}_ENABLE_CXX11}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_HWLOC
KOKKOS_HAVE_HWLOC
"Enable HWLOC support in Kokkos."
"${TPL_ENABLE_HWLOC}"
)
# TODO: This is currently not used in Kokkos. Should it be removed?
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_MPI
KOKKOS_HAVE_MPI
"Enable MPI support in Kokkos."
"${TPL_ENABLE_MPI}"
)
# Set default value of Kokkos_ENABLE_Debug_Bounds_Check option
#
# CMake is case sensitive. The Kokkos_ENABLE_Debug_Bounds_Check
# option (defined below) is annoyingly not all caps, but we need to
# keep it that way for backwards compatibility. If users forget and
# try using an all-caps variable, then make it count by using the
# all-caps version as the default value of the original, not-all-caps
# option. Otherwise, the default value of this option comes from
# Kokkos_ENABLE_DEBUG (see Issue #367).
ASSERT_DEFINED(${PACKAGE_NAME}_ENABLE_DEBUG)
IF(DEFINED Kokkos_ENABLE_DEBUG_BOUNDS_CHECK)
IF(Kokkos_ENABLE_DEBUG_BOUNDS_CHECK)
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT ON)
ELSE()
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT "${${PACKAGE_NAME}_ENABLE_DEBUG}")
ENDIF()
ELSE()
SET(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT "${${PACKAGE_NAME}_ENABLE_DEBUG}")
ENDIF()
ASSERT_DEFINED(Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Debug_Bounds_Check
KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK
"Enable Kokkos::View run-time bounds checking."
"${Kokkos_ENABLE_Debug_Bounds_Check_DEFAULT}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Debug_DualView_Modify_Check
KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK
"Enable abort when Kokkos::DualView modified on host and device without sync."
"${Kokkos_ENABLE_DEBUG}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Profiling
KOKKOS_ENABLE_PROFILING
"Enable KokkosP profiling support for kernel data collections."
"${TPL_ENABLE_DLlib}"
)
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Profiling_Load_Print
KOKKOS_ENABLE_PROFILING_LOAD_PRINT
"Print to standard output which profiling library was loaded."
OFF
)
# placeholder for future device...
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Winthread
KOKKOS_HAVE_WINTHREAD
"Enable Winthread support in Kokkos."
"${TPL_ENABLE_Winthread}"
)
# TODO: No longer an option in Kokkos. Needs to be removed.
# use new/old View
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_USING_DEPRECATED_VIEW
KOKKOS_USING_DEPRECATED_VIEW
"Choose whether to use the old, deprecated Kokkos::View"
OFF
)
include(${CMAKE_CURRENT_BINARY_DIR}/kokkos_generated_settings.cmake)
# Sources come from makefile-generated kokkos_generated_settings.cmake file
# Enable using the individual sources if needed
set_kokkos_srcs(KOKKOS_SRC ${KOKKOS_SRC})
endif ()
#------------------------------------------------------------------------------
@@ -226,10 +122,6 @@ TRIBITS_PACKAGE_DEF()
TRIBITS_EXCLUDE_AUTOTOOLS_FILES()
TRIBITS_EXCLUDE_FILES(
classic/doc
classic/LinAlg/doc/CrsRefactorNotesMay2012
)
TRIBITS_PACKAGE_POSTPROCESS()
ENDIF()


@ -28,33 +28,39 @@ KOKKOS_OPTIONS ?= ""
# Options: force_uvm,use_ldg,rdc,enable_lambda
KOKKOS_CUDA_OPTIONS ?= "enable_lambda"
# Return a 1 if a string contains a substring and 0 if not
# Note: the search string should not be quoted with '"'
# Example: $(call kokkos_has_string,"hwloc,librt",hwloc)
# Will return a 1
kokkos_has_string=$(if $(findstring $2,$1),1,0)
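The `kokkos_has_string` helper turns `$(findstring)` into a 0/1 predicate evaluated entirely inside make, replacing the `echo | grep | wc -l` shell pipelines used below it. A rough shell analogue of the same predicate (the `has_string` name is hypothetical, for illustration only) behaves like this:

```shell
# Hypothetical shell analogue of kokkos_has_string: print 1 if the second
# argument is a substring of the first, else 0 -- mirroring
# $(if $(findstring $2,$1),1,0) in GNU make.
has_string() {
  case "$1" in
    *"$2"*) echo 1 ;;
    *)      echo 0 ;;
  esac
}

has_string "hwloc,librt" hwloc   # prints 1
has_string "hwloc,librt" cuda    # prints 0
```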
# Check for general settings.
KOKKOS_INTERNAL_ENABLE_DEBUG := $(strip $(shell echo $(KOKKOS_DEBUG) | grep "yes" | wc -l))
KOKKOS_INTERNAL_ENABLE_CXX11 := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++11" | wc -l))
KOKKOS_INTERNAL_ENABLE_CXX1Z := $(strip $(shell echo $(KOKKOS_CXX_STANDARD) | grep "c++1z" | wc -l))
KOKKOS_INTERNAL_ENABLE_DEBUG := $(call kokkos_has_string,$(KOKKOS_DEBUG),yes)
KOKKOS_INTERNAL_ENABLE_CXX11 := $(call kokkos_has_string,$(KOKKOS_CXX_STANDARD),c++11)
KOKKOS_INTERNAL_ENABLE_CXX1Z := $(call kokkos_has_string,$(KOKKOS_CXX_STANDARD),c++1z)
# Check for external libraries.
KOKKOS_INTERNAL_USE_HWLOC := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "hwloc" | wc -l))
KOKKOS_INTERNAL_USE_LIBRT := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "librt" | wc -l))
KOKKOS_INTERNAL_USE_MEMKIND := $(strip $(shell echo $(KOKKOS_USE_TPLS) | grep "experimental_memkind" | wc -l))
KOKKOS_INTERNAL_USE_HWLOC := $(call kokkos_has_string,$(KOKKOS_USE_TPLS),hwloc)
KOKKOS_INTERNAL_USE_LIBRT := $(call kokkos_has_string,$(KOKKOS_USE_TPLS),librt)
KOKKOS_INTERNAL_USE_MEMKIND := $(call kokkos_has_string,$(KOKKOS_USE_TPLS),experimental_memkind)
# Check for advanced settings.
KOKKOS_INTERNAL_ENABLE_COMPILER_WARNINGS := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "compiler_warnings" | wc -l))
KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "aggressive_vectorization" | wc -l))
KOKKOS_INTERNAL_DISABLE_PROFILING := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "disable_profiling" | wc -l))
KOKKOS_INTERNAL_DISABLE_DUALVIEW_MODIFY_CHECK := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "disable_dualview_modify_check" | wc -l))
KOKKOS_INTERNAL_ENABLE_PROFILING_LOAD_PRINT := $(strip $(shell echo $(KOKKOS_OPTIONS) | grep "enable_profile_load_print" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LDG := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "use_ldg" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_UVM := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "force_uvm" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_RELOC := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "rdc" | wc -l))
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := $(strip $(shell echo $(KOKKOS_CUDA_OPTIONS) | grep "enable_lambda" | wc -l))
KOKKOS_INTERNAL_ENABLE_COMPILER_WARNINGS := $(call kokkos_has_string,$(KOKKOS_OPTIONS),compiler_warnings)
KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION := $(call kokkos_has_string,$(KOKKOS_OPTIONS),aggressive_vectorization)
KOKKOS_INTERNAL_DISABLE_PROFILING := $(call kokkos_has_string,$(KOKKOS_OPTIONS),disable_profiling)
KOKKOS_INTERNAL_DISABLE_DUALVIEW_MODIFY_CHECK := $(call kokkos_has_string,$(KOKKOS_OPTIONS),disable_dualview_modify_check)
KOKKOS_INTERNAL_ENABLE_PROFILING_LOAD_PRINT := $(call kokkos_has_string,$(KOKKOS_OPTIONS),enable_profile_load_print)
KOKKOS_INTERNAL_CUDA_USE_LDG := $(call kokkos_has_string,$(KOKKOS_CUDA_OPTIONS),use_ldg)
KOKKOS_INTERNAL_CUDA_USE_UVM := $(call kokkos_has_string,$(KOKKOS_CUDA_OPTIONS),force_uvm)
KOKKOS_INTERNAL_CUDA_USE_RELOC := $(call kokkos_has_string,$(KOKKOS_CUDA_OPTIONS),rdc)
KOKKOS_INTERNAL_CUDA_USE_LAMBDA := $(call kokkos_has_string,$(KOKKOS_CUDA_OPTIONS),enable_lambda)
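Each of the old option checks above spawned a shell, echoed the variable, and counted grep's matching lines; the `kokkos_has_string` form is a plain substring test with no extra processes. The two styles agree on these comma-separated option strings, as this illustrative comparison shows (sample value only):

```shell
# Compare the old pipeline with a plain substring test on a sample
# KOKKOS_CUDA_OPTIONS value (illustrative only).
opts="force_uvm,enable_lambda"

# Old style: grep prints the matching line, wc -l counts it (1 or 0);
# tr strips the padding some wc implementations emit.
old=$(echo "$opts" | grep "enable_lambda" | wc -l | tr -d ' ')

# New style in spirit: a substring check, no subprocesses needed.
case "$opts" in *enable_lambda*) new=1 ;; *) new=0 ;; esac

echo "$old $new"   # prints "1 1"
```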
# Check for Kokkos Host Execution Spaces one of which must be on.
KOKKOS_INTERNAL_USE_OPENMP := $(strip $(shell echo $(subst OpenMPTarget,,$(KOKKOS_DEVICES)) | grep OpenMP | wc -l))
KOKKOS_INTERNAL_USE_PTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Pthread | wc -l))
KOKKOS_INTERNAL_USE_QTHREADS := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Qthreads | wc -l))
KOKKOS_INTERNAL_USE_SERIAL := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Serial | wc -l))
KOKKOS_INTERNAL_USE_OPENMP := $(call kokkos_has_string,$(subst OpenMPTarget,,$(KOKKOS_DEVICES)),OpenMP)
KOKKOS_INTERNAL_USE_PTHREADS := $(call kokkos_has_string,$(KOKKOS_DEVICES),Pthread)
KOKKOS_INTERNAL_USE_QTHREADS := $(call kokkos_has_string,$(KOKKOS_DEVICES),Qthreads)
KOKKOS_INTERNAL_USE_SERIAL := $(call kokkos_has_string,$(KOKKOS_DEVICES),Serial)
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 0)
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 0)
@@ -65,9 +71,9 @@ ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 0)
endif
# Check for other Execution Spaces.
KOKKOS_INTERNAL_USE_CUDA := $(strip $(shell echo $(KOKKOS_DEVICES) | grep Cuda | wc -l))
KOKKOS_INTERNAL_USE_ROCM := $(strip $(shell echo $(KOKKOS_DEVICES) | grep ROCm | wc -l))
KOKKOS_INTERNAL_USE_OPENMPTARGET := $(strip $(shell echo $(KOKKOS_DEVICES) | grep OpenMPTarget | wc -l))
KOKKOS_INTERNAL_USE_CUDA := $(call kokkos_has_string,$(KOKKOS_DEVICES),Cuda)
KOKKOS_INTERNAL_USE_ROCM := $(call kokkos_has_string,$(KOKKOS_DEVICES),ROCm)
KOKKOS_INTERNAL_USE_OPENMPTARGET := $(call kokkos_has_string,$(KOKKOS_DEVICES),OpenMPTarget)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
KOKKOS_INTERNAL_NVCC_PATH := $(shell which nvcc)
@@ -77,25 +83,20 @@ endif
# Check OS.
KOKKOS_OS := $(strip $(shell uname -s))
KOKKOS_INTERNAL_OS_CYGWIN := $(strip $(shell uname -s | grep CYGWIN | wc -l))
KOKKOS_INTERNAL_OS_LINUX := $(strip $(shell uname -s | grep Linux | wc -l))
KOKKOS_INTERNAL_OS_DARWIN := $(strip $(shell uname -s | grep Darwin | wc -l))
KOKKOS_INTERNAL_OS_CYGWIN := $(call kokkos_has_string,$(KOKKOS_OS),CYGWIN)
KOKKOS_INTERNAL_OS_LINUX := $(call kokkos_has_string,$(KOKKOS_OS),Linux)
KOKKOS_INTERNAL_OS_DARWIN := $(call kokkos_has_string,$(KOKKOS_OS),Darwin)
# Check compiler.
KOKKOS_INTERNAL_COMPILER_INTEL := $(strip $(shell $(CXX) --version 2>&1 | grep "Intel Corporation" | wc -l))
KOKKOS_INTERNAL_COMPILER_PGI := $(strip $(shell $(CXX) --version 2>&1 | grep PGI | wc -l))
KOKKOS_CXX_VERSION := $(strip $(shell $(CXX) --version 2>&1))
KOKKOS_INTERNAL_COMPILER_INTEL := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),Intel Corporation)
KOKKOS_INTERNAL_COMPILER_PGI := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),PGI)
KOKKOS_INTERNAL_COMPILER_XL := $(strip $(shell $(CXX) -qversion 2>&1 | grep XL | wc -l))
KOKKOS_INTERNAL_COMPILER_CRAY := $(strip $(shell $(CXX) -craype-verbose 2>&1 | grep "CC-" | wc -l))
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell $(CXX) --version 2>&1 | grep nvcc | wc -l))
ifneq ($(OMPI_CXX),)
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell $(OMPI_CXX) --version 2>&1 | grep nvcc | wc -l))
endif
ifneq ($(MPICH_CXX),)
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell $(MPICH_CXX) --version 2>&1 | grep nvcc | wc -l))
endif
KOKKOS_INTERNAL_COMPILER_CLANG := $(strip $(shell $(CXX) --version 2>&1 | grep clang | wc -l))
KOKKOS_INTERNAL_COMPILER_APPLE_CLANG := $(strip $(shell $(CXX) --version 2>&1 | grep "apple-darwin" | wc -l))
KOKKOS_INTERNAL_COMPILER_HCC := $(strip $(shell $(CXX) --version 2>&1 | grep HCC | wc -l))
KOKKOS_INTERNAL_COMPILER_NVCC := $(strip $(shell export OMPI_CXX=$(OMPI_CXX); export MPICH_CXX=$(MPICH_CXX); $(CXX) --version 2>&1 | grep nvcc | wc -l))
KOKKOS_INTERNAL_COMPILER_CLANG := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),clang)
KOKKOS_INTERNAL_COMPILER_APPLE_CLANG := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),apple-darwin)
KOKKOS_INTERNAL_COMPILER_HCC := $(call kokkos_has_string,$(KOKKOS_CXX_VERSION),HCC)
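Caching `$(CXX) --version` in `KOKKOS_CXX_VERSION` means the compiler is queried once, and every vendor check becomes a string test on the saved banner. A sketch of that pattern with a hard-coded banner (the real Makefile uses the live `$(CXX)` output, not this made-up string):

```shell
# Illustrative version banner; the Makefile captures this from
# "$(CXX) --version 2>&1" exactly once and reuses it for every check.
ver="icpc (ICC) 18.0.0, Copyright (C) Intel Corporation"

case "$ver" in *"Intel Corporation"*) intel=1 ;; *) intel=0 ;; esac
case "$ver" in *PGI*)                 pgi=1   ;; *) pgi=0   ;; esac
case "$ver" in *clang*)               clang=1 ;; *) clang=0 ;; esac

echo "$intel $pgi $clang"   # prints "1 0 0"
```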
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 2)
KOKKOS_INTERNAL_COMPILER_CLANG = 1
@@ -209,47 +210,48 @@ endif
# Check for Kokkos Architecture settings.
# Intel based.
KOKKOS_INTERNAL_USE_ARCH_KNC := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNC | wc -l))
KOKKOS_INTERNAL_USE_ARCH_WSM := $(strip $(shell echo $(KOKKOS_ARCH) | grep WSM | wc -l))
KOKKOS_INTERNAL_USE_ARCH_SNB := $(strip $(shell echo $(KOKKOS_ARCH) | grep SNB | wc -l))
KOKKOS_INTERNAL_USE_ARCH_HSW := $(strip $(shell echo $(KOKKOS_ARCH) | grep HSW | wc -l))
KOKKOS_INTERNAL_USE_ARCH_BDW := $(strip $(shell echo $(KOKKOS_ARCH) | grep BDW | wc -l))
KOKKOS_INTERNAL_USE_ARCH_SKX := $(strip $(shell echo $(KOKKOS_ARCH) | grep SKX | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KNL := $(strip $(shell echo $(KOKKOS_ARCH) | grep KNL | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KNC := $(call kokkos_has_string,$(KOKKOS_ARCH),KNC)
KOKKOS_INTERNAL_USE_ARCH_WSM := $(call kokkos_has_string,$(KOKKOS_ARCH),WSM)
KOKKOS_INTERNAL_USE_ARCH_SNB := $(call kokkos_has_string,$(KOKKOS_ARCH),SNB)
KOKKOS_INTERNAL_USE_ARCH_HSW := $(call kokkos_has_string,$(KOKKOS_ARCH),HSW)
KOKKOS_INTERNAL_USE_ARCH_BDW := $(call kokkos_has_string,$(KOKKOS_ARCH),BDW)
KOKKOS_INTERNAL_USE_ARCH_SKX := $(call kokkos_has_string,$(KOKKOS_ARCH),SKX)
KOKKOS_INTERNAL_USE_ARCH_KNL := $(call kokkos_has_string,$(KOKKOS_ARCH),KNL)
# NVIDIA based.
NVCC_WRAPPER := $(KOKKOS_PATH)/bin/nvcc_wrapper
KOKKOS_INTERNAL_USE_ARCH_KEPLER30 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler30 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER32 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler32 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler35 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER37 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler37 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell50 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL52 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell52 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL53 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell53 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_PASCAL61 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal61 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_PASCAL60 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Pascal60 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
KOKKOS_INTERNAL_USE_ARCH_KEPLER30 := $(call kokkos_has_string,$(KOKKOS_ARCH),Kepler30)
KOKKOS_INTERNAL_USE_ARCH_KEPLER32 := $(call kokkos_has_string,$(KOKKOS_ARCH),Kepler32)
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(call kokkos_has_string,$(KOKKOS_ARCH),Kepler35)
KOKKOS_INTERNAL_USE_ARCH_KEPLER37 := $(call kokkos_has_string,$(KOKKOS_ARCH),Kepler37)
KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(call kokkos_has_string,$(KOKKOS_ARCH),Maxwell50)
KOKKOS_INTERNAL_USE_ARCH_MAXWELL52 := $(call kokkos_has_string,$(KOKKOS_ARCH),Maxwell52)
KOKKOS_INTERNAL_USE_ARCH_MAXWELL53 := $(call kokkos_has_string,$(KOKKOS_ARCH),Maxwell53)
KOKKOS_INTERNAL_USE_ARCH_PASCAL61 := $(call kokkos_has_string,$(KOKKOS_ARCH),Pascal61)
KOKKOS_INTERNAL_USE_ARCH_PASCAL60 := $(call kokkos_has_string,$(KOKKOS_ARCH),Pascal60)
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53))
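Summing the 0/1 architecture flags now goes through `expr` instead of piping an `echo` into `bc`, which drops the `bc` dependency. One caveat worth knowing: `expr` exits with status 1 when the result is 0, which is harmless inside make's `$(shell ...)` since only stdout is used. Standalone, the arithmetic looks like:

```shell
# Sum illustrative 0/1 flags the way the Makefile totals the NVIDIA
# architecture selections (the values here are made up).
kepler35=1; pascal60=0; maxwell50=0
expr "$kepler35" + "$pascal60" + "$maxwell50"   # prints 1
```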
#SEK: This seems like a bug to me
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 0)
KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Maxwell | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kepler | wc -l))
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53) | bc))
KOKKOS_INTERNAL_USE_ARCH_MAXWELL50 := $(call kokkos_has_string,$(KOKKOS_ARCH),Maxwell)
KOKKOS_INTERNAL_USE_ARCH_KEPLER35 := $(call kokkos_has_string,$(KOKKOS_ARCH),Kepler)
KOKKOS_INTERNAL_USE_ARCH_NVIDIA := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KEPLER30) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER32) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER35) \
+ $(KOKKOS_INTERNAL_USE_ARCH_KEPLER37) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL61) \
+ $(KOKKOS_INTERNAL_USE_ARCH_PASCAL60) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52) \
+ $(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53))
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 1)
@@ -262,43 +264,43 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_NVIDIA), 1)
endif
endif
# ARM based.
KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv80 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv81 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(strip $(shell echo $(KOKKOS_ARCH) | grep ARMv8-ThunderX | wc -l))
KOKKOS_INTERNAL_USE_ARCH_ARMV80 := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv80)
KOKKOS_INTERNAL_USE_ARCH_ARMV81 := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv81)
KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX := $(call kokkos_has_string,$(KOKKOS_ARCH),ARMv8-ThunderX)
KOKKOS_INTERNAL_USE_ARCH_ARM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_ARMV80)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV81)+$(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX) | bc))
# IBM based.
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(strip $(shell echo $(KOKKOS_ARCH) | grep BGQ | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER7 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power7 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER8 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power8 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_POWER9 := $(strip $(shell echo $(KOKKOS_ARCH) | grep Power9 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_BGQ := $(call kokkos_has_string,$(KOKKOS_ARCH),BGQ)
KOKKOS_INTERNAL_USE_ARCH_POWER7 := $(call kokkos_has_string,$(KOKKOS_ARCH),Power7)
KOKKOS_INTERNAL_USE_ARCH_POWER8 := $(call kokkos_has_string,$(KOKKOS_ARCH),Power8)
KOKKOS_INTERNAL_USE_ARCH_POWER9 := $(call kokkos_has_string,$(KOKKOS_ARCH),Power9)
KOKKOS_INTERNAL_USE_ARCH_IBM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_BGQ)+$(KOKKOS_INTERNAL_USE_ARCH_POWER7)+$(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc))
# AMD based.
KOKKOS_INTERNAL_USE_ARCH_AMDAVX := $(strip $(shell echo $(KOKKOS_ARCH) | grep AMDAVX | wc -l))
KOKKOS_INTERNAL_USE_ARCH_RYZEN := $(strip $(shell echo $(KOKKOS_ARCH) | grep Ryzen | wc -l))
KOKKOS_INTERNAL_USE_ARCH_EPYC := $(strip $(shell echo $(KOKKOS_ARCH) | grep Epyc | wc -l))
KOKKOS_INTERNAL_USE_ARCH_KAVERI := $(strip $(shell echo $(KOKKOS_ARCH) | grep Kaveri | wc -l))
KOKKOS_INTERNAL_USE_ARCH_CARRIZO := $(strip $(shell echo $(KOKKOS_ARCH) | grep Carrizo | wc -l))
KOKKOS_INTERNAL_USE_ARCH_FIJI := $(strip $(shell echo $(KOKKOS_ARCH) | grep Fiji | wc -l))
KOKKOS_INTERNAL_USE_ARCH_VEGA := $(strip $(shell echo $(KOKKOS_ARCH) | grep Vega | wc -l))
KOKKOS_INTERNAL_USE_ARCH_GFX901 := $(strip $(shell echo $(KOKKOS_ARCH) | grep gfx901 | wc -l))
KOKKOS_INTERNAL_USE_ARCH_AMDAVX := $(call kokkos_has_string,$(KOKKOS_ARCH),AMDAVX)
KOKKOS_INTERNAL_USE_ARCH_RYZEN := $(call kokkos_has_string,$(KOKKOS_ARCH),Ryzen)
KOKKOS_INTERNAL_USE_ARCH_EPYC := $(call kokkos_has_string,$(KOKKOS_ARCH),Epyc)
KOKKOS_INTERNAL_USE_ARCH_KAVERI := $(call kokkos_has_string,$(KOKKOS_ARCH),Kaveri)
KOKKOS_INTERNAL_USE_ARCH_CARRIZO := $(call kokkos_has_string,$(KOKKOS_ARCH),Carrizo)
KOKKOS_INTERNAL_USE_ARCH_FIJI := $(call kokkos_has_string,$(KOKKOS_ARCH),Fiji)
KOKKOS_INTERNAL_USE_ARCH_VEGA := $(call kokkos_has_string,$(KOKKOS_ARCH),Vega)
KOKKOS_INTERNAL_USE_ARCH_GFX901 := $(call kokkos_has_string,$(KOKKOS_ARCH),gfx901)
# Any AVX?
KOKKOS_INTERNAL_USE_ARCH_SSE42 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_WSM) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_AMDAVX) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX2 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX512MIC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNL) | bc ))
KOKKOS_INTERNAL_USE_ARCH_AVX512XEON := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
KOKKOS_INTERNAL_USE_ARCH_SSE42 := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_WSM))
KOKKOS_INTERNAL_USE_ARCH_AVX := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_SNB) + $(KOKKOS_INTERNAL_USE_ARCH_AMDAVX))
KOKKOS_INTERNAL_USE_ARCH_AVX2 := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_HSW) + $(KOKKOS_INTERNAL_USE_ARCH_BDW))
KOKKOS_INTERNAL_USE_ARCH_AVX512MIC := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KNL))
KOKKOS_INTERNAL_USE_ARCH_AVX512XEON := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_SKX))
# Decide what ISA level we are able to support.
KOKKOS_INTERNAL_USE_ISA_X86_64 := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_WSM)+$(KOKKOS_INTERNAL_USE_ARCH_SNB)+$(KOKKOS_INTERNAL_USE_ARCH_HSW)+$(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_KNL)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
KOKKOS_INTERNAL_USE_ISA_KNC := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_KNC) | bc ))
KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER8)+$(KOKKOS_INTERNAL_USE_ARCH_POWER9) | bc ))
KOKKOS_INTERNAL_USE_ISA_POWERPCBE := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_POWER7) | bc ))
KOKKOS_INTERNAL_USE_ISA_X86_64 := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_WSM) + $(KOKKOS_INTERNAL_USE_ARCH_SNB) + $(KOKKOS_INTERNAL_USE_ARCH_HSW) + $(KOKKOS_INTERNAL_USE_ARCH_BDW) + $(KOKKOS_INTERNAL_USE_ARCH_KNL) + $(KOKKOS_INTERNAL_USE_ARCH_SKX))
KOKKOS_INTERNAL_USE_ISA_KNC := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_KNC))
KOKKOS_INTERNAL_USE_ISA_POWERPCLE := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_POWER8) + $(KOKKOS_INTERNAL_USE_ARCH_POWER9))
KOKKOS_INTERNAL_USE_ISA_POWERPCBE := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_POWER7))
# Decide whether we can support transactional memory
KOKKOS_INTERNAL_USE_TM := $(strip $(shell echo $(KOKKOS_INTERNAL_USE_ARCH_BDW)+$(KOKKOS_INTERNAL_USE_ARCH_SKX) | bc ))
KOKKOS_INTERNAL_USE_TM := $(shell expr $(KOKKOS_INTERNAL_USE_ARCH_BDW) + $(KOKKOS_INTERNAL_USE_ARCH_SKX))
# Incompatible flags?
KOKKOS_INTERNAL_USE_ARCH_MULTIHOST := $(strip $(shell echo "$(KOKKOS_INTERNAL_USE_ARCH_SSE42)+$(KOKKOS_INTERNAL_USE_ARCH_AVX)+$(KOKKOS_INTERNAL_USE_ARCH_AVX2)+$(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC)+$(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON)+$(KOKKOS_INTERNAL_USE_ARCH_KNC)+$(KOKKOS_INTERNAL_USE_ARCH_IBM)+$(KOKKOS_INTERNAL_USE_ARCH_ARM)>1" | bc ))
@@ -320,94 +322,100 @@ ifeq ($(KOKKOS_INTERNAL_ENABLE_COMPILER_WARNINGS), 1)
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_COMPILER_WARNINGS)
endif
KOKKOS_LIBS = -lkokkos -ldl
KOKKOS_LIBS = -ldl
KOKKOS_LDFLAGS = -L$(shell pwd)
KOKKOS_SRC =
KOKKOS_HEADERS =
# Generating the KokkosCore_config.h file.
KOKKOS_INTERNAL_CONFIG_TMP=KokkosCore_config.tmp
KOKKOS_CONFIG_HEADER=KokkosCore_config.h
# Functions for generating config header file
kokkos_append_header = $(shell echo $1 >> $(KOKKOS_INTERNAL_CONFIG_TMP))
# Do not append first line
tmp := $(shell echo "/* ---------------------------------------------" > KokkosCore_config.tmp)
tmp := $(shell echo "Makefile constructed configuration:" >> KokkosCore_config.tmp)
tmp := $(shell date >> KokkosCore_config.tmp)
tmp := $(shell echo "----------------------------------------------*/" >> KokkosCore_config.tmp)
tmp := $(call kokkos_append_header,"Makefile constructed configuration:")
tmp := $(call kokkos_append_header,"$(shell date)")
tmp := $(call kokkos_append_header,"----------------------------------------------*/")
tmp := $(shell echo '\#if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)' >> KokkosCore_config.tmp)
tmp := $(shell echo '\#error "Do not include KokkosCore_config.h directly; include Kokkos_Macros.hpp instead."' >> KokkosCore_config.tmp)
tmp := $(shell echo '\#else' >> KokkosCore_config.tmp)
tmp := $(shell echo '\#define KOKKOS_CORE_CONFIG_H' >> KokkosCore_config.tmp)
tmp := $(shell echo '\#endif' >> KokkosCore_config.tmp)
tmp := $(shell echo "/* Execution Spaces */" >> KokkosCore_config.tmp)
tmp := $(call kokkos_append_header,'\#if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)')
tmp := $(call kokkos_append_header,'\#error "Do not include $(KOKKOS_CONFIG_HEADER) directly; include Kokkos_Macros.hpp instead."')
tmp := $(call kokkos_append_header,'\#else')
tmp := $(call kokkos_append_header,'\#define KOKKOS_CORE_CONFIG_H')
tmp := $(call kokkos_append_header,'\#endif')
tmp := $(call kokkos_append_header,"/* Execution Spaces */")
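All of the `kokkos_append_header` calls above funnel through a single shell append into the temp file. A minimal standalone sketch of the pattern (the file names match the Makefile; the `append` helper is an illustrative shell stand-in for the make-level function):

```shell
TMP=KokkosCore_config.tmp
# Shell stand-in for the make-level kokkos_append_header function:
append() { echo "$1" >> "$TMP"; }

# The first line overwrites so stale content from a prior configure is dropped.
echo "/* Makefile constructed configuration: */" > "$TMP"
append "#if !defined(KOKKOS_MACROS_HPP) || defined(KOKKOS_CORE_CONFIG_H)"
append "#error \"Do not include KokkosCore_config.h directly; include Kokkos_Macros.hpp instead.\""
append "#else"
append "#define KOKKOS_CORE_CONFIG_H"
append "#endif"
append "#define KOKKOS_HAVE_OPENMP"
cat "$TMP"
```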
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_CUDA 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_CUDA")
endif
ifeq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
tmp := $(shell echo '\#define KOKKOS_ENABLE_ROCM 1' >> KokkosCore_config.tmp)
tmp := $(call kokkos_append_header,'\#define KOKKOS_ENABLE_ROCM')
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMPTARGET), 1)
tmp := $(shell echo '\#define KOKKOS_ENABLE_OPENMPTARGET 1' >> KokkosCore_config.tmp)
tmp := $(call kokkos_append_header,'\#define KOKKOS_ENABLE_OPENMPTARGET')
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
tmp := $(shell echo '\#define KOKKOS_HAVE_OPENMP 1' >> KokkosCore_config.tmp)
tmp := $(call kokkos_append_header,'\#define KOKKOS_HAVE_OPENMP')
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_PTHREAD 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_PTHREAD")
endif
ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_QTHREADS 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_QTHREADS")
endif
ifeq ($(KOKKOS_INTERNAL_USE_SERIAL), 1)
tmp := $(shell echo "\#define KOKKOS_HAVE_SERIAL 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_SERIAL")
endif
ifeq ($(KOKKOS_INTERNAL_USE_TM), 1)
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ENABLE_TM" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#ifndef __CUDA_ARCH__")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ENABLE_TM")
tmp := $(call kokkos_append_header,"\#endif")
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_X86_64), 1)
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_USE_ISA_X86_64" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#ifndef __CUDA_ARCH__")
tmp := $(call kokkos_append_header,"\#define KOKKOS_USE_ISA_X86_64")
tmp := $(call kokkos_append_header,"\#endif")
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_KNC), 1)
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_USE_ISA_KNC" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#ifndef __CUDA_ARCH__")
tmp := $(call kokkos_append_header,"\#define KOKKOS_USE_ISA_KNC")
tmp := $(call kokkos_append_header,"\#endif")
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_POWERPCLE), 1)
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_USE_ISA_POWERPCLE" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#ifndef __CUDA_ARCH__")
tmp := $(call kokkos_append_header,"\#define KOKKOS_USE_ISA_POWERPCLE")
tmp := $(call kokkos_append_header,"\#endif")
endif
ifeq ($(KOKKOS_INTERNAL_USE_ISA_POWERPCBE), 1)
tmp := $(shell echo "\#ifndef __CUDA_ARCH__" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_USE_ISA_POWERPCBE" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#endif" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#ifndef __CUDA_ARCH__")
tmp := $(call kokkos_append_header,"\#define KOKKOS_USE_ISA_POWERPCBE")
tmp := $(call kokkos_append_header,"\#endif")
endif
tmp := $(shell echo "/* General Settings */" >> KokkosCore_config.tmp)
tmp := $(call kokkos_append_header,"/* General Settings */")
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX11), 1)
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX11_FLAG)
tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_CXX11")
endif
ifeq ($(KOKKOS_INTERNAL_ENABLE_CXX1Z), 1)
KOKKOS_CXXFLAGS += $(KOKKOS_INTERNAL_CXX1Z_FLAG)
tmp := $(shell echo "\#define KOKKOS_HAVE_CXX11 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_HAVE_CXX1Z 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_CXX11")
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_CXX1Z")
endif
ifeq ($(KOKKOS_INTERNAL_ENABLE_DEBUG), 1)
@@ -417,26 +425,26 @@ ifeq ($(KOKKOS_INTERNAL_ENABLE_DEBUG), 1)
KOKKOS_CXXFLAGS += -g
KOKKOS_LDFLAGS += -g -ldl
tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_HAVE_DEBUG 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK")
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_DEBUG")
ifeq ($(KOKKOS_INTERNAL_DISABLE_DUALVIEW_MODIFY_CHECK), 0)
tmp := $(shell echo "\#define KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK")
endif
endif
ifeq ($(KOKKOS_INTERNAL_ENABLE_PROFILING_LOAD_PRINT), 1)
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING_LOAD_PRINT 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ENABLE_PROFILING_LOAD_PRINT")
endif
ifeq ($(KOKKOS_INTERNAL_USE_HWLOC), 1)
KOKKOS_CPPFLAGS += -I$(HWLOC_PATH)/include
KOKKOS_LDFLAGS += -L$(HWLOC_PATH)/lib
KOKKOS_LIBS += -lhwloc
tmp := $(shell echo "\#define KOKKOS_HAVE_HWLOC 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_HWLOC")
endif
ifeq ($(KOKKOS_INTERNAL_USE_LIBRT), 1)
tmp := $(shell echo "\#define KOKKOS_USE_LIBRT 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_USE_LIBRT")
KOKKOS_LIBS += -lrt
endif
@@ -444,36 +452,36 @@ ifeq ($(KOKKOS_INTERNAL_USE_MEMKIND), 1)
KOKKOS_CPPFLAGS += -I$(MEMKIND_PATH)/include
KOKKOS_LDFLAGS += -L$(MEMKIND_PATH)/lib
KOKKOS_LIBS += -lmemkind -lnuma
tmp := $(shell echo "\#define KOKKOS_HAVE_HBWSPACE 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_HAVE_HBWSPACE")
endif
ifeq ($(KOKKOS_INTERNAL_DISABLE_PROFILING), 0)
tmp := $(shell echo "\#define KOKKOS_ENABLE_PROFILING" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ENABLE_PROFILING")
endif
tmp := $(shell echo "/* Optimization Settings */" >> KokkosCore_config.tmp)
tmp := $(call kokkos_append_header,"/* Optimization Settings */")
ifeq ($(KOKKOS_INTERNAL_OPT_RANGE_AGGRESSIVE_VECTORIZATION), 1)
tmp := $(shell echo "\#define KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_OPT_RANGE_AGGRESSIVE_VECTORIZATION")
endif
tmp := $(shell echo "/* Cuda Settings */" >> KokkosCore_config.tmp)
tmp := $(call kokkos_append_header,"/* Cuda Settings */")
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LDG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_CUDA_USE_LDG_INTRINSIC")
else
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LDG_INTRINSIC 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_CUDA_USE_LDG_INTRINSIC")
endif
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_UVM), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_UVM 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_CUDA_USE_UVM")
endif
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_RELOC), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_CUDA_USE_RELOCATABLE_DEVICE_CODE")
KOKKOS_CXXFLAGS += --relocatable-device-code=true
KOKKOS_LDFLAGS += --relocatable-device-code=true
endif
@@ -481,7 +489,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
ifeq ($(KOKKOS_INTERNAL_CUDA_USE_LAMBDA), 1)
ifeq ($(KOKKOS_INTERNAL_COMPILER_NVCC), 1)
ifeq ($(shell test $(KOKKOS_INTERNAL_COMPILER_NVCC_VERSION) -gt 70; echo $$?),0)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_CUDA_USE_LAMBDA")
KOKKOS_CXXFLAGS += -expt-extended-lambda
else
$(warning Warning: Cuda Lambda support was requested but NVCC version is too low. This requires NVCC for Cuda version 7.5 or higher. Disabling Lambda support now.)
@@ -489,19 +497,19 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_USE_LAMBDA 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_CUDA_USE_LAMBDA")
endif
endif
ifeq ($(KOKKOS_INTERNAL_COMPILER_CLANG), 1)
tmp := $(shell echo "\#define KOKKOS_CUDA_CLANG_WORKAROUND" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_CUDA_CLANG_WORKAROUND")
endif
endif
# Add Architecture flags.
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV80), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_ARMV80")
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
KOKKOS_CXXFLAGS +=
@@ -518,7 +526,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV80), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV81), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV81 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_ARMV81")
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
KOKKOS_CXXFLAGS +=
@@ -535,8 +543,8 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV81), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV80 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_ARMV8_THUNDERX 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_ARMV80")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_ARMV8_THUNDERX")
ifeq ($(KOKKOS_INTERNAL_COMPILER_CRAY), 1)
KOKKOS_CXXFLAGS +=
@@ -553,7 +561,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_ARMV8_THUNDERX), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_SSE42), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_SSE42 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_SSE42")
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -xSSE4.2
@@ -575,7 +583,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_SSE42), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_AVX")
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -mavx
@@ -597,7 +605,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER7), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_POWER7 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_POWER7")
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
@@ -609,7 +617,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER7), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER8), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_POWER8 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_POWER8")
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
@@ -630,7 +638,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER8), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER9), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_POWER9 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_POWER9")
ifeq ($(KOKKOS_INTERNAL_COMPILER_PGI), 1)
@@ -651,7 +659,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_POWER9), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_HSW), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_AVX2")
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -xCORE-AVX2
@@ -673,7 +681,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_HSW), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_BDW), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX2 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_AVX2")
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -xCORE-AVX2
@@ -695,7 +703,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_BDW), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512MIC 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_AVX512MIC")
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -xMIC-AVX512
@@ -716,7 +724,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512MIC), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_AVX512XEON 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_AVX512XEON")
ifeq ($(KOKKOS_INTERNAL_COMPILER_INTEL), 1)
KOKKOS_CXXFLAGS += -xCORE-AVX512
@@ -737,7 +745,7 @@ ifeq ($(KOKKOS_INTERNAL_USE_ARCH_AVX512XEON), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KNC), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KNC 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KNC")
KOKKOS_CXXFLAGS += -mmic
KOKKOS_LDFLAGS += -mmic
endif
@@ -753,48 +761,48 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER30), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER30 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KEPLER")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KEPLER30")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_30
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER32), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER32 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KEPLER")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KEPLER32")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_32
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER35), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER35 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KEPLER")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KEPLER35")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_35
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KEPLER37), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KEPLER37 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KEPLER")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KEPLER37")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_37
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL50), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL50 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_MAXWELL")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_MAXWELL50")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_50
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL52), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL52 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_MAXWELL")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_MAXWELL52")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_52
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_MAXWELL53), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_MAXWELL53 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_MAXWELL")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_MAXWELL53")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_53
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL60), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL60 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_PASCAL")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_PASCAL60")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_60
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_PASCAL61), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL 1" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_PASCAL61 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_PASCAL")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_PASCAL61")
KOKKOS_INTERNAL_CUDA_ARCH_FLAG := $(KOKKOS_INTERNAL_CUDA_ARCH_FLAG)=sm_61
endif
@@ -811,28 +819,28 @@ endif
ifeq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
# Let's start with adding architecture defines
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_KAVERI), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_ROCM 701" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_KAVERI 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_ROCM 701")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_KAVERI")
KOKKOS_INTERNAL_ROCM_ARCH_FLAG := --amdgpu-target=gfx701
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_CARRIZO), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_ROCM 801" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_CARRIZO 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_ROCM 801")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_CARRIZO")
KOKKOS_INTERNAL_ROCM_ARCH_FLAG := --amdgpu-target=gfx801
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_FIJI), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_ROCM 803" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_FIJI 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_ROCM 803")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_FIJI")
KOKKOS_INTERNAL_ROCM_ARCH_FLAG := --amdgpu-target=gfx803
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_VEGA), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_ROCM 900" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_VEGA 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_ROCM 900")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_VEGA")
KOKKOS_INTERNAL_ROCM_ARCH_FLAG := --amdgpu-target=gfx900
endif
ifeq ($(KOKKOS_INTERNAL_USE_ARCH_GFX901), 1)
tmp := $(shell echo "\#define KOKKOS_ARCH_ROCM 901" >> KokkosCore_config.tmp )
tmp := $(shell echo "\#define KOKKOS_ARCH_GFX901 1" >> KokkosCore_config.tmp )
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_ROCM 901")
tmp := $(call kokkos_append_header,"\#define KOKKOS_ARCH_GFX901")
KOKKOS_INTERNAL_ROCM_ARCH_FLAG := --amdgpu-target=gfx901
endif
@@ -952,6 +960,10 @@ ifeq ($(KOKKOS_INTERNAL_OS_CYGWIN), 1)
KOKKOS_CXXFLAGS += -U__STRICT_ANSI__
endif
# Set KokkosExtraLibs and add -lkokkos to link line
KOKKOS_EXTRA_LIBS := ${KOKKOS_LIBS}
KOKKOS_LIBS := -lkokkos ${KOKKOS_LIBS}
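The split above lets a build system consume the third-party libraries separately from `-lkokkos` while keeping the combined variable backward compatible. A hypothetical link line showing the resulting ordering (the object file, path, and compiler invocation are placeholders, not part of the Makefile):

```shell
KOKKOS_EXTRA_LIBS="-ldl"
KOKKOS_LIBS="-lkokkos ${KOKKOS_EXTRA_LIBS}"
# -lkokkos precedes the libraries it depends on, as static link order requires.
echo "g++ app.o -L. ${KOKKOS_LIBS} -o app"
```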
# Setting up dependencies.
KokkosCore_config.h:
@@ -22,8 +22,8 @@ Kokkos_HostThreadTeam.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokk
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HostThreadTeam.cpp
Kokkos_Spinwait.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Spinwait.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Spinwait.cpp
Kokkos_Rendezvous.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Rendezvous.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Rendezvous.cpp
Kokkos_HostBarrier.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_HostBarrier.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_HostBarrier.cpp
Kokkos_Profiling_Interface.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) -c $(KOKKOS_PATH)/core/src/impl/Kokkos_Profiling_Interface.cpp
Kokkos_SharedAlloc.o: $(KOKKOS_CPP_DEPENDS) $(KOKKOS_PATH)/core/src/impl/Kokkos_SharedAlloc.cpp
@@ -41,48 +41,44 @@ hcedwar(at)sandia.gov and crtrott(at)sandia.gov
============================================================================
Primary tested compilers on X86 are:
GCC 4.7.2
GCC 4.8.4
GCC 4.9.2
GCC 4.9.3
GCC 5.1.0
GCC 5.2.0
Intel 14.0.4
GCC 5.3.0
GCC 6.1.0
Intel 15.0.2
Intel 16.0.1
Intel 17.0.098
Intel 17.1.132
Intel 17.1.043
Intel 17.4.196
Intel 18.0.128
Clang 3.5.2
Clang 3.6.1
Clang 3.7.1
Clang 3.8.1
Clang 3.9.0
PGI 17.1
Clang 4.0.0
Clang 4.0.0 for CUDA (CUDA Toolkit 8.0.44)
PGI 17.10
NVCC 7.0 for CUDA (with gcc 4.8.4)
NVCC 7.5 for CUDA (with gcc 4.8.4)
NVCC 8.0.44 for CUDA (with gcc 5.3.0)
Primary tested compilers on Power 8 are:
GCC 5.4.0 (OpenMP,Serial)
IBM XL 13.1.3 (OpenMP, Serial) (There is a workaround in place to avoid a compiler bug)
IBM XL 13.1.5 (OpenMP, Serial) (There is a workaround in place to avoid a compiler bug)
NVCC 8.0.44 for CUDA (with gcc 5.4.0)
NVCC 9.0.103 for CUDA (with gcc 6.3.0)
Primary tested compilers on Intel KNL are:
GCC 6.2.0
Intel 16.2.181 (with gcc 4.7.2)
Intel 17.0.098 (with gcc 4.7.2)
Intel 17.1.132 (with gcc 4.9.3)
Intel 16.4.258 (with gcc 4.7.2)
Intel 17.2.174 (with gcc 4.9.3)
Intel 18.0.061 (beta) (with gcc 4.9.3)
Secondary tested compilers are:
CUDA 7.0 (with gcc 4.8.4)
CUDA 7.5 (with gcc 4.8.4)
CUDA 8.0 (with gcc 5.3.0 on X86 and gcc 5.4.0 on Power8)
CUDA/Clang 8.0 using Clang/Trunk compiler
Intel 18.0.128 (with gcc 4.9.3)
Other compilers working:
X86:
Cygwin 2.1.0 64bit with gcc 4.9.3
Limited testing of the following compilers on POWER7+ systems:
GCC 4.8.5 (on RHEL7.1 POWER7+)
Known non-working combinations:
Power8:
Pthreads backend
@@ -96,8 +92,8 @@ GCC: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits
-Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized
Intel: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
Clang: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
NVCC: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
Secondary compilers are passing without -Werror.
Other compilers are tested occasionally, in particular when pushing from develop to
master branch, without -Werror and only for a select set of backends.
@@ -2,7 +2,9 @@
TRIBITS_SUBPACKAGE(Algorithms)
ADD_SUBDIRECTORY(src)
IF(KOKKOS_HAS_TRILINOS)
ADD_SUBDIRECTORY(src)
ENDIF()
TRIBITS_ADD_TEST_DIRECTORIES(unit_tests)
#TRIBITS_ADD_TEST_DIRECTORIES(performance_tests)
@@ -3,6 +3,32 @@ INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
IF(NOT KOKKOS_HAS_TRILINOS)
IF(KOKKOS_SEPARATE_LIBS)
set(TEST_LINK_TARGETS kokkoscore)
ELSE()
set(TEST_LINK_TARGETS kokkos)
ENDIF()
ENDIF()
SET(GTEST_SOURCE_DIR ${${PARENT_PACKAGE_NAME}_SOURCE_DIR}/tpls/gtest)
INCLUDE_DIRECTORIES(${GTEST_SOURCE_DIR})
# mfh 03 Nov 2017: The gtest library used here must have a different
# name than that of the gtest library built in KokkosCore. We can't
# just refer to the library in KokkosCore's tests, because it's
# possible to build only (e.g.,) KokkosAlgorithms tests, without
# building KokkosCore tests.
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DGTEST_HAS_PTHREAD=0")
TRIBITS_ADD_LIBRARY(
kokkosalgorithms_gtest
HEADERS ${GTEST_SOURCE_DIR}/gtest/gtest.h
SOURCES ${GTEST_SOURCE_DIR}/gtest/gtest-all.cc
TESTONLY
)
SET(SOURCES
UnitTestMain.cpp
TestCuda.cpp
@@ -34,5 +60,5 @@ TRIBITS_ADD_EXECUTABLE_AND_TEST(
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
TESTONLYLIBS kokkosalgorithms_gtest ${TEST_LINK_TARGETS}
)
@@ -15,7 +15,8 @@ endif
CXXFLAGS = -O3
LINK ?= $(CXX)
LDFLAGS ?= -lpthread
LDFLAGS ?=
override LDFLAGS += -lpthread
include $(KOKKOS_PATH)/Makefile.kokkos
@@ -211,12 +211,15 @@ void test_dynamic_view_sort(unsigned int n )
const size_t upper_bound = 2 * n ;
const size_t total_alloc_size = n * sizeof(KeyType) * 1.2 ;
const size_t superblock_size = std::min(total_alloc_size, size_t(1000000));
typename KeyDynamicViewType::memory_pool
pool( memory_space()
, n * sizeof(KeyType) * 1.2
, 500 /* min block size in bytes */
, 30000 /* max block size in bytes */
, 1000000 /* min superblock size in bytes */
, superblock_size
);
KeyDynamicViewType keys("Keys",pool,upper_bound);
@@ -271,8 +274,10 @@ void test_sort(unsigned int N)
{
test_1D_sort<ExecutionSpace,KeyType>(N*N*N, true);
test_1D_sort<ExecutionSpace,KeyType>(N*N*N, false);
#if !defined(KOKKOS_ENABLE_ROCM)
test_3D_sort<ExecutionSpace,KeyType>(N);
test_dynamic_view_sort<ExecutionSpace,KeyType>(N*N);
#endif
}
}
@@ -0,0 +1,44 @@
KOKKOS_PATH = ${HOME}/kokkos
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
EXE_NAME = "test"
SRC = $(wildcard *.cpp)
default: build
echo "Start Build"
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = ${KOKKOS_PATH}/config/nvcc_wrapper
EXE = ${EXE_NAME}.cuda
KOKKOS_CUDA_OPTIONS = "enable_lambda"
else
CXX = g++
EXE = ${EXE_NAME}.host
endif
CXXFLAGS = -O3
LINK = ${CXX}
LINKFLAGS = -O3
DEPFLAGS = -M
OBJ = $(SRC:.cpp=.o)
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
build: $(EXE)
$(EXE): $(OBJ) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LINKFLAGS) $(EXTRA_PATH) $(OBJ) $(KOKKOS_LIBS) $(LIB) -o $(EXE)
clean: kokkos-clean
rm -f *.o *.cuda *.host
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
@@ -0,0 +1,124 @@
#include<Kokkos_Core.hpp>
#include<impl/Kokkos_Timer.hpp>
#include<Kokkos_Random.hpp>
template<class Scalar>
double test_atomic(int L, int N, int M, int K, int R, Kokkos::View<const int**> offsets) {
Kokkos::View<Scalar*> output("Output",N);
Kokkos::Impl::Timer timer;
for(int r = 0; r<R; r++)
Kokkos::parallel_for(L, KOKKOS_LAMBDA (const int&i) {
Scalar s = 2;
for(int m=0;m<M;m++) {
for(int k=0;k<K;k++)
s=s*s+s;
const int idx = (i+offsets(i,m))%N;
Kokkos::atomic_add(&output(idx),s);
}
});
Kokkos::fence();
double time = timer.seconds();
return time;
}
template<class Scalar>
double test_no_atomic(int L, int N, int M, int K, int R, Kokkos::View<const int**> offsets) {
Kokkos::View<Scalar*> output("Output",N);
Kokkos::Impl::Timer timer;
for(int r = 0; r<R; r++)
Kokkos::parallel_for(L, KOKKOS_LAMBDA (const int&i) {
Scalar s = 2;
for(int m=0;m<M;m++) {
for(int k=0;k<K;k++)
s=s*s+s;
const int idx = (i+offsets(i,m))%N;
output(idx) += s;
}
});
Kokkos::fence();
double time = timer.seconds();
return time;
}
int main(int argc, char* argv[]) {
Kokkos::initialize(argc,argv);
{
if(argc<8) {
printf("Arguments: L N M D K R T\n");
printf(" L: Number of iterations to run\n");
printf(" N: Length of array to do atomics into\n");
printf(" M: Number of atomics per iteration to do\n");
printf(" D: Distance from index i to do atomics into (randomly)\n");
printf(" K: Number of FMAD per atomic\n");
printf(" R: Number of repeats of the experiments\n");
printf(" T: Type of atomic\n");
printf(" 1 - int\n");
printf(" 2 - long\n");
printf(" 3 - float\n");
printf(" 4 - double\n");
printf(" 5 - complex<double>\n");
printf("Example Input GPU:\n");
printf(" Histogram : 1000000 1000 1 1000 1 10 1\n");
printf(" MD Force : 100000 100000 100 1000 20 10 4\n");
printf(" Matrix Assembly : 100000 1000000 50 1000 20 10 4\n");
Kokkos::finalize();
return 0;
}
int L = atoi(argv[1]);
int N = atoi(argv[2]);
int M = atoi(argv[3]);
int D = atoi(argv[4]);
int K = atoi(argv[5]);
int R = atoi(argv[6]);
int type = atoi(argv[7]);
Kokkos::View<int**> offsets("Offsets",L,M);
Kokkos::Random_XorShift64_Pool<> pool(12371);
Kokkos::fill_random(offsets,pool,D);
double time = 0;
if(type==1)
time = test_atomic<int>(L,N,M,K,R,offsets);
if(type==2)
time = test_atomic<long>(L,N,M,K,R,offsets);
if(type==3)
time = test_atomic<float>(L,N,M,K,R,offsets);
if(type==4)
time = test_atomic<double>(L,N,M,K,R,offsets);
if(type==5)
time = test_atomic<Kokkos::complex<double> >(L,N,M,K,R,offsets);
double time2 = 1;
if(type==1)
time2 = test_no_atomic<int>(L,N,M,K,R,offsets);
if(type==2)
time2 = test_no_atomic<long>(L,N,M,K,R,offsets);
if(type==3)
time2 = test_no_atomic<float>(L,N,M,K,R,offsets);
if(type==4)
time2 = test_no_atomic<double>(L,N,M,K,R,offsets);
if(type==5)
time2 = test_no_atomic<Kokkos::complex<double> >(L,N,M,K,R,offsets);
int size = 0;
if(type==1) size = sizeof(int);
if(type==2) size = sizeof(long);
if(type==3) size = sizeof(float);
if(type==4) size = sizeof(double);
if(type==5) size = sizeof(Kokkos::complex<double>);
printf("%i\n",size);
printf("Time: %s %i %i %i %i %i %i (t_atomic: %e t_nonatomic: %e ratio: %lf )( GUpdates/s: %lf GB/s: %lf )\n",
(type==1)?"int": (
(type==2)?"long": (
(type==3)?"float": (
(type==4)?"double":"complex"))),
L,N,M,D,K,R,time,time2,time/time2,
1.e-9*L*R*M/time, 1.0*L*R*M*2*size/time/1024/1024/1024);
}
Kokkos::finalize();
}
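The final printf above derives two throughput figures: GUpdates/s scales the L*R*M atomic updates by 1e-9, and GB/s assumes one read plus one write of `size` bytes per update. A standalone sketch of the same arithmetic, using made-up sample values (the element size of 8 corresponds to the `double` case):

```shell
# Recompute the benchmark's reported metrics from sample inputs.
# L iterations, R repeats, M atomics per iteration; SIZE and TIME are
# assumed values, not measured output.
L=100000; R=10; M=100; SIZE=8; TIME=2.0
GUPDATES=$(awk -v l=$L -v r=$R -v m=$M -v t=$TIME \
  'BEGIN{printf "%.3f", 1e-9*l*r*m/t}')
GBS=$(awk -v l=$L -v r=$R -v m=$M -v s=$SIZE -v t=$TIME \
  'BEGIN{printf "%.3f", l*r*m*2*s/t/1024/1024/1024}')
echo "GUpdates/s: ${GUPDATES} GB/s: ${GBS}"
```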


@ -0,0 +1,84 @@
#!/bin/bash
# ---- Default Settings -----
# Paths
KOKKOS_PATH=${PWD}/kokkos
KOKKOS_KERNELS_PATH=${PWD}/kokkos-kernels
MINIMD_PATH=${PWD}/miniMD/kokkos
MINIFE_PATH=${PWD}/miniFE/kokkos
# Kokkos Configure Options
KOKKOS_DEVICES=OpenMP
KOKKOS_ARCH=SNB
# Compiler Options
CXX=mpicxx
OPT_FLAG="-O3"
while [[ $# -gt 0 ]]
do
key="$1"
case $key in
--kokkos-path*)
KOKKOS_PATH="${key#*=}"
;;
--kokkos-kernels-path*)
KOKKOS_KERNELS_PATH="${key#*=}"
;;
--minimd-path*)
MINIMD_PATH="${key#*=}"
;;
--minife-path*)
MINIFE_PATH="${key#*=}"
;;
--device-list*)
KOKKOS_DEVICES="${key#*=}"
;;
--arch*)
KOKKOS_ARCH="${key#*=}"
;;
--opt-flag*)
OPT_FLAG="${key#*=}"
;;
--compiler*)
CXX="${key#*=}"
;;
--with-cuda-options*)
KOKKOS_CUDA_OPTIONS="--with-cuda-options=${key#*=}"
;;
--help*)
PRINT_HELP=True
;;
*)
# args, just append
ARGS="$ARGS $1"
;;
esac
shift
done
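The loop above relies on the bash `${key#*=}` expansion, which strips everything up to and including the first `=` in a `--flag=value` argument. A minimal standalone sketch of the idiom (flag names and defaults here are illustrative):

```shell
# Parse --flag=value arguments with ${key#*=}, falling back to defaults.
parse_demo() {
  local compiler="g++" arch="SNB"   # sample defaults, as in the script above
  local key
  for key in "$@"; do
    case $key in
      --compiler=*) compiler="${key#*=}" ;;
      --arch=*)     arch="${key#*=}" ;;
    esac
  done
  echo "${compiler} ${arch}"
}
parse_demo --compiler=mpicxx --arch=Kepler35
```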
mkdir -p build
# Build BytesAndFlops
mkdir -p build/bytes_and_flops
cd build/bytes_and_flops
make KOKKOS_ARCH=${KOKKOS_ARCH} KOKKOS_DEVICES=${KOKKOS_DEVICES} CXX=${CXX} KOKKOS_PATH=${KOKKOS_PATH} \
CXXFLAGS=${OPT_FLAG} -f ${KOKKOS_PATH}/benchmarks/bytes_and_flops/Makefile -j 16
cd ../..
mkdir -p build/miniMD
cd build/miniMD
make KOKKOS_ARCH=${KOKKOS_ARCH} KOKKOS_DEVICES=${KOKKOS_DEVICES} CXX=${CXX} KOKKOS_PATH=${KOKKOS_PATH} \
CXXFLAGS=${OPT_FLAG} -f ${MINIMD_PATH}/Makefile -j 16
cd ../../
mkdir -p build/miniFE
cd build/miniFE
make KOKKOS_ARCH=${KOKKOS_ARCH} KOKKOS_DEVICES=${KOKKOS_DEVICES} CXX=${CXX} KOKKOS_PATH=${KOKKOS_PATH} \
CXXFLAGS=${OPT_FLAG} -f ${MINIFE_PATH}/src/Makefile -j 16
cd ../../


@ -0,0 +1,37 @@
#!/bin/bash
# Kokkos
if [ ! -d "kokkos" ]; then
git clone https://github.com/kokkos/kokkos
fi
cd kokkos
git checkout develop
git pull
cd ..
# KokkosKernels
if [ ! -d "kokkos-kernels" ]; then
git clone https://github.com/kokkos/kokkos-kernels
fi
cd kokkos-kernels
git pull
cd ..
# MiniMD
if [ ! -d "miniMD" ]; then
git clone https://github.com/mantevo/miniMD
fi
cd miniMD
git pull
cd ..
# MiniFE
if [ ! -d "miniFE" ]; then
git clone https://github.com/mantevo/miniFE
fi
cd miniFE
git pull
cd ..


@ -0,0 +1,14 @@
#!/bin/bash
SCRIPT_PATH=$1
KOKKOS_DEVICES=$2
KOKKOS_ARCH=$3
COMPILER=$4
if [[ $# -lt 4 ]]; then
echo "Usage: ./run_benchmark.bash PATH_TO_SCRIPTS KOKKOS_DEVICES KOKKOS_ARCH COMPILER"
else
${SCRIPT_PATH}/checkout_repos.bash
${SCRIPT_PATH}/build_code.bash --arch=${KOKKOS_ARCH} --device-list=${KOKKOS_DEVICES} --compiler=${COMPILER}
${SCRIPT_PATH}/run_tests.bash
fi


@ -0,0 +1,44 @@
#!/bin/bash
# BytesAndFlops
cd build/bytes_and_flops
USE_CUDA=`grep "_CUDA 1" KokkosCore_config.h | wc -l`
if [[ ${USE_CUDA} -gt 0 ]]; then
BAF_EXE=bytes_and_flops.cuda
TEAM_SIZE=256
else
BAF_EXE=bytes_and_flops.host
TEAM_SIZE=1
fi
BAF_PERF_1=`./${BAF_EXE} 2 100000 1024 1 1 1 1 ${TEAM_SIZE} 6000 | awk '{print $12/174.5}'`
BAF_PERF_2=`./${BAF_EXE} 2 100000 1024 16 1 8 64 ${TEAM_SIZE} 6000 | awk '{print $14/1142.65}'`
echo "BytesAndFlops: ${BAF_PERF_1} ${BAF_PERF_2}"
cd ../..
# MiniMD
cd build/miniMD
cp ../../miniMD/kokkos/Cu_u6.eam ./
MD_PERF_1=`./miniMD --half_neigh 0 -s 60 --ntypes 1 -t ${OMP_NUM_THREADS} -i ../../miniMD/kokkos/in.eam.miniMD | grep PERF_SUMMARY | awk '{print $10/21163341}'`
MD_PERF_2=`./miniMD --half_neigh 0 -s 20 --ntypes 1 -t ${OMP_NUM_THREADS} -i ../../miniMD/kokkos/in.eam.miniMD | grep PERF_SUMMARY | awk '{print $10/13393417}'`
echo "MiniMD: ${MD_PERF_1} ${MD_PERF_2}"
cd ../..
# MiniFE
cd build/miniFE
rm -f *.yaml
./miniFE.x -nx 100 &> /dev/null
FE_PERF_1=`grep "CG Mflop" *.yaml | awk '{print $4/14174}'`
rm -f *.yaml
./miniFE.x -nx 50 &> /dev/null
FE_PERF_2=`grep "CG Mflop" *.yaml | awk '{print $4/11897}'`
cd ../..
echo "MiniFE: ${FE_PERF_1} ${FE_PERF_2}"
PERF_RESULT=`echo "${BAF_PERF_1} ${BAF_PERF_2} ${MD_PERF_1} ${MD_PERF_2} ${FE_PERF_1} ${FE_PERF_2}" | awk '{print ($1+$2+$3+$4+$5+$6)/6}'`
echo "Total Result: " ${PERF_RESULT}
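The total result above is the arithmetic mean of the six normalized scores. The hard-coded `($1+...+$6)/6` can be written generically for any whitespace-separated list; the sample scores below are invented:

```shell
# Average a whitespace-separated list of numbers with awk (generalizes the
# six-term sum used above to any field count NF).
SCORES="0.98 1.02 0.95 1.05 1.00 1.00"
MEAN=$(echo "${SCORES}" | awk '{s=0; for(i=1;i<=NF;i++) s+=$i; print s/NF}')
echo "Total Result: ${MEAN}"
```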


@ -1,7 +1,18 @@
KOKKOS_PATH = ${HOME}/kokkos
SRC = $(wildcard *.cpp)
KOKKOS_DEVICES=Cuda
KOKKOS_CUDA_OPTIONS=enable_lambda
KOKKOS_ARCH = "SNB,Kepler35"
MAKEFILE_PATH := $(subst Makefile,,$(abspath $(lastword $(MAKEFILE_LIST))))
ifndef KOKKOS_PATH
KOKKOS_PATH = $(MAKEFILE_PATH)../..
endif
SRC = $(wildcard $(MAKEFILE_PATH)*.cpp)
HEADERS = $(wildcard $(MAKEFILE_PATH)*.hpp)
vpath %.cpp $(sort $(dir $(SRC)))
default: build
echo "Start Build"
@ -9,22 +20,19 @@ default: build
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = ${KOKKOS_PATH}/bin/nvcc_wrapper
EXE = bytes_and_flops.cuda
KOKKOS_DEVICES = "Cuda,OpenMP"
KOKKOS_ARCH = "SNB,Kepler35"
else
CXX = g++
EXE = bytes_and_flops.host
KOKKOS_DEVICES = "OpenMP"
KOKKOS_ARCH = "SNB"
endif
CXXFLAGS = -O3 -g
CXXFLAGS ?= -O3 -g
override CXXFLAGS += -I$(MAKEFILE_PATH)
DEPFLAGS = -M
LINK = ${CXX}
LINKFLAGS =
OBJ = $(SRC:.cpp=.o)
OBJ = $(notdir $(SRC:.cpp=.o))
LIB =
include $(KOKKOS_PATH)/Makefile.kokkos
@ -39,5 +47,5 @@ clean: kokkos-clean
# Compilation rules
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) bench.hpp bench_unroll_stride.hpp bench_stride.hpp
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $<
%.o:%.cpp $(KOKKOS_CPP_DEPENDS) $(HEADERS)
$(CXX) $(KOKKOS_CPPFLAGS) $(KOKKOS_CXXFLAGS) $(CXXFLAGS) $(EXTRA_INC) -c $< -o $(notdir $@)


@ -69,11 +69,11 @@ void test_policy(int team_range, int thread_range, int vector_range,
int team_size, int vector_size, int test_type,
ViewType1 &v1, ViewType2 &v2, ViewType3 &v3,
double &result, double &result_expect, double &time) {
typedef Kokkos::TeamPolicy<ScheduleType,IndexType> t_policy;
typedef typename t_policy::member_type t_team;
Kokkos::Timer timer;
for(int orep = 0; orep<outer_repeat; orep++) {
if (test_type == 100) {
@ -95,7 +95,7 @@ void test_policy(int team_range, int thread_range, int vector_range,
v2( idx, t ) = t;
// prevent compiler optimizing loop away
});
}
}
});
}
if (test_type == 111) {
@ -178,12 +178,13 @@ void test_policy(int team_range, int thread_range, int vector_range,
for (int tr = 0; tr<thread_repeat; ++tr) {
Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team,thread_range), [&] (const int t, double &lval) {
double vector_result = 0.0;
for (int vr = 0; vr<inner_repeat; ++vr)
for (int vr = 0; vr<inner_repeat; ++vr) {
vector_result = 0.0;
Kokkos::parallel_reduce(Kokkos::ThreadVectorRange(team,vector_range), [&] (const int vi, double &vval) {
vval += 1;
}, vector_result);
lval += vector_result;
}
}, team_result);
}
v1(idx) = team_result;
@ -191,7 +192,7 @@ void test_policy(int team_range, int thread_range, int vector_range,
});
}
if (test_type == 200) {
Kokkos::parallel_reduce("200 outer reduce", t_policy(team_range,team_size),
Kokkos::parallel_reduce("200 outer reduce", t_policy(team_range,team_size),
KOKKOS_LAMBDA (const t_team& team, double& lval) {
lval+=team.team_size()*team.league_rank() + team.team_rank();
},result);
@ -315,7 +316,7 @@ void test_policy(int team_range, int thread_range, int vector_range,
// parallel_for RangePolicy: range = team_size*team_range
if (test_type == 300) {
Kokkos::parallel_for("300 outer for", team_size*team_range,
Kokkos::parallel_for("300 outer for", team_size*team_range,
KOKKOS_LAMBDA (const int idx) {
v1(idx) = idx;
// prevent compiler from optimizing away the loop
@ -323,7 +324,7 @@ void test_policy(int team_range, int thread_range, int vector_range,
}
// parallel_reduce RangePolicy: range = team_size*team_range
if (test_type == 400) {
Kokkos::parallel_reduce("400 outer reduce", team_size*team_range,
Kokkos::parallel_reduce("400 outer reduce", team_size*team_range,
KOKKOS_LAMBDA (const int idx, double& val) {
val += idx;
}, result);
@ -331,7 +332,7 @@ void test_policy(int team_range, int thread_range, int vector_range,
}
// parallel_scan RangePolicy: range = team_size*team_range
if (test_type == 500) {
Kokkos::parallel_scan("500 outer scan", team_size*team_range,
Kokkos::parallel_scan("500 outer scan", team_size*team_range,
ParallelScanFunctor<ViewType1>(v1)
#if 0
// This does not compile with pre Cuda 8.0 - see Github Issue #913 for explanation


@ -26,6 +26,7 @@ fi
# Get parent cpuset
HPCBIND_HWLOC_PARENT_CPUSET=""
if [[ ${HPCBIND_HAS_HWLOC} -eq 1 ]]; then
HPCBIND_HWLOC_VERSION="$(hwloc-ls --version | cut -d ' ' -f 2)"
MY_PID="$BASHPID"
HPCBIND_HWLOC_PARENT_CPUSET="$(hwloc-ps -a --cpuset | grep ${MY_PID} | cut -f 2)"
fi
@ -45,8 +46,11 @@ declare -i NUM_GPUS=0
HPCBIND_VISIBLE_GPUS=""
if [[ ${HPCBIND_HAS_NVIDIA} -eq 1 ]]; then
NUM_GPUS=$(nvidia-smi -L | wc -l);
GPU_LIST="$( seq 0 $((NUM_GPUS-1)) )"
HPCBIND_VISIBLE_GPUS=${CUDA_VISIBLE_DEVICES:-${GPU_LIST}}
HPCBIND_HAS_NVIDIA=$((!$?))
if [[ ${HPCBIND_HAS_NVIDIA} -eq 1 ]]; then
GPU_LIST="$( seq 0 $((NUM_GPUS-1)) )"
HPCBIND_VISIBLE_GPUS=${CUDA_VISIBLE_DEVICES:-${GPU_LIST}}
fi
fi
declare -i HPCBIND_ENABLE_GPU_MAPPING=$((NUM_GPUS > 0))
@ -57,33 +61,38 @@ declare -i HPCBIND_ENABLE_GPU_MAPPING=$((NUM_GPUS > 0))
# supports sbatch, bsub, aprun
################################################################################
HPCBIND_QUEUE_NAME=""
declare -i HPCBIND_QUEUE_INDEX=0
declare -i HPCBIND_QUEUE_RANK=0
declare -i HPCBIND_QUEUE_SIZE=0
declare -i HPCBIND_QUEUE_MAPPING=0
if [[ ! -z "${PMI_RANK}" ]]; then
HPCBIND_QUEUE_MAPPING=1
HPCBIND_QUEUE_NAME="mpich"
HPCBIND_QUEUE_INDEX=${PMI_RANK}
HPCBIND_QUEUE_RANK=${PMI_RANK}
HPCBIND_QUEUE_SIZE=${PMI_SIZE}
elif [[ ! -z "${OMPI_COMM_WORLD_RANK}" ]]; then
HPCBIND_QUEUE_MAPPING=1
HPCBIND_QUEUE_NAME="openmpi"
HPCBIND_QUEUE_INDEX=${OMPI_COMM_WORLD_RANK}
HPCBIND_QUEUE_RANK=${OMPI_COMM_WORLD_RANK}
HPCBIND_QUEUE_SIZE=${OMPI_COMM_WORLD_SIZE}
elif [[ ! -z "${MV2_COMM_WORLD_RANK}" ]]; then
HPCBIND_QUEUE_MAPPING=1
HPCBIND_QUEUE_NAME="mvapich2"
HPCBIND_QUEUE_INDEX=${MV2_COMM_WORLD_RANK}
HPCBIND_QUEUE_RANK=${MV2_COMM_WORLD_RANK}
HPCBIND_QUEUE_SIZE=${MV2_COMM_WORLD_SIZE}
elif [[ ! -z "${SLURM_LOCAL_ID}" ]]; then
HPCBIND_QUEUE_MAPPING=1
HPCBIND_QUEUE_NAME="slurm"
HPCBIND_QUEUE_INDEX=${SLURM_LOCAL_ID}
elif [[ ! -z "${LBS_JOBINDEX}" ]]; then
HPCBIND_QUEUE_MAPPING=1
HPCBIND_QUEUE_NAME="bsub"
HPCBIND_QUEUE_INDEX=${LBS_JOBINDEX}
HPCBIND_QUEUE_RANK=${SLURM_PROCID}
HPCBIND_QUEUE_SIZE=${SLURM_NPROCS}
elif [[ ! -z "${ALPS_APP_PE}" ]]; then
HPCBIND_QUEUE_MAPPING=1
HPCBIND_QUEUE_NAME="aprun"
HPCBIND_QUEUE_INDEX=${ALPS_APP_PE}
HPCBIND_QUEUE_RANK=${ALPS_APP_PE}
elif [[ ! -z "${LBS_JOBINDEX}" ]]; then
HPCBIND_QUEUE_MAPPING=1
HPCBIND_QUEUE_NAME="bsub"
HPCBIND_QUEUE_RANK=${LBS_JOBINDEX}
fi
################################################################################
@ -113,8 +122,8 @@ function show_help {
echo " --no-gpu-mapping Do not set CUDA_VISIBLE_DEVICES"
echo " --openmp=M.m Set env variables for the given OpenMP version"
echo " Default: 4.0"
echo " --openmp-percent=N Integer percentage of cpuset to use for OpenMP"
echo " threads Default: 100"
echo " --openmp-ratio=N/D Ratio of the cpuset to use for OpenMP"
echo " Default: 1"
echo " --openmp-places=<Op> Op=threads|cores|sockets. Default: threads"
echo " --no-openmp-proc-bind Set OMP_PROC_BIND to false and unset OMP_PLACES"
echo " --force-openmp-num-threads=N"
@ -123,8 +132,8 @@ function show_help {
echo " Override logic for selecting OMP_PROC_BIND"
echo " --no-openmp-nested Set OMP_NESTED to false"
echo " --output-prefix=<P> Save the output to files of the form"
echo " P-N.log, P-N.out and P-N.err where P is the prefix"
echo " and N is the queue index or mpi rank (no spaces)"
echo " P.hpcbind.N, P.stdout.N and P.stderr.N where P is "
echo " the prefix and N is the rank (no spaces)"
echo " --output-mode=<Op> How console output should be handled."
echo " Options are all, rank0, and none. Default: rank0"
echo " --lstopo Show bindings in lstopo"
@ -132,20 +141,27 @@ function show_help {
echo " -h|--help Show this message"
echo ""
echo "Sample Usage:"
echo ""
echo " Split the current process cpuset into 4 and use the 3rd partition"
echo " ${cmd} --distribute=4 --distribute-partition=2 -v -- command ..."
echo ""
echo " Launch 16 jobs over 4 nodes with 4 jobs per node using only the even pus"
echo " and save the output to rank specific files"
echo " mpiexec -N 16 -npernode 4 ${cmd} --whole-system --proc-bind=pu:even \\"
echo " --distribute=4 -v --output-prefix=output -- command ..."
echo ""
echo " Bind the process to all even cores"
echo " ${cmd} --proc-bind=core:even -v -- command ..."
echo ""
echo " Bind the the even cores of socket 0 and the odd cores of socket 1"
echo " ${cmd} --proc-bind='socket:0.core:even socket:1.core:odd' -v -- command ..."
echo ""
echo " Skip GPU 0 when mapping visible devices"
echo " ${cmd} --distribute=4 --distribute-partition=0 --visible-gpus=1,2 -v -- command ..."
echo ""
echo " Display the current bindings"
echo " ${cmd} --proc-bind=numa:0 -- command"
echo ""
echo " Display the current bindings using lstopo"
echo " ${cmd} --proc-bind=numa:0.core:odd --lstopo"
echo ""
@ -167,12 +183,13 @@ declare -i HPCBIND_DISTRIBUTE=1
declare -i HPCBIND_PARTITION=-1
HPCBIND_PROC_BIND="all"
HPCBIND_OPENMP_VERSION=4.0
declare -i HPCBIND_OPENMP_PERCENT=100
declare -i HPCBIND_OPENMP_RATIO_NUMERATOR=1
declare -i HPCBIND_OPENMP_RATIO_DENOMINATOR=1
HPCBIND_OPENMP_PLACES=${OMP_PLACES:-threads}
declare -i HPCBIND_OPENMP_PROC_BIND=1
declare -i HPCBIND_OPENMP_FORCE_NUM_THREADS=-1
HPCBIND_OPENMP_FORCE_NUM_THREADS=""
HPCBIND_OPENMP_FORCE_PROC_BIND=""
HPCBIND_OPENMP_NESTED=${OMP_NESTED:-true}
declare -i HPCBIND_OPENMP_NESTED=1
declare -i HPCBIND_VERBOSE=0
declare -i HPCBIND_LSTOPO=0
@ -199,6 +216,9 @@ for i in "$@"; do
;;
--distribute=*)
HPCBIND_DISTRIBUTE="${i#*=}"
if [[ ${HPCBIND_DISTRIBUTE} -le 0 ]]; then
HPCBIND_DISTRIBUTE=1
fi
shift
;;
# which partition to use
@ -222,8 +242,18 @@ for i in "$@"; do
HPCBIND_OPENMP_VERSION="${i#*=}"
shift
;;
--openmp-percent=*)
HPCBIND_OPENMP_PERCENT="${i#*=}"
--openmp-ratio=*)
IFS=/ read HPCBIND_OPENMP_RATIO_NUMERATOR HPCBIND_OPENMP_RATIO_DENOMINATOR <<< "${i#*=}"
if [[ ${HPCBIND_OPENMP_RATIO_NUMERATOR} -le 0 ]]; then
HPCBIND_OPENMP_RATIO_NUMERATOR=1
fi
if [[ ${HPCBIND_OPENMP_RATIO_DENOMINATOR} -le 0 ]]; then
HPCBIND_OPENMP_RATIO_DENOMINATOR=1
fi
if [[ ${HPCBIND_OPENMP_RATIO_NUMERATOR} -gt ${HPCBIND_OPENMP_RATIO_DENOMINATOR} ]]; then
HPCBIND_OPENMP_RATIO_NUMERATOR=1
HPCBIND_OPENMP_RATIO_DENOMINATOR=1
fi
shift
;;
--openmp-places=*)
@ -243,7 +273,7 @@ for i in "$@"; do
shift
;;
--no-openmp-nested)
HPCBIND_OPENMP_NESTED="false"
HPCBIND_OPENMP_NESTED=0
shift
;;
--output-prefix=*)
@ -292,7 +322,7 @@ if [[ "${HPCBIND_OUTPUT_MODE}" == "none" ]]; then
HPCBIND_TEE=0
elif [[ "${HPCBIND_OUTPUT_MODE}" == "all" ]]; then
HPCBIND_TEE=1
elif [[ ${HPCBIND_QUEUE_INDEX} -eq 0 ]]; then
elif [[ ${HPCBIND_QUEUE_RANK} -eq 0 ]]; then
#default to rank0 printing to screen
HPCBIND_TEE=1
fi
@ -303,9 +333,18 @@ if [[ "${HPCBIND_OUTPUT_PREFIX}" == "" ]]; then
HPCBIND_ERR=/dev/null
HPCBIND_OUT=/dev/null
else
HPCBIND_LOG="${HPCBIND_OUTPUT_PREFIX}-${HPCBIND_QUEUE_INDEX}.hpc.log"
HPCBIND_ERR="${HPCBIND_OUTPUT_PREFIX}-${HPCBIND_QUEUE_INDEX}.err"
HPCBIND_OUT="${HPCBIND_OUTPUT_PREFIX}-${HPCBIND_QUEUE_INDEX}.out"
if [[ ${HPCBIND_QUEUE_SIZE} -gt 0 ]]; then
HPCBIND_STR_QUEUE_SIZE="${HPCBIND_QUEUE_SIZE}"
HPCBIND_STR_QUEUE_RANK=$(printf %0*d ${#HPCBIND_STR_QUEUE_SIZE} ${HPCBIND_QUEUE_RANK})
HPCBIND_LOG="${HPCBIND_OUTPUT_PREFIX}.hpcbind.${HPCBIND_STR_QUEUE_RANK}"
HPCBIND_ERR="${HPCBIND_OUTPUT_PREFIX}.stderr.${HPCBIND_STR_QUEUE_RANK}"
HPCBIND_OUT="${HPCBIND_OUTPUT_PREFIX}.stdout.${HPCBIND_STR_QUEUE_RANK}"
else
HPCBIND_LOG="${HPCBIND_OUTPUT_PREFIX}.hpcbind.${HPCBIND_QUEUE_RANK}"
HPCBIND_ERR="${HPCBIND_OUTPUT_PREFIX}.stderr.${HPCBIND_QUEUE_RANK}"
HPCBIND_OUT="${HPCBIND_OUTPUT_PREFIX}.stdout.${HPCBIND_QUEUE_RANK}"
fi
> ${HPCBIND_LOG}
fi
@ -333,27 +372,12 @@ if [[ ${HPCBIND_ENABLE_GPU_MAPPING} -eq 1 ]]; then
NUM_GPUS=${#HPCBIND_VISIBLE_GPUS[@]}
fi
################################################################################
# Check OpenMP percent
################################################################################
if [[ ${HPCBIND_OPENMP_PERCENT} -lt 1 ]]; then
HPCBIND_OPENMP_PERCENT=1
elif [[ ${HPCBIND_OPENMP_PERCENT} -gt 100 ]]; then
HPCBIND_OPENMP_PERCENT=100
fi
################################################################################
# Check distribute
################################################################################
if [[ ${HPCBIND_DISTRIBUTE} -le 0 ]]; then
HPCBIND_DISTRIBUTE=1
fi
################################################################################
#choose the correct partition
################################################################################
if [[ ${HPCBIND_PARTITION} -lt 0 && ${HPCBIND_QUEUE_MAPPING} -eq 1 ]]; then
HPCBIND_PARTITION=${HPCBIND_QUEUE_INDEX}
HPCBIND_PARTITION=${HPCBIND_QUEUE_RANK}
elif [[ ${HPCBIND_PARTITION} -lt 0 ]]; then
HPCBIND_PARTITION=0
fi
@ -381,23 +405,40 @@ if [[ ${HPCBIND_ENABLE_HWLOC_BIND} -eq 1 ]]; then
else
HPCBIND_HWLOC_CPUSET="${BINDING}"
fi
HPCBIND_NUM_PUS=$(hwloc-ls --restrict ${HPCBIND_HWLOC_CPUSET} --only pu | wc -l)
HPCBIND_NUM_PUS=$(hwloc-calc -q -N pu ${HPCBIND_HWLOC_CPUSET} )
if [ $? -ne 0 ]; then
HPCBIND_NUM_PUS=1
fi
HPCBIND_NUM_CORES=$(hwloc-calc -q -N core ${HPCBIND_HWLOC_CPUSET} )
if [ $? -ne 0 ]; then
HPCBIND_NUM_CORES=1
fi
HPCBIND_NUM_NUMAS=$(hwloc-calc -q -N numa ${HPCBIND_HWLOC_CPUSET} )
if [ $? -ne 0 ]; then
HPCBIND_NUM_NUMAS=1
fi
HPCBIND_NUM_SOCKETS=$(hwloc-calc -q -N socket ${HPCBIND_HWLOC_CPUSET} )
if [ $? -ne 0 ]; then
HPCBIND_NUM_SOCKETS=1
fi
else
HPCBIND_NUM_PUS=$(cat /proc/cpuinfo | grep -c processor)
HPCBIND_NUM_CORES=${HPCBIND_NUM_PUS}
HPCBIND_NUM_NUMAS=1
HPCBIND_NUM_SOCKETS=1
fi
declare -i HPCBIND_OPENMP_NUM_THREADS=$((HPCBIND_NUM_PUS * HPCBIND_OPENMP_PERCENT))
HPCBIND_OPENMP_NUM_THREADS=$((HPCBIND_OPENMP_NUM_THREADS / 100))
if [[ ${HPCBIND_OPENMP_NUM_THREADS} -lt 1 ]]; then
HPCBIND_OPENMP_NUM_THREADS=1
elif [[ ${HPCBIND_OPENMP_NUM_THREADS} -gt ${HPCBIND_NUM_PUS} ]]; then
HPCBIND_OPENMP_NUM_THREADS=${HPCBIND_NUM_PUS}
fi
if [[ ${HPCBIND_OPENMP_FORCE_NUM_THREADS} -gt 0 ]]; then
if [[ ${HPCBIND_OPENMP_FORCE_NUM_THREADS} != "" ]]; then
HPCBIND_OPENMP_NUM_THREADS=${HPCBIND_OPENMP_FORCE_NUM_THREADS}
else
declare -i HPCBIND_OPENMP_NUM_THREADS=$((HPCBIND_NUM_PUS * HPCBIND_OPENMP_RATIO_NUMERATOR / HPCBIND_OPENMP_RATIO_DENOMINATOR))
if [[ ${HPCBIND_OPENMP_NUM_THREADS} -lt 1 ]]; then
HPCBIND_OPENMP_NUM_THREADS=1
elif [[ ${HPCBIND_OPENMP_NUM_THREADS} -gt ${HPCBIND_NUM_PUS} ]]; then
HPCBIND_OPENMP_NUM_THREADS=${HPCBIND_NUM_PUS}
fi
fi
################################################################################
@ -405,7 +446,11 @@ fi
################################################################################
# set OMP_NUM_THREADS
export OMP_NUM_THREADS=${HPCBIND_OPENMP_NUM_THREADS}
if [[ ${HPCBIND_OPENMP_NESTED} -eq 1 ]]; then
export OMP_NUM_THREADS="${HPCBIND_OPENMP_NUM_THREADS},1"
else
export OMP_NUM_THREADS=${HPCBIND_OPENMP_NUM_THREADS}
fi
# set OMP_PROC_BIND and OMP_PLACES
if [[ ${HPCBIND_OPENMP_PROC_BIND} -eq 1 ]]; then
@ -413,7 +458,11 @@ if [[ ${HPCBIND_OPENMP_PROC_BIND} -eq 1 ]]; then
#default proc bind logic
if [[ "${HPCBIND_OPENMP_VERSION}" == "4.0" || "${HPCBIND_OPENMP_VERSION}" > "4.0" ]]; then
export OMP_PLACES="${HPCBIND_OPENMP_PLACES}"
export OMP_PROC_BIND="spread"
if [[ ${HPCBIND_OPENMP_NESTED} -eq 1 ]]; then
export OMP_PROC_BIND="spread,spread"
else
export OMP_PROC_BIND="spread"
fi
else
export OMP_PROC_BIND="true"
unset OMP_PLACES
@ -429,9 +478,17 @@ else
unset OMP_PROC_BIND
fi
# set OMP_NESTED
export OMP_NESTED=${HPCBIND_OPENMP_NESTED}
# set up hot teams (intel specific)
if [[ ${HPCBIND_OPENMP_NESTED} -eq 1 ]]; then
export OMP_NESTED="true"
export OMP_MAX_ACTIVE_LEVELS=2
export KMP_HOT_TEAMS=1
export KMP_HOT_TEAMS_MAX_LEVEL=2
else
export OMP_NESTED="false"
fi
# set OMP_NESTED
################################################################################
# Set CUDA environment variables
@ -442,7 +499,7 @@ if [[ ${HPCBIND_ENABLE_GPU_MAPPING} -eq 1 ]]; then
declare -i GPU_ID=$((HPCBIND_PARTITION % NUM_GPUS))
export CUDA_VISIBLE_DEVICES="${HPCBIND_VISIBLE_GPUS[${GPU_ID}]}"
else
declare -i MY_TASK_ID=$((HPCBIND_QUEUE_INDEX * HPCBIND_DISTRIBUTE + HPCBIND_PARTITION))
declare -i MY_TASK_ID=$((HPCBIND_QUEUE_RANK * HPCBIND_DISTRIBUTE + HPCBIND_PARTITION))
declare -i GPU_ID=$((MY_TASK_ID % NUM_GPUS))
export CUDA_VISIBLE_DEVICES="${HPCBIND_VISIBLE_GPUS[${GPU_ID}]}"
fi
@ -451,12 +508,17 @@ fi
################################################################################
# Set hpcbind environment variables
################################################################################
export HPCBIND_HWLOC_VERSION=${HPCBIND_HWLOC_VERSION}
export HPCBIND_HAS_HWLOC=${HPCBIND_HAS_HWLOC}
export HPCBIND_HAS_NVIDIA=${HPCBIND_HAS_NVIDIA}
export HPCBIND_NUM_PUS=${HPCBIND_NUM_PUS}
export HPCBIND_NUM_CORES=${HPCBIND_NUM_CORES}
export HPCBIND_NUM_NUMAS=${HPCBIND_NUM_NUMAS}
export HPCBIND_NUM_SOCKETS=${HPCBIND_NUM_SOCKETS}
export HPCBIND_HWLOC_CPUSET="${HPCBIND_HWLOC_CPUSET}"
export HPCBIND_HWLOC_DISTRIBUTE=${HPCBIND_DISTRIBUTE}
export HPCBIND_HWLOC_DISTRIBUTE_PARTITION=${HPCBIND_PARTITION}
export HPCBIND_OPENMP_RATIO="${HPCBIND_OPENMP_RATIO_NUMERATOR}/${HPCBIND_OPENMP_RATIO_DENOMINATOR}"
if [[ "${HPCBIND_HWLOC_PARENT_CPUSET}" == "" ]]; then
export HPCBIND_HWLOC_PARENT_CPUSET="all"
else
@ -467,7 +529,8 @@ export HPCBIND_NVIDIA_ENABLE_GPU_MAPPING=${HPCBIND_ENABLE_GPU_MAPPING}
export HPCBIND_NVIDIA_VISIBLE_GPUS=$(echo "${HPCBIND_VISIBLE_GPUS[*]}" | tr ' ' ',')
export HPCBIND_OPENMP_VERSION="${HPCBIND_OPENMP_VERSION}"
if [[ "${HPCBIND_QUEUE_NAME}" != "" ]]; then
export HPCBIND_QUEUE_INDEX=${HPCBIND_QUEUE_INDEX}
export HPCBIND_QUEUE_RANK=${HPCBIND_QUEUE_RANK}
export HPCBIND_QUEUE_SIZE=${HPCBIND_QUEUE_SIZE}
export HPCBIND_QUEUE_NAME="${HPCBIND_QUEUE_NAME}"
export HPCBIND_QUEUE_MAPPING=${HPCBIND_QUEUE_MAPPING}
fi
@ -487,10 +550,16 @@ if [[ ${HPCBIND_TEE} -eq 0 || ${HPCBIND_VERBOSE} -eq 0 ]]; then
echo "${TMP_ENV}" | grep -E "^CUDA_" >> ${HPCBIND_LOG}
echo "[OPENMP]" >> ${HPCBIND_LOG}
echo "${TMP_ENV}" | grep -E "^OMP_" >> ${HPCBIND_LOG}
echo "[GOMP] (gcc, g++, and gfortran)" >> ${HPCBIND_LOG}
echo "${TMP_ENV}" | grep -E "^GOMP_" >> ${HPCBIND_LOG}
echo "[KMP] (icc, icpc, and ifort)" >> ${HPCBIND_LOG}
echo "${TMP_ENV}" | grep -E "^KMP_" >> ${HPCBIND_LOG}
echo "[XLSMPOPTS] (xlc, xlc++, and xlf)" >> ${HPCBIND_LOG}
echo "${TMP_ENV}" | grep -E "^XLSMPOPTS" >> ${HPCBIND_LOG}
if [[ ${HPCBIND_HAS_HWLOC} -eq 1 ]]; then
echo "[BINDINGS]" >> ${HPCBIND_LOG}
hwloc-ls --restrict "${HPCBIND_HWLOC_CPUSET}" --only pu >> ${HPCBIND_LOG}
hwloc-ls --restrict "${HPCBIND_HWLOC_CPUSET}" >> ${HPCBIND_LOG}
else
echo "Unable to show bindings, hwloc not available." >> ${HPCBIND_LOG}
fi
@ -503,10 +572,16 @@ else
echo "${TMP_ENV}" | grep -E "^CUDA_" > >(tee -a ${HPCBIND_LOG})
echo "[OPENMP]" > >(tee -a ${HPCBIND_LOG})
echo "${TMP_ENV}" | grep -E "^OMP_" > >(tee -a ${HPCBIND_LOG})
echo "[GOMP] (gcc, g++, and gfortran)" > >(tee -a ${HPCBIND_LOG})
echo "${TMP_ENV}" | grep -E "^GOMP_" > >(tee -a ${HPCBIND_LOG})
echo "[KMP] (icc, icpc, and ifort)" > >(tee -a ${HPCBIND_LOG})
echo "${TMP_ENV}" | grep -E "^KMP_" > >(tee -a ${HPCBIND_LOG})
echo "[XLSMPOPTS] (xlc, xlc++, and xlf)" > >(tee -a ${HPCBIND_LOG})
echo "${TMP_ENV}" | grep -E "^XLSMPOPTS" > >(tee -a ${HPCBIND_LOG})
if [[ ${HPCBIND_HAS_HWLOC} -eq 1 ]]; then
echo "[BINDINGS]" > >(tee -a ${HPCBIND_LOG})
hwloc-ls --restrict "${HPCBIND_HWLOC_CPUSET}" --only pu > >(tee -a ${HPCBIND_LOG})
hwloc-ls --restrict "${HPCBIND_HWLOC_CPUSET}" --no-io --no-bridges > >(tee -a ${HPCBIND_LOG})
else
echo "Unable to show bindings, hwloc not available." > >(tee -a ${HPCBIND_LOG})
fi


@ -39,6 +39,12 @@ cuda_args=""
# Arguments for both NVCC and Host compiler
shared_args=""
# Argument -c
compile_arg=""
# Argument -o <obj>
output_arg=""
# Linker arguments
xlinker_args=""
@ -66,6 +72,7 @@ dry_run=0
# Skip NVCC compilation and use host compiler directly
host_only=0
host_only_args=""
# Enable workaround for CUDA 6.5 for pragma ident
replace_pragma_ident=0
@ -81,6 +88,11 @@ optimization_applied=0
# Check if we have -std=c++X or --std=c++X already
stdcxx_applied=0
# Run nvcc a second time to generate dependencies if needed
depfile_separate=0
depfile_output_arg=""
depfile_target_arg=""
#echo "Arguments: $# $@"
while [ $# -gt 0 ]
@ -112,12 +124,31 @@ do
fi
;;
#Handle shared args (valid for both nvcc and the host compiler)
-D*|-c|-I*|-L*|-l*|-g|--help|--version|-E|-M|-shared)
-D*|-I*|-L*|-l*|-g|--help|--version|-E|-M|-shared)
shared_args="$shared_args $1"
;;
#Handle shared args that have an argument
-o|-MT)
shared_args="$shared_args $1 $2"
#Handle compilation argument
-c)
compile_arg="$1"
;;
#Handle output argument
-o)
output_arg="$output_arg $1 $2"
shift
;;
# Handle depfile arguments. We map them to a separate call to nvcc.
-MD|-MMD)
depfile_separate=1
host_only_args="$host_only_args $1"
;;
-MF)
depfile_output_arg="-o $2"
host_only_args="$host_only_args $1 $2"
shift
;;
-MT)
depfile_target_arg="$1 $2"
host_only_args="$host_only_args $1 $2"
shift
;;
#Handle known nvcc args
@ -242,7 +273,7 @@ if [ $first_xcompiler_arg -eq 0 ]; then
fi
#Compose host only command
host_command="$host_compiler $shared_args $xcompiler_args $host_linker_args $shared_versioned_libraries_host"
host_command="$host_compiler $shared_args $host_only_args $compile_arg $output_arg $xcompiler_args $host_linker_args $shared_versioned_libraries_host"
#nvcc does not accept '#pragma ident SOME_MACRO_STRING' but it does accept '#ident SOME_MACRO_STRING'
if [ $replace_pragma_ident -eq 1 ]; then
@ -274,10 +305,21 @@ else
host_command="$host_command $object_files"
fi
if [ $depfile_separate -eq 1 ]; then
# run nvcc a second time to generate dependencies (without compiling)
nvcc_depfile_command="$nvcc_command -M $depfile_target_arg $depfile_output_arg"
else
nvcc_depfile_command=""
fi
nvcc_command="$nvcc_command $compile_arg $output_arg"
#Print command for dryrun
if [ $dry_run -eq 1 ]; then
if [ $host_only -eq 1 ]; then
echo $host_command
elif [ -n "$nvcc_depfile_command" ]; then
echo $nvcc_command "&&" $nvcc_depfile_command
else
echo $nvcc_command
fi
@ -287,6 +329,8 @@ fi
#Run compilation command
if [ $host_only -eq 1 ]; then
$host_command
elif [ -n "$nvcc_depfile_command" ]; then
$nvcc_command && $nvcc_depfile_command
else
$nvcc_command
fi
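The depfile logic above turns `-MD`/`-MMD` into a second nvcc pass that runs with `-M` only, remapping `-MF` to that pass's `-o` and forwarding `-MT`. A toy re-creation of the argument scan and command composition (nothing is executed; the sample argument list and file names are invented):

```shell
# Scan a sample compiler invocation, peel off the depfile flags, and
# compose the two-pass nvcc command the wrapper would run.
depfile_separate=0; depfile_output_arg=""; depfile_target_arg=""; other_args=""
set -- -c kernel.cu -MD -MF kernel.d -MT kernel.o   # sample invocation
while [ $# -gt 0 ]; do
  case $1 in
    -MD|-MMD) depfile_separate=1 ;;
    -MF) depfile_output_arg="-o $2"; shift ;;
    -MT) depfile_target_arg="$1 $2"; shift ;;
    *)   other_args="$other_args $1" ;;
  esac
  shift
done
RESULT="nvcc$other_args"
if [ $depfile_separate -eq 1 ]; then
  RESULT="$RESULT && nvcc$other_args -M $depfile_target_arg $depfile_output_arg"
fi
echo "$RESULT"
```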


@ -0,0 +1,8 @@
ifndef KOKKOS_PATH
MAKEFILE_PATH := $(abspath $(lastword $(MAKEFILE_LIST)))
KOKKOS_PATH = $(subst Makefile,,$(MAKEFILE_PATH))..
endif
include $(KOKKOS_PATH)/Makefile.kokkos
include $(KOKKOS_PATH)/core/src/Makefile.generate_header_lists
include $(KOKKOS_PATH)/core/src/Makefile.generate_build_files

File diff suppressed because it is too large


@ -0,0 +1,219 @@
# kokkos_generated_settings.cmake includes the kokkos library itself in KOKKOS_LIBS
# which we do not want to use for the cmake builds so clean this up
string(REGEX REPLACE "-lkokkos" "" KOKKOS_LIBS ${KOKKOS_LIBS})
############################ Detect if submodule ###############################
#
# With thanks to StackOverflow:
# http://stackoverflow.com/questions/25199677/how-to-detect-if-current-scope-has-a-parent-in-cmake
#
get_directory_property(HAS_PARENT PARENT_DIRECTORY)
if(HAS_PARENT)
message(STATUS "Submodule build")
SET(KOKKOS_HEADER_DIR "include/kokkos")
else()
message(STATUS "Standalone build")
SET(KOKKOS_HEADER_DIR "include")
endif()
################################ Handle the actual build #######################
SET(INSTALL_LIB_DIR lib CACHE PATH "Installation directory for libraries")
SET(INSTALL_BIN_DIR bin CACHE PATH "Installation directory for executables")
SET(INSTALL_INCLUDE_DIR ${KOKKOS_HEADER_DIR} CACHE PATH
"Installation directory for header files")
IF(WIN32 AND NOT CYGWIN)
SET(DEF_INSTALL_CMAKE_DIR CMake)
ELSE()
SET(DEF_INSTALL_CMAKE_DIR lib/CMake/Kokkos)
ENDIF()
SET(INSTALL_CMAKE_DIR ${DEF_INSTALL_CMAKE_DIR} CACHE PATH
"Installation directory for CMake files")
# Make relative paths absolute (needed later on)
FOREACH(p LIB BIN INCLUDE CMAKE)
SET(var INSTALL_${p}_DIR)
IF(NOT IS_ABSOLUTE "${${var}}")
SET(${var} "${CMAKE_INSTALL_PREFIX}/${${var}}")
ENDIF()
ENDFOREACH()
# set up include-directories
SET (Kokkos_INCLUDE_DIRS
${Kokkos_SOURCE_DIR}/core/src
${Kokkos_SOURCE_DIR}/containers/src
${Kokkos_SOURCE_DIR}/algorithms/src
${Kokkos_BINARY_DIR} # to find KokkosCore_config.h
${KOKKOS_INCLUDE_DIRS}
)
# pass include dirs back to parent scope
if(HAS_PARENT)
SET(Kokkos_INCLUDE_DIRS_RET ${Kokkos_INCLUDE_DIRS} PARENT_SCOPE)
else()
SET(Kokkos_INCLUDE_DIRS_RET ${Kokkos_INCLUDE_DIRS})
endif()
INCLUDE_DIRECTORIES(${Kokkos_INCLUDE_DIRS})
IF(KOKKOS_SEPARATE_LIBS)
# Sources come from makefile-generated kokkos_generated_settings.cmake file
# Separate libs need to separate the sources
set_kokkos_srcs(KOKKOS_SRC ${KOKKOS_SRC})
# kokkoscore
ADD_LIBRARY(
kokkoscore
${KOKKOS_CORE_SRCS}
)
target_compile_options(
kokkoscore
PUBLIC $<$<COMPILE_LANGUAGE:CXX>:${KOKKOS_CXX_FLAGS}>
)
# Install the kokkoscore library
INSTALL (TARGETS kokkoscore
EXPORT KokkosTargets
ARCHIVE DESTINATION ${CMAKE_INSTALL_PREFIX}/lib
LIBRARY DESTINATION ${CMAKE_INSTALL_PREFIX}/lib
RUNTIME DESTINATION ${CMAKE_INSTALL_PREFIX}/bin
)
TARGET_LINK_LIBRARIES(
kokkoscore
${KOKKOS_LD_FLAGS}
${KOKKOS_EXTRA_LIBS_LIST}
)
# kokkoscontainers
if (DEFINED KOKKOS_CONTAINERS_SRCS)
ADD_LIBRARY(
kokkoscontainers
${KOKKOS_CONTAINERS_SRCS}
)
endif()
TARGET_LINK_LIBRARIES(
kokkoscontainers
kokkoscore
)
# Install the kokkoscontainers library
INSTALL (TARGETS kokkoscontainers
EXPORT KokkosTargets
ARCHIVE DESTINATION ${CMAKE_INSTALL_PREFIX}/lib
LIBRARY DESTINATION ${CMAKE_INSTALL_PREFIX}/lib
RUNTIME DESTINATION ${CMAKE_INSTALL_PREFIX}/bin)
# kokkosalgorithms - Build as interface library since no source files.
ADD_LIBRARY(
kokkosalgorithms
INTERFACE
)
target_include_directories(
kokkosalgorithms
INTERFACE ${Kokkos_SOURCE_DIR}/algorithms/src
)
TARGET_LINK_LIBRARIES(
kokkosalgorithms
INTERFACE kokkoscore
)
# Install the kokkosalgorithms library
INSTALL (TARGETS kokkosalgorithms
ARCHIVE DESTINATION ${CMAKE_INSTALL_PREFIX}/lib
LIBRARY DESTINATION ${CMAKE_INSTALL_PREFIX}/lib
RUNTIME DESTINATION ${CMAKE_INSTALL_PREFIX}/bin)
SET (Kokkos_LIBRARIES_NAMES kokkoscore kokkoscontainers kokkosalgorithms)
ELSE()
# kokkos
ADD_LIBRARY(
kokkos
${KOKKOS_CORE_SRCS}
${KOKKOS_CONTAINERS_SRCS}
)
target_compile_options(
kokkos
PUBLIC $<$<COMPILE_LANGUAGE:CXX>:${KOKKOS_CXX_FLAGS}>
)
TARGET_LINK_LIBRARIES(
kokkos
${KOKKOS_LD_FLAGS}
${KOKKOS_EXTRA_LIBS_LIST}
)
# Install the kokkos library
INSTALL (TARGETS kokkos
EXPORT KokkosTargets
ARCHIVE DESTINATION ${CMAKE_INSTALL_PREFIX}/lib
LIBRARY DESTINATION ${CMAKE_INSTALL_PREFIX}/lib
RUNTIME DESTINATION ${CMAKE_INSTALL_PREFIX}/bin)
SET (Kokkos_LIBRARIES_NAMES kokkos)
endif() # KOKKOS_SEPARATE_LIBS
# Install the kokkos headers
INSTALL (DIRECTORY
EXPORT KokkosTargets
${Kokkos_SOURCE_DIR}/core/src/
DESTINATION ${KOKKOS_HEADER_DIR}
FILES_MATCHING PATTERN "*.hpp"
)
INSTALL (DIRECTORY
EXPORT KokkosTargets
${Kokkos_SOURCE_DIR}/containers/src/
DESTINATION ${KOKKOS_HEADER_DIR}
FILES_MATCHING PATTERN "*.hpp"
)
INSTALL (DIRECTORY
EXPORT KokkosTargets
${Kokkos_SOURCE_DIR}/algorithms/src/
DESTINATION ${KOKKOS_HEADER_DIR}
FILES_MATCHING PATTERN "*.hpp"
)
INSTALL (FILES
${Kokkos_BINARY_DIR}/KokkosCore_config.h
DESTINATION ${KOKKOS_HEADER_DIR}
)
# Add all targets to the build-tree export set
export(TARGETS ${Kokkos_LIBRARIES_NAMES}
FILE "${Kokkos_BINARY_DIR}/KokkosTargets.cmake")
# Export the package for use from the build-tree
# (this registers the build-tree with a global CMake-registry)
export(PACKAGE Kokkos)
# Create the KokkosConfig.cmake and KokkosConfigVersion files
file(RELATIVE_PATH REL_INCLUDE_DIR "${INSTALL_CMAKE_DIR}"
"${INSTALL_INCLUDE_DIR}")
# ... for the build tree
set(CONF_INCLUDE_DIRS "${Kokkos_SOURCE_DIR}" "${Kokkos_BINARY_DIR}")
configure_file(${Kokkos_SOURCE_DIR}/cmake/KokkosConfig.cmake.in
"${Kokkos_BINARY_DIR}/KokkosConfig.cmake" @ONLY)
# ... for the install tree
set(CONF_INCLUDE_DIRS "\${Kokkos_CMAKE_DIR}/${REL_INCLUDE_DIR}")
configure_file(${Kokkos_SOURCE_DIR}/cmake/KokkosConfig.cmake.in
"${Kokkos_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/KokkosConfig.cmake" @ONLY)
# Install the KokkosConfig.cmake file
install(FILES
"${Kokkos_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/KokkosConfig.cmake"
DESTINATION "${INSTALL_CMAKE_DIR}")
#This seems not to do anything?
#message(STATUS "KokkosTargets: " ${KokkosTargets})
# Install the export set for use with the install-tree
INSTALL(EXPORT KokkosTargets DESTINATION
"${INSTALL_CMAKE_DIR}")


@@ -0,0 +1,345 @@
################################### FUNCTIONS ##################################
# List of functions
# set_kokkos_cxx_compiler
# set_kokkos_cxx_standard
# set_kokkos_srcs
#-------------------------------------------------------------------------------
# function(set_kokkos_cxx_compiler)
# Sets the following compiler variables that are analogous to the CMAKE_*
# versions. We add the ability to detect NVCC (really nvcc_wrapper).
# KOKKOS_CXX_COMPILER
# KOKKOS_CXX_COMPILER_ID
# KOKKOS_CXX_COMPILER_VERSION
#
# Inputs:
# KOKKOS_ENABLE_CUDA
# CMAKE_CXX_COMPILER
# CMAKE_CXX_COMPILER_ID
# CMAKE_CXX_COMPILER_VERSION
#
# Also verifies the compiler version meets the minimum required by Kokkos.
function(set_kokkos_cxx_compiler)
# Since CMake doesn't recognize the nvcc compiler until 3.8, we use our own
# version of the CMake variables and detect nvcc ourselves. Initially set to
# the CMake variable values.
set(INTERNAL_CXX_COMPILER ${CMAKE_CXX_COMPILER})
set(INTERNAL_CXX_COMPILER_ID ${CMAKE_CXX_COMPILER_ID})
set(INTERNAL_CXX_COMPILER_VERSION ${CMAKE_CXX_COMPILER_VERSION})
# Check if the compiler is nvcc (which really means nvcc_wrapper).
execute_process(COMMAND ${INTERNAL_CXX_COMPILER} --version
COMMAND grep nvcc
COMMAND wc -l
OUTPUT_VARIABLE INTERNAL_HAVE_COMPILER_NVCC
OUTPUT_STRIP_TRAILING_WHITESPACE)
string(REGEX REPLACE "^ +" ""
INTERNAL_HAVE_COMPILER_NVCC ${INTERNAL_HAVE_COMPILER_NVCC})
if(INTERNAL_HAVE_COMPILER_NVCC)
# Set the compiler id to nvcc. We use the value used by CMake 3.8.
set(INTERNAL_CXX_COMPILER_ID NVIDIA)
# Set nvcc's compiler version.
execute_process(COMMAND ${INTERNAL_CXX_COMPILER} --version
COMMAND grep release
OUTPUT_VARIABLE INTERNAL_CXX_COMPILER_VERSION
OUTPUT_STRIP_TRAILING_WHITESPACE)
string(REGEX MATCH "[0-9]+\\.[0-9]+\\.[0-9]+$"
INTERNAL_CXX_COMPILER_VERSION ${INTERNAL_CXX_COMPILER_VERSION})
endif()
# Enforce the minimum compilers supported by Kokkos.
set(KOKKOS_MESSAGE_TEXT "Compiler not supported by Kokkos. Required compiler versions:")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Clang 3.5.2 or higher")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n GCC 4.8.4 or higher")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n Intel 15.0.2 or higher")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n NVCC 7.0.28 or higher")
set(KOKKOS_MESSAGE_TEXT "${KOKKOS_MESSAGE_TEXT}\n PGI 17.1 or higher\n")
if(INTERNAL_CXX_COMPILER_ID STREQUAL Clang)
if(INTERNAL_CXX_COMPILER_VERSION VERSION_LESS 3.5.2)
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
endif()
elseif(INTERNAL_CXX_COMPILER_ID STREQUAL GNU)
if(INTERNAL_CXX_COMPILER_VERSION VERSION_LESS 4.8.4)
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
endif()
elseif(INTERNAL_CXX_COMPILER_ID STREQUAL Intel)
if(INTERNAL_CXX_COMPILER_VERSION VERSION_LESS 15.0.2)
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
endif()
elseif(INTERNAL_CXX_COMPILER_ID STREQUAL NVIDIA)
if(INTERNAL_CXX_COMPILER_VERSION VERSION_LESS 7.0.28)
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
endif()
elseif(INTERNAL_CXX_COMPILER_ID STREQUAL PGI)
if(INTERNAL_CXX_COMPILER_VERSION VERSION_LESS 17.1)
message(FATAL_ERROR "${KOKKOS_MESSAGE_TEXT}")
endif()
endif()
# Enforce that extensions are turned off for nvcc_wrapper.
if(INTERNAL_CXX_COMPILER_ID STREQUAL NVIDIA)
if(DEFINED CMAKE_CXX_EXTENSIONS AND CMAKE_CXX_EXTENSIONS STREQUAL ON)
message(FATAL_ERROR "NVCC doesn't support C++ extensions. Set CMAKE_CXX_EXTENSIONS to OFF in your CMakeLists.txt.")
endif()
endif()
if(KOKKOS_ENABLE_CUDA)
# Enforce that the compiler can compile CUDA code.
if(INTERNAL_CXX_COMPILER_ID STREQUAL Clang)
if(INTERNAL_CXX_COMPILER_VERSION VERSION_LESS 4.0.0)
message(FATAL_ERROR "Compiling CUDA code directly with Clang requires version 4.0.0 or higher.")
endif()
elseif(NOT INTERNAL_CXX_COMPILER_ID STREQUAL NVIDIA)
message(FATAL_ERROR "Invalid compiler for CUDA. The compiler must be nvcc_wrapper or Clang.")
endif()
endif()
set(KOKKOS_CXX_COMPILER ${INTERNAL_CXX_COMPILER} PARENT_SCOPE)
set(KOKKOS_CXX_COMPILER_ID ${INTERNAL_CXX_COMPILER_ID} PARENT_SCOPE)
set(KOKKOS_CXX_COMPILER_VERSION ${INTERNAL_CXX_COMPILER_VERSION} PARENT_SCOPE)
endfunction()
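# Illustrative usage (not part of the original file): after the call, the
# KOKKOS_CXX_COMPILER* variables are available in the caller's scope, e.g.:
#   set_kokkos_cxx_compiler()
#   message(STATUS "Using ${KOKKOS_CXX_COMPILER_ID} ${KOKKOS_CXX_COMPILER_VERSION}")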
#-------------------------------------------------------------------------------
# function(set_kokkos_cxx_standard)
# Transitively enforces that the appropriate CXX standard compile flags (C++11
# or above) are added to targets that use the Kokkos library. Compile features
# are used if possible. Otherwise, the appropriate flags are added to
# KOKKOS_CXX_FLAGS. Values set by the user to CMAKE_CXX_STANDARD and
# CMAKE_CXX_EXTENSIONS are honored.
#
# Outputs:
# KOKKOS_CXX11_FEATURES
# KOKKOS_CXX_FLAGS
#
# Inputs:
# KOKKOS_CXX_COMPILER
# KOKKOS_CXX_COMPILER_ID
# KOKKOS_CXX_COMPILER_VERSION
#
function(set_kokkos_cxx_standard)
# The following table lists the versions of CMake that support CXX_STANDARD
# and the CXX compile features for different compilers. The versions are
# based on CMake documentation, looking at CMake code, and verifying by
# testing with specific CMake versions.
#
# COMPILER CXX_STANDARD Compile Features
# ---------------------------------------------------------------
# Clang 3.1 3.1
# GNU 3.1 3.2
# AppleClang 3.2 3.2
# Intel 3.6 3.6
# Cray No No
# PGI No No
# XL No No
#
# For compiling CUDA code using nvcc_wrapper, we will use the host compiler's
# flags for turning on C++11. Since for compiler ID and versioning purposes
# CMake recognizes the host compiler when calling nvcc_wrapper, this just
# works. Both NVCC and nvcc_wrapper only recognize '-std=c++11' which means
# that we can only use host compilers for CUDA builds that use those flags.
# It also means that extensions (gnu++11) can't be turned on for CUDA builds.
# Check if we can use compile features.
if(NOT KOKKOS_CXX_COMPILER_ID STREQUAL NVIDIA)
if(CMAKE_CXX_COMPILER_ID STREQUAL Clang)
if(NOT CMAKE_VERSION VERSION_LESS 3.1)
set(INTERNAL_USE_COMPILE_FEATURES ON)
endif()
elseif(CMAKE_CXX_COMPILER_ID STREQUAL AppleClang OR CMAKE_CXX_COMPILER_ID STREQUAL GNU)
if(NOT CMAKE_VERSION VERSION_LESS 3.2)
set(INTERNAL_USE_COMPILE_FEATURES ON)
endif()
elseif(CMAKE_CXX_COMPILER_ID STREQUAL Intel)
if(NOT CMAKE_VERSION VERSION_LESS 3.6)
set(INTERNAL_USE_COMPILE_FEATURES ON)
endif()
endif()
endif()
if(INTERNAL_USE_COMPILE_FEATURES)
# Use the compile features aspect of CMake to transitively cause C++ flags
# to populate to user code.
# I'm using a hack by requiring features that I know force the lowest version
# of the compilers we want to support. Clang 3.3 and later support all of
# the C++11 standard. With CMake 3.8 and higher, we could switch to using
# cxx_std_11.
set(KOKKOS_CXX11_FEATURES
cxx_nonstatic_member_init # Forces GCC 4.7 or later and Intel 14.0 or later.
PARENT_SCOPE
)
else()
# CXX compile features are not yet implemented for this combination of
# compiler and version of CMake.
if(CMAKE_CXX_COMPILER_ID STREQUAL AppleClang)
# Versions of CMAKE before 3.2 don't support CXX_STANDARD or C++ compile
# features for the AppleClang compiler. Set compiler flags transitively
# here such that they trickle down to a call to target_compile_options().
# The following two blocks of code were copied from
# /Modules/Compiler/AppleClang-CXX.cmake from CMake 3.7.2 and then
# modified.
if(NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 4.0)
set(INTERNAL_CXX11_STANDARD_COMPILE_OPTION "-std=c++11")
set(INTERNAL_CXX11_EXTENSION_COMPILE_OPTION "-std=gnu++11")
endif()
if(NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 6.1)
set(INTERNAL_CXX14_STANDARD_COMPILE_OPTION "-std=c++14")
set(INTERNAL_CXX14_EXTENSION_COMPILE_OPTION "-std=gnu++14")
elseif(NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 5.1)
# AppleClang 5.0 knows this flag, but does not set a __cplusplus macro
# greater than 201103L.
set(INTERNAL_CXX14_STANDARD_COMPILE_OPTION "-std=c++1y")
set(INTERNAL_CXX14_EXTENSION_COMPILE_OPTION "-std=gnu++1y")
endif()
elseif(CMAKE_CXX_COMPILER_ID STREQUAL Intel)
# Versions of CMAKE before 3.6 don't support CXX_STANDARD or C++ compile
# features for the Intel compiler. Set compiler flags transitively here
# such that they trickle down to a call to target_compile_options().
# The following three blocks of code were copied from
# /Modules/Compiler/Intel-CXX.cmake from CMake 3.7.2 and then modified.
if("x${CMAKE_CXX_SIMULATE_ID}" STREQUAL "xMSVC")
set(_std -Qstd)
set(_ext c++)
else()
set(_std -std)
set(_ext gnu++)
endif()
if(NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 15.0.2)
set(INTERNAL_CXX14_STANDARD_COMPILE_OPTION "${_std}=c++14")
# TODO: There is no gnu++14 value supported; figure out what to do.
set(INTERNAL_CXX14_EXTENSION_COMPILE_OPTION "${_std}=c++14")
elseif(NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 15.0.0)
set(INTERNAL_CXX14_STANDARD_COMPILE_OPTION "${_std}=c++1y")
# TODO: There is no gnu++14 value supported; figure out what to do.
set(INTERNAL_CXX14_EXTENSION_COMPILE_OPTION "${_std}=c++1y")
endif()
if(NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 13.0)
set(INTERNAL_CXX11_STANDARD_COMPILE_OPTION "${_std}=c++11")
set(INTERNAL_CXX11_EXTENSION_COMPILE_OPTION "${_std}=${_ext}11")
elseif(NOT CMAKE_CXX_COMPILER_VERSION VERSION_LESS 12.1)
set(INTERNAL_CXX11_STANDARD_COMPILE_OPTION "${_std}=c++0x")
set(INTERNAL_CXX11_EXTENSION_COMPILE_OPTION "${_std}=${_ext}0x")
endif()
elseif(CMAKE_CXX_COMPILER_ID STREQUAL Cray)
# CMAKE doesn't support CXX_STANDARD or C++ compile features for the Cray
# compiler. Set compiler options transitively here such that they trickle
# down to a call to target_compile_options().
set(INTERNAL_CXX11_STANDARD_COMPILE_OPTION "-hstd=c++11")
set(INTERNAL_CXX11_EXTENSION_COMPILE_OPTION "-hstd=c++11")
set(INTERNAL_CXX14_STANDARD_COMPILE_OPTION "-hstd=c++11")
set(INTERNAL_CXX14_EXTENSION_COMPILE_OPTION "-hstd=c++11")
elseif(CMAKE_CXX_COMPILER_ID STREQUAL PGI)
# CMAKE doesn't support CXX_STANDARD or C++ compile features for the PGI
# compiler. Set compiler options transitively here such that they trickle
# down to a call to target_compile_options().
set(INTERNAL_CXX11_STANDARD_COMPILE_OPTION "--c++11")
set(INTERNAL_CXX11_EXTENSION_COMPILE_OPTION "--c++11")
set(INTERNAL_CXX14_STANDARD_COMPILE_OPTION "--c++11")
set(INTERNAL_CXX14_EXTENSION_COMPILE_OPTION "--c++11")
elseif(CMAKE_CXX_COMPILER_ID STREQUAL XL)
# CMAKE doesn't support CXX_STANDARD or C++ compile features for the XL
# compiler. Set compiler options transitively here such that they trickle
# down to a call to target_compile_options().
set(INTERNAL_CXX11_STANDARD_COMPILE_OPTION "-std=c++11")
set(INTERNAL_CXX11_EXTENSION_COMPILE_OPTION "-std=c++11")
set(INTERNAL_CXX14_STANDARD_COMPILE_OPTION "-std=c++11")
set(INTERNAL_CXX14_EXTENSION_COMPILE_OPTION "-std=c++11")
else()
# Assume GNU. CMAKE_CXX_STANDARD is handled correctly by CMake 3.1 and
# above for this compiler. If the user explicitly requests a C++
# standard, CMake takes care of it. If not, transitively require C++11.
if(NOT CMAKE_CXX_STANDARD)
set(INTERNAL_CXX11_STANDARD_COMPILE_OPTION ${CMAKE_CXX11_STANDARD_COMPILE_OPTION})
set(INTERNAL_CXX11_EXTENSION_COMPILE_OPTION ${CMAKE_CXX11_EXTENSION_COMPILE_OPTION})
endif()
endif()
# Set the C++ standard info for Kokkos respecting user set values for
# CMAKE_CXX_STANDARD and CMAKE_CXX_EXTENSIONS.
# Only use cxx extension if explicitly requested
if(CMAKE_CXX_STANDARD EQUAL 14)
if(DEFINED CMAKE_CXX_EXTENSIONS AND CMAKE_CXX_EXTENSIONS STREQUAL ON)
set(INTERNAL_CXX_FLAGS ${INTERNAL_CXX14_EXTENSION_COMPILE_OPTION})
else()
set(INTERNAL_CXX_FLAGS ${INTERNAL_CXX14_STANDARD_COMPILE_OPTION})
endif()
elseif(CMAKE_CXX_STANDARD EQUAL 11)
if(DEFINED CMAKE_CXX_EXTENSIONS AND CMAKE_CXX_EXTENSIONS STREQUAL ON)
set(INTERNAL_CXX_FLAGS ${INTERNAL_CXX11_EXTENSION_COMPILE_OPTION})
else()
set(INTERNAL_CXX_FLAGS ${INTERNAL_CXX11_STANDARD_COMPILE_OPTION})
endif()
else()
# The user didn't explicitly request a standard, transitively require
# C++11 respecting CMAKE_CXX_EXTENSIONS.
if(DEFINED CMAKE_CXX_EXTENSIONS AND CMAKE_CXX_EXTENSIONS STREQUAL ON)
set(INTERNAL_CXX_FLAGS ${INTERNAL_CXX11_EXTENSION_COMPILE_OPTION})
else()
set(INTERNAL_CXX_FLAGS ${INTERNAL_CXX11_STANDARD_COMPILE_OPTION})
endif()
endif()
set(KOKKOS_CXX_FLAGS ${INTERNAL_CXX_FLAGS} PARENT_SCOPE)
endif()
endfunction()
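# Illustrative outcomes (not part of the original file): with GCC and a CMake
# new enough for compile features, KOKKOS_CXX11_FEATURES is set to
# cxx_nonstatic_member_init and KOKKOS_CXX_FLAGS is left empty; with PGI and
# CMAKE_CXX_STANDARD unset, KOKKOS_CXX_FLAGS becomes --c++11 instead.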
#-------------------------------------------------------------------------------
# function(set_kokkos_srcs)
# Takes a list of Kokkos sources (e.g., KOKKOS_SRC, which Makefile.kokkos
# writes into kokkos_generated_settings.cmake) and sorts the files into the
# subpackages used for separate libraries: core and containers (algorithms is
# header-only).
#
# Inputs:
# KOKKOS_SRC
#
# Outputs:
# KOKKOS_CORE_SRCS
# KOKKOS_CONTAINERS_SRCS
#
function(set_kokkos_srcs)
set(opts ) # no-value args
set(oneValArgs )
set(multValArgs KOKKOS_SRC) # e.g., lists
cmake_parse_arguments(IN "${opts}" "${oneValArgs}" "${multValArgs}" ${ARGN})
foreach(sfile ${IN_KOKKOS_SRC})
string(REPLACE "${CMAKE_CURRENT_SOURCE_DIR}/" "" stripfile "${sfile}")
string(REPLACE "/" ";" striplist "${stripfile}")
list(GET striplist 0 firstdir)
if(${firstdir} STREQUAL "core")
list(APPEND KOKKOS_CORE_SRCS ${sfile})
else()
list(APPEND KOKKOS_CONTAINERS_SRCS ${sfile})
endif()
endforeach()
set(KOKKOS_CORE_SRCS ${KOKKOS_CORE_SRCS} PARENT_SCOPE)
set(KOKKOS_CONTAINERS_SRCS ${KOKKOS_CONTAINERS_SRCS} PARENT_SCOPE)
return()
endfunction()
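# Illustrative usage (not part of the original file), assuming KOKKOS_SRC was
# read from kokkos_generated_settings.cmake:
#   set_kokkos_srcs(KOKKOS_SRC ${KOKKOS_SRC})
#   # KOKKOS_CORE_SRCS and KOKKOS_CONTAINERS_SRCS are now set in this scope.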
# Setting a default value if it is not already set
macro(set_kokkos_default_default VARIABLE DEFAULT)
IF( "${KOKKOS_INTERNAL_ENABLE_${VARIABLE}_DEFAULT}" STREQUAL "" )
IF( "${KOKKOS_ENABLE_${VARIABLE}}" STREQUAL "" )
set(KOKKOS_INTERNAL_ENABLE_${VARIABLE}_DEFAULT ${DEFAULT})
# MESSAGE(WARNING "Set: KOKKOS_INTERNAL_ENABLE_${VARIABLE}_DEFAULT to ${KOKKOS_INTERNAL_ENABLE_${VARIABLE}_DEFAULT}")
ELSE()
set(KOKKOS_INTERNAL_ENABLE_${VARIABLE}_DEFAULT ${KOKKOS_ENABLE_${VARIABLE}})
# MESSAGE(WARNING "Set: KOKKOS_INTERNAL_ENABLE_${VARIABLE}_DEFAULT to ${KOKKOS_INTERNAL_ENABLE_${VARIABLE}_DEFAULT}")
ENDIF()
ENDIF()
UNSET(KOKKOS_ENABLE_${VARIABLE} CACHE)
endmacro()
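# Illustrative usage (not part of the original file): if neither
# KOKKOS_INTERNAL_ENABLE_OPENMP_DEFAULT nor KOKKOS_ENABLE_OPENMP is set, then
#   set_kokkos_default_default(OPENMP OFF)
# leaves KOKKOS_INTERNAL_ENABLE_OPENMP_DEFAULT=OFF and clears the
# KOKKOS_ENABLE_OPENMP cache entry.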


@@ -0,0 +1,365 @@
########################## NOTES ###############################################
# List the options for configuring Kokkos with the CMake build.
# These options then get mapped onto the KOKKOS_SETTINGS environment variable
# by kokkos_settings.cmake. This file is kept separate so that other packages
# (e.g., TriBITS) can override these variables.
########################## AVAILABLE OPTIONS ###################################
# Use lists for documentation, verification, and programming convenience
# All CMake options of the type KOKKOS_ENABLE_*
set(KOKKOS_INTERNAL_ENABLE_OPTIONS_LIST)
list(APPEND KOKKOS_INTERNAL_ENABLE_OPTIONS_LIST
Serial
OpenMP
Pthread
Qthread
Cuda
ROCm
HWLOC
MEMKIND
LIBRT
Cuda_Lambda
Cuda_Relocatable_Device_Code
Cuda_UVM
Cuda_LDG_Intrinsic
Debug
Debug_DualView_Modify_Check
Debug_Bounds_Check
Compiler_Warnings
Profiling
Profiling_Load_Print
Aggressive_Vectorization
)
#-------------------------------------------------------------------------------
#------------------------------- Recognize CamelCase Options ---------------------------
#-------------------------------------------------------------------------------
foreach(opt ${KOKKOS_INTERNAL_ENABLE_OPTIONS_LIST})
string(TOUPPER ${opt} OPT )
IF(DEFINED Kokkos_ENABLE_${opt})
IF(DEFINED KOKKOS_ENABLE_${OPT})
IF(NOT ("${KOKKOS_ENABLE_${OPT}}" STREQUAL "${Kokkos_ENABLE_${opt}}"))
IF(DEFINED KOKKOS_ENABLE_${OPT}_INTERNAL)
MESSAGE(WARNING "Defined both Kokkos_ENABLE_${opt}=[${Kokkos_ENABLE_${opt}}] and KOKKOS_ENABLE_${OPT}=[${KOKKOS_ENABLE_${OPT}}] and they differ! Could be caused by old CMakeCache Variable. Run CMake again and warning should disappear. If not you are truly setting both variables.")
IF(NOT ("${Kokkos_ENABLE_${opt}}" STREQUAL "${KOKKOS_ENABLE_${OPT}_INTERNAL}"))
UNSET(KOKKOS_ENABLE_${OPT} CACHE)
SET(KOKKOS_ENABLE_${OPT} ${Kokkos_ENABLE_${opt}})
MESSAGE(WARNING "SET BOTH VARIABLES KOKKOS_ENABLE_${OPT}: ${KOKKOS_ENABLE_${OPT}}")
ELSE()
SET(Kokkos_ENABLE_${opt} ${KOKKOS_ENABLE_${OPT}})
ENDIF()
ELSE()
MESSAGE(FATAL_ERROR "Defined both Kokkos_ENABLE_${opt}=[${Kokkos_ENABLE_${opt}}] and KOKKOS_ENABLE_${OPT}=[${KOKKOS_ENABLE_${OPT}}] and they differ!")
ENDIF()
ENDIF()
ELSE()
SET(KOKKOS_INTERNAL_ENABLE_${OPT}_DEFAULT ${Kokkos_ENABLE_${opt}})
ENDIF()
ENDIF()
endforeach()
IF(DEFINED Kokkos_Arch)
IF(DEFINED KOKKOS_ARCH)
IF(NOT (${KOKKOS_ARCH} STREQUAL "${Kokkos_Arch}"))
MESSAGE(FATAL_ERROR "Defined both Kokkos_Arch and KOKKOS_ARCH and they differ!")
ENDIF()
ELSE()
SET(KOKKOS_ARCH ${Kokkos_Arch})
ENDIF()
ENDIF()
#-------------------------------------------------------------------------------
# List of possible host architectures.
#-------------------------------------------------------------------------------
set(KOKKOS_ARCH_LIST)
list(APPEND KOKKOS_ARCH_LIST
None # No architecture optimization
AMDAVX # (HOST) AMD chip
ARMv80 # (HOST) ARMv8.0 Compatible CPU
ARMv81 # (HOST) ARMv8.1 Compatible CPU
ARMv8-ThunderX # (HOST) ARMv8 Cavium ThunderX CPU
WSM # (HOST) Intel Westmere CPU
SNB # (HOST) Intel Sandy/Ivy Bridge CPUs
HSW # (HOST) Intel Haswell CPUs
BDW # (HOST) Intel Broadwell Xeon E-class CPUs
SKX # (HOST) Intel Sky Lake Xeon E-class HPC CPUs (AVX512)
KNC # (HOST) Intel Knights Corner Xeon Phi
KNL # (HOST) Intel Knights Landing Xeon Phi
BGQ # (HOST) IBM Blue Gene Q
Power7 # (HOST) IBM POWER7 CPUs
Power8 # (HOST) IBM POWER8 CPUs
Power9 # (HOST) IBM POWER9 CPUs
Kepler # (GPU) NVIDIA Kepler default (generation CC 3.5)
Kepler30 # (GPU) NVIDIA Kepler generation CC 3.0
Kepler32 # (GPU) NVIDIA Kepler generation CC 3.2
Kepler35 # (GPU) NVIDIA Kepler generation CC 3.5
Kepler37 # (GPU) NVIDIA Kepler generation CC 3.7
Maxwell # (GPU) NVIDIA Maxwell default (generation CC 5.0)
Maxwell50 # (GPU) NVIDIA Maxwell generation CC 5.0
Maxwell52 # (GPU) NVIDIA Maxwell generation CC 5.2
Maxwell53 # (GPU) NVIDIA Maxwell generation CC 5.3
Pascal60 # (GPU) NVIDIA Pascal generation CC 6.0
Pascal61 # (GPU) NVIDIA Pascal generation CC 6.1
)
# List of possible device architectures.
# The case and spelling here needs to match Makefile.kokkos
set(KOKKOS_DEVICES_LIST)
# Options: Cuda,ROCm,OpenMP,Pthread,Qthreads,Serial
list(APPEND KOKKOS_DEVICES_LIST
Cuda # NVIDIA GPU -- see below
OpenMP # OpenMP
Pthread # pthread
Qthreads # qthreads
Serial # serial
ROCm # AMD GPU
)
# List of possible TPLs for Kokkos
# From Makefile.kokkos: Options: hwloc,librt,experimental_memkind
set(KOKKOS_USE_TPLS_LIST)
list(APPEND KOKKOS_USE_TPLS_LIST
HWLOC # hwloc
LIBRT # librt
MEMKIND # experimental_memkind
)
# Map of cmake variables to Makefile variables
set(KOKKOS_INTERNAL_HWLOC hwloc)
set(KOKKOS_INTERNAL_LIBRT librt)
set(KOKKOS_INTERNAL_MEMKIND experimental_memkind)
# List of possible Advanced options
set(KOKKOS_OPTIONS_LIST)
list(APPEND KOKKOS_OPTIONS_LIST
AGGRESSIVE_VECTORIZATION
DISABLE_PROFILING
DISABLE_DUALVIEW_MODIFY_CHECK
ENABLE_PROFILE_LOAD_PRINT
)
# Map of cmake variables to Makefile variables
set(KOKKOS_INTERNAL_LDG_INTRINSIC use_ldg)
set(KOKKOS_INTERNAL_UVM force_uvm)
set(KOKKOS_INTERNAL_RELOCATABLE_DEVICE_CODE rdc)
#-------------------------------------------------------------------------------
# List of possible Options for CUDA
#-------------------------------------------------------------------------------
# From Makefile.kokkos: Options: use_ldg,force_uvm,rdc
set(KOKKOS_CUDA_OPTIONS_LIST)
list(APPEND KOKKOS_CUDA_OPTIONS_LIST
LDG_INTRINSIC # use_ldg
UVM # force_uvm
RELOCATABLE_DEVICE_CODE # rdc
LAMBDA # enable_lambda
)
# Map of cmake variables to Makefile variables
set(KOKKOS_INTERNAL_LDG_INTRINSIC use_ldg)
set(KOKKOS_INTERNAL_UVM force_uvm)
set(KOKKOS_INTERNAL_RELOCATABLE_DEVICE_CODE rdc)
set(KOKKOS_INTERNAL_LAMBDA enable_lambda)
#-------------------------------------------------------------------------------
#------------------------------- Create doc strings ----------------------------
#-------------------------------------------------------------------------------
set(tmpr "\n ")
string(REPLACE ";" ${tmpr} KOKKOS_INTERNAL_ARCH_DOCSTR "${KOKKOS_ARCH_LIST}")
# This would be useful, but we use Foo_ENABLE mechanisms
#string(REPLACE ";" ${tmpr} KOKKOS_INTERNAL_DEVICES_DOCSTR "${KOKKOS_DEVICES_LIST}")
#string(REPLACE ";" ${tmpr} KOKKOS_INTERNAL_USE_TPLS_DOCSTR "${KOKKOS_USE_TPLS_LIST}")
#string(REPLACE ";" ${tmpr} KOKKOS_INTERNAL_CUDA_OPTIONS_DOCSTR "${KOKKOS_CUDA_OPTIONS_LIST}")
#-------------------------------------------------------------------------------
#------------------------------- GENERAL OPTIONS -------------------------------
#-------------------------------------------------------------------------------
# Setting this variable to a value other than "None" can improve host
# performance by turning on architecture specific code.
# NOT SET is used to determine if the option is passed in. It is reset to
# default "None" down below.
set(KOKKOS_ARCH "NOT_SET" CACHE STRING
"Optimize for specific host architecture. Options are: ${KOKKOS_INTERNAL_ARCH_DOCSTR}")
# Whether to build separate libraries or not
set(KOKKOS_SEPARATE_LIBS OFF CACHE BOOL "OFF = kokkos. ON = kokkoscore, kokkoscontainers, and kokkosalgorithms.")
# Qthreads options.
set(KOKKOS_QTHREADS_DIR "" CACHE PATH "Location of Qthreads library.")
#-------------------------------------------------------------------------------
#------------------------------- KOKKOS_DEVICES --------------------------------
#-------------------------------------------------------------------------------
# Figure out default settings
IF(Trilinos_ENABLE_Kokkos)
set_kokkos_default_default(SERIAL ON)
set_kokkos_default_default(PTHREAD OFF)
IF(TPL_ENABLE_QTHREAD)
set_kokkos_default_default(QTHREADS ${TPL_ENABLE_QTHREAD})
ELSE()
set_kokkos_default_default(QTHREADS OFF)
ENDIF()
IF(Trilinos_ENABLE_OpenMP)
set_kokkos_default_default(OPENMP ${Trilinos_ENABLE_OpenMP})
ELSE()
set_kokkos_default_default(OPENMP OFF)
ENDIF()
IF(TPL_ENABLE_CUDA)
set_kokkos_default_default(CUDA ${TPL_ENABLE_CUDA})
ELSE()
set_kokkos_default_default(CUDA OFF)
ENDIF()
set_kokkos_default_default(ROCM OFF)
ELSE()
set_kokkos_default_default(SERIAL ON)
set_kokkos_default_default(OPENMP OFF)
set_kokkos_default_default(PTHREAD OFF)
set_kokkos_default_default(QTHREADS OFF)
set_kokkos_default_default(CUDA OFF)
set_kokkos_default_default(ROCM OFF)
ENDIF()
# Set which Kokkos backend to use.
# These are the actual options that define the settings.
set(KOKKOS_ENABLE_SERIAL ${KOKKOS_INTERNAL_ENABLE_SERIAL_DEFAULT} CACHE BOOL "Whether to enable the Kokkos::Serial device. This device executes \"parallel\" kernels sequentially on a single CPU thread. It is enabled by default. If you disable this device, please enable at least one other CPU device, such as Kokkos::OpenMP or Kokkos::Threads.")
set(KOKKOS_ENABLE_OPENMP ${KOKKOS_INTERNAL_ENABLE_OPENMP_DEFAULT} CACHE BOOL "Enable OpenMP support in Kokkos." FORCE)
set(KOKKOS_ENABLE_PTHREAD ${KOKKOS_INTERNAL_ENABLE_PTHREAD_DEFAULT} CACHE BOOL "Enable Pthread support in Kokkos.")
set(KOKKOS_ENABLE_QTHREADS ${KOKKOS_INTERNAL_ENABLE_QTHREADS_DEFAULT} CACHE BOOL "Enable Qthreads support in Kokkos.")
set(KOKKOS_ENABLE_CUDA ${KOKKOS_INTERNAL_ENABLE_CUDA_DEFAULT} CACHE BOOL "Enable CUDA support in Kokkos.")
set(KOKKOS_ENABLE_ROCM ${KOKKOS_INTERNAL_ENABLE_ROCM_DEFAULT} CACHE BOOL "Enable ROCm support in Kokkos.")
#-------------------------------------------------------------------------------
#------------------------------- KOKKOS DEBUG and PROFILING --------------------
#-------------------------------------------------------------------------------
# Debug related options enable compiler warnings
set_kokkos_default_default(DEBUG OFF)
set(KOKKOS_ENABLE_DEBUG ${KOKKOS_INTERNAL_ENABLE_DEBUG_DEFAULT} CACHE BOOL "Enable Kokkos Debug.")
# From Makefile.kokkos: Advanced Options:
#compiler_warnings, aggressive_vectorization, disable_profiling, disable_dualview_modify_check, enable_profile_load_print
set_kokkos_default_default(COMPILER_WARNINGS OFF)
set(KOKKOS_ENABLE_COMPILER_WARNINGS ${KOKKOS_INTERNAL_ENABLE_COMPILER_WARNINGS_DEFAULT} CACHE BOOL "Enable compiler warnings.")
set_kokkos_default_default(DEBUG_DUALVIEW_MODIFY_CHECK OFF)
set(KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK ${KOKKOS_INTERNAL_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK_DEFAULT} CACHE BOOL "Enable dualview modify check.")
# Enable aggressive vectorization.
set_kokkos_default_default(AGGRESSIVE_VECTORIZATION OFF)
set(KOKKOS_ENABLE_AGGRESSIVE_VECTORIZATION ${KOKKOS_INTERNAL_ENABLE_AGGRESSIVE_VECTORIZATION_DEFAULT} CACHE BOOL "Enable aggressive vectorization.")
# Enable profiling.
set_kokkos_default_default(PROFILING ON)
set(KOKKOS_ENABLE_PROFILING ${KOKKOS_INTERNAL_ENABLE_PROFILING_DEFAULT} CACHE BOOL "Enable profiling.")
set_kokkos_default_default(PROFILING_LOAD_PRINT OFF)
set(KOKKOS_ENABLE_PROFILING_LOAD_PRINT ${KOKKOS_INTERNAL_ENABLE_PROFILING_LOAD_PRINT_DEFAULT} CACHE BOOL "Enable profile load print.")
#-------------------------------------------------------------------------------
#------------------------------- KOKKOS_USE_TPLS -------------------------------
#-------------------------------------------------------------------------------
# Enable hwloc library.
# Figure out default:
IF(Trilinos_ENABLE_Kokkos AND TPL_ENABLE_HWLOC)
set_kokkos_default_default(HWLOC ON)
ELSE()
set_kokkos_default_default(HWLOC OFF)
ENDIF()
set(KOKKOS_ENABLE_HWLOC ${KOKKOS_INTERNAL_ENABLE_HWLOC_DEFAULT} CACHE BOOL "Enable hwloc for better process placement.")
set(KOKKOS_HWLOC_DIR "" CACHE PATH "Location of hwloc library. (kokkos tpl)")
# Enable memkind library.
set_kokkos_default_default(MEMKIND OFF)
set(KOKKOS_ENABLE_MEMKIND ${KOKKOS_INTERNAL_ENABLE_MEMKIND_DEFAULT} CACHE BOOL "Enable memkind. (kokkos tpl)")
set(KOKKOS_MEMKIND_DIR "" CACHE PATH "Location of memkind library. (kokkos tpl)")
# Enable rt library.
IF(Trilinos_ENABLE_Kokkos)
IF(DEFINED TPL_ENABLE_LIBRT)
set_kokkos_default_default(LIBRT ${TPL_ENABLE_LIBRT})
ELSE()
set_kokkos_default_default(LIBRT OFF)
ENDIF()
ELSE()
set_kokkos_default_default(LIBRT ON)
ENDIF()
set(KOKKOS_ENABLE_LIBRT ${KOKKOS_INTERNAL_ENABLE_LIBRT_DEFAULT} CACHE BOOL "Enable librt for more precise timer. (kokkos tpl)")
#-------------------------------------------------------------------------------
#------------------------------- KOKKOS_CUDA_OPTIONS ---------------------------
#-------------------------------------------------------------------------------
# CUDA options.
# Set Defaults
set_kokkos_default_default(CUDA_LDG_INTRINSIC OFF)
set_kokkos_default_default(CUDA_UVM OFF)
set_kokkos_default_default(CUDA_RELOCATABLE_DEVICE_CODE OFF)
IF(Trilinos_ENABLE_Kokkos)
IF(KOKKOS_ENABLE_CUDA)
find_package(CUDA)
ENDIF()
IF (DEFINED CUDA_VERSION)
IF (CUDA_VERSION VERSION_GREATER "7.0")
set_kokkos_default_default(CUDA_LAMBDA ON)
ELSE()
set_kokkos_default_default(CUDA_LAMBDA OFF)
ENDIF()
ENDIF()
ELSE()
set_kokkos_default_default(CUDA_LAMBDA OFF)
ENDIF()
# Set actual options
set(KOKKOS_CUDA_DIR "" CACHE PATH "Location of CUDA library. Defaults to where nvcc installed.")
set(KOKKOS_ENABLE_CUDA_LDG_INTRINSIC ${KOKKOS_INTERNAL_ENABLE_CUDA_LDG_INTRINSIC_DEFAULT} CACHE BOOL "Enable CUDA LDG. (cuda option)")
set(KOKKOS_ENABLE_CUDA_UVM ${KOKKOS_INTERNAL_ENABLE_CUDA_UVM_DEFAULT} CACHE BOOL "Enable CUDA unified virtual memory.")
set(KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE ${KOKKOS_INTERNAL_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE_DEFAULT} CACHE BOOL "Enable relocatable device code for CUDA. (cuda option)")
set(KOKKOS_ENABLE_CUDA_LAMBDA ${KOKKOS_INTERNAL_ENABLE_CUDA_LAMBDA_DEFAULT} CACHE BOOL "Enable lambdas for CUDA. (cuda option)")
#-------------------------------------------------------------------------------
#----------------------- HOST ARCH AND LEGACY TRIBITS --------------------------
#-------------------------------------------------------------------------------
# This defines the previous legacy TriBITS builds.
set(KOKKOS_LEGACY_TRIBITS False)
IF ("${KOKKOS_ARCH}" STREQUAL "NOT_SET")
set(KOKKOS_ARCH "None")
IF(KOKKOS_HAS_TRILINOS)
set(KOKKOS_LEGACY_TRIBITS True)
ENDIF()
ENDIF()
IF (KOKKOS_HAS_TRILINOS)
IF (KOKKOS_LEGACY_TRIBITS)
message(STATUS "Using the legacy tribits build because KOKKOS_ARCH not set")
ELSE()
message(STATUS "NOT using the legacy tribits build because KOKKOS_ARCH *is* set")
ENDIF()
ENDIF()
#-------------------------------------------------------------------------------
#----------------------- Set CamelCase Options if they are not yet set ---------
#-------------------------------------------------------------------------------
foreach(opt ${KOKKOS_INTERNAL_ENABLE_OPTIONS_LIST})
string(TOUPPER ${opt} OPT )
UNSET(KOKKOS_ENABLE_${OPT}_INTERNAL CACHE)
SET(KOKKOS_ENABLE_${OPT}_INTERNAL ${KOKKOS_ENABLE_${OPT}} CACHE BOOL INTERNAL)
IF(DEFINED KOKKOS_ENABLE_${OPT})
UNSET(Kokkos_ENABLE_${opt} CACHE)
SET(Kokkos_ENABLE_${opt} ${KOKKOS_ENABLE_${OPT}} CACHE BOOL "CamelCase Compatibility setting for KOKKOS_ENABLE_${OPT}")
ENDIF()
endforeach()
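# Illustrative mapping (not part of the original file): configuring with
#   cmake -DKokkos_ENABLE_OpenMP=ON ...
# leaves both Kokkos_ENABLE_OpenMP=ON and KOKKOS_ENABLE_OPENMP=ON defined
# after this loop.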


@@ -0,0 +1,257 @@
########################## NOTES ###############################################
# This file's goal is to take the CMake options found in kokkos_options.cmake,
# but possibly set from elsewhere
# (see: trilinos/cmake/ProjectCompilerPostConfig.cmake),
# using CMake idioms and map them onto the KOKKOS_SETTINGS variable that gets
# passed to the kokkos makefile configuration:
# make -f ${CMAKE_SOURCE_DIR}/core/src/Makefile ${KOKKOS_SETTINGS} build-makefile-cmake-kokkos
# that generates KokkosCore_config.h and kokkos_generated_settings.cmake
# To understand how to form KOKKOS_SETTINGS, see
# <KOKKOS_PATH>/Makefile.kokkos
#-------------------------------------------------------------------------------
#------------------------------- GENERAL OPTIONS -------------------------------
#-------------------------------------------------------------------------------
# Ensure that KOKKOS_ARCH is in the ARCH_LIST
foreach(arch ${KOKKOS_ARCH})
list(FIND KOKKOS_ARCH_LIST ${arch} indx)
if (indx EQUAL -1)
message(FATAL_ERROR "${arch} is not an accepted value for KOKKOS_ARCH."
" Please pick from these choices: ${KOKKOS_INTERNAL_ARCH_DOCSTR}")
endif ()
endforeach()
# KOKKOS_SETTINGS uses KOKKOS_ARCH
string(REPLACE ";" "," KOKKOS_ARCH "${KOKKOS_ARCH}")
set(KOKKOS_ARCH ${KOKKOS_ARCH})
# From Makefile.kokkos: Options: yes,no
if(${KOKKOS_ENABLE_DEBUG})
set(KOKKOS_DEBUG yes)
else()
set(KOKKOS_DEBUG no)
endif()
#------------------------------- KOKKOS_DEVICES --------------------------------
# Can have multiple devices
set(KOKKOS_DEVICESl)
foreach(devopt ${KOKKOS_DEVICES_LIST})
string(TOUPPER ${devopt} devoptuc)
if (${KOKKOS_ENABLE_${devoptuc}})
list(APPEND KOKKOS_DEVICESl ${devopt})
endif ()
endforeach()
# List needs to be comma-delimited
string(REPLACE ";" "," KOKKOS_DEVICES "${KOKKOS_DEVICESl}")
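# Illustrative result (not part of the original file): with Serial and OpenMP
# enabled, KOKKOS_DEVICES becomes "OpenMP,Serial" (following the order of
# KOKKOS_DEVICES_LIST).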
#------------------------------- KOKKOS_OPTIONS --------------------------------
# From Makefile.kokkos: Options: aggressive_vectorization,disable_profiling
#compiler_warnings, aggressive_vectorization, disable_profiling, disable_dualview_modify_check, enable_profile_load_print
set(KOKKOS_OPTIONSl)
if(${KOKKOS_ENABLE_COMPILER_WARNINGS})
list(APPEND KOKKOS_OPTIONSl compiler_warnings)
endif()
if(${KOKKOS_ENABLE_AGGRESSIVE_VECTORIZATION})
list(APPEND KOKKOS_OPTIONSl aggressive_vectorization)
endif()
if(NOT ${KOKKOS_ENABLE_PROFILING})
list(APPEND KOKKOS_OPTIONSl disable_profiling)
endif()
if(NOT ${KOKKOS_ENABLE_DEBUG_DUALVIEW_MODIFY_CHECK})
list(APPEND KOKKOS_OPTIONSl disable_dualview_modify_check)
endif()
if(${KOKKOS_ENABLE_PROFILING_LOAD_PRINT})
list(APPEND KOKKOS_OPTIONSl enable_profile_load_print)
endif()
# List needs to be comma-delimited
string(REPLACE ";" "," KOKKOS_OPTIONS "${KOKKOS_OPTIONSl}")
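The conditional accumulation and comma-join above can be illustrated with a small shell sketch (the toggle names and values here are hypothetical stand-ins for the CMake variables, not the real configure path):

```shell
# Hypothetical sketch of the option accumulation: append an option only
# when its feature toggle is set, then join the list with commas as
# Makefile.kokkos expects (e.g. KOKKOS_OPTIONS=compiler_warnings,...).
ENABLE_COMPILER_WARNINGS=yes
ENABLE_PROFILING=no
opts=""
if [ "$ENABLE_COMPILER_WARNINGS" = yes ]; then opts="$opts compiler_warnings"; fi
if [ "$ENABLE_PROFILING" != yes ]; then opts="$opts disable_profiling"; fi
echo "$opts" | sed 's/^ //; s/ /,/g'   # prints compiler_warnings,disable_profiling
```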
#------------------------------- KOKKOS_USE_TPLS -------------------------------
# Construct the Makefile options
set(KOKKOS_USE_TPLSl)
foreach(tplopt ${KOKKOS_USE_TPLS_LIST})
if (${KOKKOS_ENABLE_${tplopt}})
list(APPEND KOKKOS_USE_TPLSl ${KOKKOS_INTERNAL_${tplopt}})
endif ()
endforeach()
# List needs to be comma-delimited
string(REPLACE ";" "," KOKKOS_USE_TPLS "${KOKKOS_USE_TPLSl}")
#------------------------------- KOKKOS_CUDA_OPTIONS ---------------------------
# Construct the Makefile options
set(KOKKOS_CUDA_OPTIONSl)
foreach(cudaopt ${KOKKOS_CUDA_OPTIONS_LIST})
if (${KOKKOS_ENABLE_CUDA_${cudaopt}})
list(APPEND KOKKOS_CUDA_OPTIONSl ${KOKKOS_INTERNAL_${cudaopt}})
endif ()
endforeach()
# List needs to be comma-delimited
string(REPLACE ";" "," KOKKOS_CUDA_OPTIONS "${KOKKOS_CUDA_OPTIONSl}")
#------------------------------- PATH VARIABLES --------------------------------
# We want the Makefile to use the same executables specified here, which means
# modifying PATH so the $(shell ...) commands in the Makefile see the right
# executables. Also, the Makefiles use a FOO_PATH naming scheme for -I/-L construction.
#TODO: Makefile.kokkos allows this to be overwritten? ROCM_HCC_PATH
set(KOKKOS_INTERNAL_PATHS)
set(addpathl)
foreach(kvar CUDA QTHREADS ${KOKKOS_USE_TPLS_LIST})
if(${KOKKOS_ENABLE_${kvar}})
if(DEFINED KOKKOS_${kvar}_DIR)
set(KOKKOS_INTERNAL_PATHS "${KOKKOS_INTERNAL_PATHS} ${kvar}_PATH=${KOKKOS_${kvar}_DIR}")
if(IS_DIRECTORY ${KOKKOS_${kvar}_DIR}/bin)
list(APPEND addpathl ${KOKKOS_${kvar}_DIR}/bin)
endif()
endif()
endif()
endforeach()
# The PATH environment variable is colon-delimited
string(REPLACE ";" ":" KOKKOS_INTERNAL_ADDTOPATH "${addpathl}")
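The re-join above exists because CMake lists use `;` separators while PATH entries must be `:` separated. A minimal shell sketch of the same transformation (the directories are hypothetical examples):

```shell
# Re-join a ';'-separated CMake-style list into a ':'-separated PATH
# fragment, mirroring the string(REPLACE ...) call above.
dirs="/opt/cuda/bin;/opt/qthreads/bin"   # hypothetical directories
echo "${dirs//;/:}"                      # prints /opt/cuda/bin:/opt/qthreads/bin
```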
######################### SET KOKKOS_SETTINGS ##################################
# Set the KOKKOS_SETTINGS String -- this is the primary communication with the
# makefile configuration. See Makefile.kokkos
set(KOKKOS_SETTINGS KOKKOS_SRC_PATH=${KOKKOS_SRC_PATH})
set(KOKKOS_SETTINGS ${KOKKOS_SETTINGS} KOKKOS_PATH=${KOKKOS_PATH})
set(KOKKOS_SETTINGS ${KOKKOS_SETTINGS} KOKKOS_INSTALL_PATH=${CMAKE_INSTALL_PREFIX})
# Form of KOKKOS_foo=$KOKKOS_foo
foreach(kvar ARCH;DEVICES;DEBUG;OPTIONS;CUDA_OPTIONS;USE_TPLS)
set(KOKKOS_VAR KOKKOS_${kvar})
if(DEFINED KOKKOS_${kvar})
if (NOT "${${KOKKOS_VAR}}" STREQUAL "")
set(KOKKOS_SETTINGS ${KOKKOS_SETTINGS} ${KOKKOS_VAR}=${${KOKKOS_VAR}})
endif()
endif()
endforeach()
# Form of VAR=VAL
#TODO: Makefile supports MPICH_CXX, OMPI_CXX as well
foreach(ovar CXX;CXXFLAGS;LDFLAGS)
if(DEFINED ${ovar})
if (NOT "${${ovar}}" STREQUAL "")
set(KOKKOS_SETTINGS ${KOKKOS_SETTINGS} ${ovar}=${${ovar}})
endif()
endif()
endforeach()
# Finally, do the paths
if (NOT "${KOKKOS_INTERNAL_PATHS}" STREQUAL "")
set(KOKKOS_SETTINGS ${KOKKOS_SETTINGS} ${KOKKOS_INTERNAL_PATHS})
endif()
if (NOT "${KOKKOS_INTERNAL_ADDTOPATH}" STREQUAL "")
set(KOKKOS_SETTINGS ${KOKKOS_SETTINGS} PATH=${KOKKOS_INTERNAL_ADDTOPATH}:\${PATH})
endif()
# Final form that gets passed to make
set(KOKKOS_SETTINGS env ${KOKKOS_SETTINGS})
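The `env` prefix means the variables apply only to the spawned `make` process, not to the calling shell. A sketch of that behavior (GREETING is a hypothetical placeholder variable and `sh -c` stands in for `make`):

```shell
# Variables passed through `env` are visible only to the child command;
# the parent environment is left untouched.
env GREETING=hello sh -c 'echo "$GREETING"'   # prints hello
```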
############################ PRINT CONFIGURE STATUS ############################
if(KOKKOS_CMAKE_VERBOSE)
message(STATUS "")
message(STATUS "****************** Kokkos Settings ******************")
message(STATUS "Execution Spaces")
if(KOKKOS_ENABLE_CUDA)
message(STATUS " Device Parallel: Cuda")
else()
message(STATUS " Device Parallel: None")
endif()
if(KOKKOS_ENABLE_OPENMP)
message(STATUS " Host Parallel: OpenMP")
elseif(KOKKOS_ENABLE_PTHREAD)
message(STATUS " Host Parallel: Pthread")
elseif(KOKKOS_ENABLE_QTHREADS)
message(STATUS " Host Parallel: Qthreads")
else()
message(STATUS " Host Parallel: None")
endif()
if(KOKKOS_ENABLE_SERIAL)
message(STATUS " Host Serial: Serial")
else()
message(STATUS " Host Serial: None")
endif()
message(STATUS "")
message(STATUS "Architectures:")
message(STATUS " ${KOKKOS_ARCH}")
message(STATUS "")
message(STATUS "Enabled options")
if(KOKKOS_SEPARATE_LIBS)
message(STATUS " KOKKOS_SEPARATE_LIBS")
endif()
if(KOKKOS_ENABLE_HWLOC)
message(STATUS " KOKKOS_ENABLE_HWLOC")
endif()
if(KOKKOS_ENABLE_MEMKIND)
message(STATUS " KOKKOS_ENABLE_MEMKIND")
endif()
if(KOKKOS_ENABLE_DEBUG)
message(STATUS " KOKKOS_ENABLE_DEBUG")
endif()
if(KOKKOS_ENABLE_PROFILING)
message(STATUS " KOKKOS_ENABLE_PROFILING")
endif()
if(KOKKOS_ENABLE_AGGRESSIVE_VECTORIZATION)
message(STATUS " KOKKOS_ENABLE_AGGRESSIVE_VECTORIZATION")
endif()
if(KOKKOS_ENABLE_CUDA)
if(KOKKOS_ENABLE_CUDA_LDG_INTRINSIC)
message(STATUS " KOKKOS_ENABLE_CUDA_LDG_INTRINSIC")
endif()
if(KOKKOS_ENABLE_CUDA_UVM)
message(STATUS " KOKKOS_ENABLE_CUDA_UVM")
endif()
if(KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE)
message(STATUS " KOKKOS_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE")
endif()
if(KOKKOS_ENABLE_CUDA_LAMBDA)
message(STATUS " KOKKOS_ENABLE_CUDA_LAMBDA")
endif()
if(KOKKOS_CUDA_DIR)
message(STATUS " KOKKOS_CUDA_DIR: ${KOKKOS_CUDA_DIR}")
endif()
endif()
if(KOKKOS_QTHREADS_DIR)
message(STATUS " KOKKOS_QTHREADS_DIR: ${KOKKOS_QTHREADS_DIR}")
endif()
if(KOKKOS_HWLOC_DIR)
message(STATUS " KOKKOS_HWLOC_DIR: ${KOKKOS_HWLOC_DIR}")
endif()
if(KOKKOS_MEMKIND_DIR)
message(STATUS " KOKKOS_MEMKIND_DIR: ${KOKKOS_MEMKIND_DIR}")
endif()
message(STATUS "")
message(STATUS "Final kokkos settings variable:")
message(STATUS " ${KOKKOS_SETTINGS}")
message(STATUS "*****************************************************")
message(STATUS "")
endif()

View File

@ -3,10 +3,6 @@ INCLUDE(CTest)
cmake_policy(SET CMP0054 NEW)
IF(NOT DEFINED ${PROJECT_NAME})
project(KokkosCMake)
ENDIF()
MESSAGE(WARNING "The project name is: ${PROJECT_NAME}")
IF(NOT DEFINED ${PROJECT_NAME}_ENABLE_OpenMP)
@ -46,26 +42,26 @@ MACRO(PREPEND_GLOBAL_SET VARNAME)
GLOBAL_SET(${VARNAME} ${ARGN} ${${VARNAME}})
ENDMACRO()
FUNCTION(REMOVE_GLOBAL_DUPLICATES VARNAME)
ASSERT_DEFINED(${VARNAME})
IF (${VARNAME})
SET(TMP ${${VARNAME}})
LIST(REMOVE_DUPLICATES TMP)
GLOBAL_SET(${VARNAME} ${TMP})
ENDIF()
ENDFUNCTION()
#FUNCTION(REMOVE_GLOBAL_DUPLICATES VARNAME)
# ASSERT_DEFINED(${VARNAME})
# IF (${VARNAME})
# SET(TMP ${${VARNAME}})
# LIST(REMOVE_DUPLICATES TMP)
# GLOBAL_SET(${VARNAME} ${TMP})
# ENDIF()
#ENDFUNCTION()
MACRO(TRIBITS_ADD_OPTION_AND_DEFINE USER_OPTION_NAME MACRO_DEFINE_NAME DOCSTRING DEFAULT_VALUE)
MESSAGE(STATUS "TRIBITS_ADD_OPTION_AND_DEFINE: '${USER_OPTION_NAME}' '${MACRO_DEFINE_NAME}' '${DEFAULT_VALUE}'")
SET( ${USER_OPTION_NAME} "${DEFAULT_VALUE}" CACHE BOOL "${DOCSTRING}" )
IF(NOT ${MACRO_DEFINE_NAME} STREQUAL "")
IF(${USER_OPTION_NAME})
GLOBAL_SET(${MACRO_DEFINE_NAME} ON)
ELSE()
GLOBAL_SET(${MACRO_DEFINE_NAME} OFF)
ENDIF()
ENDIF()
ENDMACRO()
#MACRO(TRIBITS_ADD_OPTION_AND_DEFINE USER_OPTION_NAME MACRO_DEFINE_NAME DOCSTRING DEFAULT_VALUE)
# MESSAGE(STATUS "TRIBITS_ADD_OPTION_AND_DEFINE: '${USER_OPTION_NAME}' '${MACRO_DEFINE_NAME}' '${DEFAULT_VALUE}'")
# SET( ${USER_OPTION_NAME} "${DEFAULT_VALUE}" CACHE BOOL "${DOCSTRING}" )
# IF(NOT ${MACRO_DEFINE_NAME} STREQUAL "")
# IF(${USER_OPTION_NAME})
# GLOBAL_SET(${MACRO_DEFINE_NAME} ON)
# ELSE()
# GLOBAL_SET(${MACRO_DEFINE_NAME} OFF)
# ENDIF()
# ENDIF()
#ENDMACRO()
FUNCTION(TRIBITS_CONFIGURE_FILE PACKAGE_NAME_CONFIG_FILE)
@ -77,17 +73,20 @@ FUNCTION(TRIBITS_CONFIGURE_FILE PACKAGE_NAME_CONFIG_FILE)
ENDFUNCTION()
MACRO(TRIBITS_ADD_DEBUG_OPTION)
TRIBITS_ADD_OPTION_AND_DEFINE(
${PROJECT_NAME}_ENABLE_DEBUG
HAVE_${PROJECT_NAME_UC}_DEBUG
"Enable a host of runtime debug checking."
OFF
)
ENDMACRO()
#MACRO(TRIBITS_ADD_DEBUG_OPTION)
# TRIBITS_ADD_OPTION_AND_DEFINE(
# ${PROJECT_NAME}_ENABLE_DEBUG
# HAVE_${PROJECT_NAME_UC}_DEBUG
# "Enable a host of runtime debug checking."
# OFF
# )
#ENDMACRO()
MACRO(TRIBITS_ADD_TEST_DIRECTORIES)
message(STATUS "ProjectName: " ${PROJECT_NAME})
message(STATUS "Tests: " ${${PROJECT_NAME}_ENABLE_TESTS})
IF(${${PROJECT_NAME}_ENABLE_TESTS})
FOREACH(TEST_DIR ${ARGN})
ADD_SUBDIRECTORY(${TEST_DIR})
@ -387,17 +386,17 @@ FUNCTION(TRIBITS_TPL_FIND_INCLUDE_DIRS_AND_LIBRARIES TPL_NAME)
ENDFUNCTION()
MACRO(TRIBITS_PROCESS_TPL_DEP_FILE TPL_FILE)
GET_FILENAME_COMPONENT(TPL_NAME ${TPL_FILE} NAME_WE)
INCLUDE("${TPL_FILE}")
IF(TARGET TPL_LIB_${TPL_NAME})
MESSAGE(STATUS "Found tpl library: ${TPL_NAME}")
SET(TPL_ENABLE_${TPL_NAME} TRUE)
ELSE()
MESSAGE(STATUS "Tpl library not found: ${TPL_NAME}")
SET(TPL_ENABLE_${TPL_NAME} FALSE)
ENDIF()
ENDMACRO()
#MACRO(TRIBITS_PROCESS_TPL_DEP_FILE TPL_FILE)
# GET_FILENAME_COMPONENT(TPL_NAME ${TPL_FILE} NAME_WE)
# INCLUDE("${TPL_FILE}")
# IF(TARGET TPL_LIB_${TPL_NAME})
# MESSAGE(STATUS "Found tpl library: ${TPL_NAME}")
# SET(TPL_ENABLE_${TPL_NAME} TRUE)
# ELSE()
# MESSAGE(STATUS "Tpl library not found: ${TPL_NAME}")
# SET(TPL_ENABLE_${TPL_NAME} FALSE)
# ENDIF()
#ENDMACRO()
MACRO(PREPEND_TARGET_SET VARNAME TARGET_NAME TYPE)
IF(TYPE STREQUAL "REQUIRED")
@ -475,6 +474,7 @@ MACRO(TRIBITS_SUBPACKAGE NAME)
SET(PARENT_PACKAGE_NAME ${PACKAGE_NAME})
SET(PACKAGE_NAME ${PACKAGE_NAME}${NAME})
STRING(TOUPPER ${PACKAGE_NAME} PACKAGE_NAME_UC)
SET(${PACKAGE_NAME}_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
ADD_INTERFACE_LIBRARY(PACKAGE_${PACKAGE_NAME})
@ -494,11 +494,11 @@ MACRO(TRIBITS_PACKAGE_DECL NAME)
SET(${PACKAGE_NAME}_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
STRING(TOUPPER ${PACKAGE_NAME} PACKAGE_NAME_UC)
SET(TRIBITS_DEPS_DIR "${CMAKE_SOURCE_DIR}/cmake/deps")
FILE(GLOB TPLS_FILES "${TRIBITS_DEPS_DIR}/*.cmake")
FOREACH(TPL_FILE ${TPLS_FILES})
TRIBITS_PROCESS_TPL_DEP_FILE(${TPL_FILE})
ENDFOREACH()
#SET(TRIBITS_DEPS_DIR "${CMAKE_SOURCE_DIR}/cmake/deps")
#FILE(GLOB TPLS_FILES "${TRIBITS_DEPS_DIR}/*.cmake")
#FOREACH(TPL_FILE ${TPLS_FILES})
# TRIBITS_PROCESS_TPL_DEP_FILE(${TPL_FILE})
#ENDFOREACH()
ENDMACRO()

View File

@ -10,3 +10,5 @@ tag: 2.03.05 date: 05:27:2017 master: 36b92f43 develop: 79073186
tag: 2.03.13 date: 07:27:2017 master: da314444 develop: 29ccb58a
tag: 2.04.00 date: 08:16:2017 master: 54eb75c0 develop: 32fb8ee1
tag: 2.04.04 date: 09:11:2017 master: 2b7e9c20 develop: 51e7b25a
tag: 2.04.11 date: 10:28:2017 master: 54a1330a develop: ed36c017
tag: 2.5.00 date: 12:15:2017 master: dfe685f4 develop: ec7ad6d8

View File

@ -39,6 +39,12 @@ cuda_args=""
# Arguments for both NVCC and Host compiler
shared_args=""
# Argument -c
compile_arg=""
# Argument -o <obj>
output_arg=""
# Linker arguments
xlinker_args=""
@ -66,6 +72,7 @@ dry_run=0
# Skip NVCC compilation and use host compiler directly
host_only=0
host_only_args=""
# Enable workaround for CUDA 6.5 for pragma ident
replace_pragma_ident=0
@ -78,6 +85,14 @@ temp_dir=${TMPDIR:-/tmp}
# Check if we have an optimization argument already
optimization_applied=0
# Check if we have -std=c++X or --std=c++X already
stdcxx_applied=0
# Run nvcc a second time to generate dependencies if needed
depfile_separate=0
depfile_output_arg=""
depfile_target_arg=""
#echo "Arguments: $# $@"
while [ $# -gt 0 ]
@ -109,12 +124,31 @@ do
fi
;;
#Handle shared args (valid for both nvcc and the host compiler)
-D*|-c|-I*|-L*|-l*|-g|--help|--version|-E|-M|-shared)
-D*|-I*|-L*|-l*|-g|--help|--version|-E|-M|-shared)
shared_args="$shared_args $1"
;;
#Handle shared args that have an argument
-o|-MT)
shared_args="$shared_args $1 $2"
#Handle compilation argument
-c)
compile_arg="$1"
;;
#Handle output argument
-o)
output_arg="$output_arg $1 $2"
shift
;;
# Handle depfile arguments. We map them to a separate call to nvcc.
-MD|-MMD)
depfile_separate=1
host_only_args="$host_only_args $1"
;;
-MF)
depfile_output_arg="-o $2"
host_only_args="$host_only_args $1 $2"
shift
;;
-MT)
depfile_target_arg="$1 $2"
host_only_args="$host_only_args $1 $2"
shift
;;
#Handle known nvcc args
@ -130,16 +164,25 @@ do
cuda_args="$cuda_args $1 $2"
shift
;;
#Handle c++11 setting
--std=c++11|-std=c++11)
shared_args="$shared_args $1"
#Handle c++11
--std=c++11|-std=c++11|--std=c++14|-std=c++14|--std=c++1z|-std=c++1z)
if [ $stdcxx_applied -eq 1 ]; then
echo "nvcc_wrapper - *warning* you have set multiple standard flags (-std=c++1* or --std=c++1*), only the first is used because nvcc can only accept a single std setting"
else
shared_args="$shared_args $1"
stdcxx_applied=1
fi
;;
#strip off -std=c++98 due to nvcc warnings; TriBITS may place both -std=c++11 and -std=c++98
-std=c++98|--std=c++98)
;;
#strip off -pedantic because it produces endless warnings about the #LINE directives added by the preprocessor
-pedantic|-Wpedantic|-ansi)
;;
#strip off -Woverloaded-virtual to avoid "cc1: warning: command line option -Woverloaded-virtual is valid for C++/ObjC++ but not for C"
-Woverloaded-virtual)
;;
#strip -Xcompiler because we add it
-Xcompiler)
if [ $first_xcompiler_arg -eq 1 ]; then
@ -190,7 +233,7 @@ do
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
#Handle object files which always need to use "-Xlinker": -x cu applies to all input files, so give them to linker, except if only linking
*.dylib)
@*|*.dylib)
object_files="$object_files -Xlinker $1"
object_files_xlinker="$object_files_xlinker -Xlinker $1"
;;
@ -230,7 +273,7 @@ if [ $first_xcompiler_arg -eq 0 ]; then
fi
#Compose host only command
host_command="$host_compiler $shared_args $xcompiler_args $host_linker_args $shared_versioned_libraries_host"
host_command="$host_compiler $shared_args $host_only_args $compile_arg $output_arg $xcompiler_args $host_linker_args $shared_versioned_libraries_host"
#nvcc does not accept '#pragma ident SOME_MACRO_STRING' but it does accept '#ident SOME_MACRO_STRING'
if [ $replace_pragma_ident -eq 1 ]; then
@ -262,10 +305,21 @@ else
host_command="$host_command $object_files"
fi
if [ $depfile_separate -eq 1 ]; then
# run nvcc a second time to generate dependencies (without compiling)
nvcc_depfile_command="$nvcc_command -M $depfile_target_arg $depfile_output_arg"
else
nvcc_depfile_command=""
fi
nvcc_command="$nvcc_command $compile_arg $output_arg"
#Print command for dryrun
if [ $dry_run -eq 1 ]; then
if [ $host_only -eq 1 ]; then
echo $host_command
elif [ -n "$nvcc_depfile_command" ]; then
echo $nvcc_command "&&" $nvcc_depfile_command
else
echo $nvcc_command
fi
@ -275,6 +329,8 @@ fi
#Run compilation command
if [ $host_only -eq 1 ]; then
$host_command
elif [ -n "$nvcc_depfile_command" ]; then
$nvcc_command && $nvcc_depfile_command
else
$nvcc_command
fi
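The first-wins handling of `-std=c++1*` flags in the wrapper can be sketched in isolation (the argument list below is hypothetical):

```shell
# Keep only the first -std=c++1* flag, mirroring the wrapper's
# stdcxx_applied guard: nvcc accepts a single std setting.
applied=0
kept=""
for arg in -std=c++11 -std=c++14 -O2; do   # hypothetical arguments
  case "$arg" in
    -std=c++1*|--std=c++1*)
      if [ "$applied" -eq 0 ]; then
        kept="$kept $arg"
        applied=1
      fi
      ;;
    *) kept="$kept $arg" ;;
  esac
done
echo $kept   # prints -std=c++11 -O2 (the c++14 flag is dropped)
```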

View File

@ -16,12 +16,12 @@ if [[ "$HOSTNAME" =~ (white|ride).* ]]; then
MACHINE=white
elif [[ "$HOSTNAME" =~ .*bowman.* ]]; then
MACHINE=bowman
elif [[ "$HOSTNAME" =~ node.* ]]; then # Warning: very generic name
elif [[ "$HOSTNAME" =~ n.* ]]; then # Warning: very generic name
if [[ "$PROCESSOR" = "aarch64" ]]; then
MACHINE=sullivan
else
MACHINE=shepard
fi
elif [[ "$HOSTNAME" =~ node.* ]]; then # Warning: very generic name
MACHINE=shepard
elif [[ "$HOSTNAME" =~ apollo ]]; then
MACHINE=apollo
elif [[ "$HOSTNAME" =~ sullivan ]]; then
@ -45,7 +45,8 @@ GCC_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits
IBM_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
CLANG_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
INTEL_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
CUDA_WARNING_FLAGS=""
CUDA_WARNING_FLAGS="-Wall,-Wshadow,-pedantic,-Werror,-Wsign-compare,-Wtype-limits,-Wuninitialized"
PGI_WARNING_FLAGS=""
# Default. Machine specific can override.
DEBUG=False
@ -61,6 +62,8 @@ SPOT_CHECK=False
PRINT_HELP=False
OPT_FLAG=""
CXX_FLAGS_EXTRA=""
LD_FLAGS_EXTRA=""
KOKKOS_OPTIONS=""
#
@ -111,6 +114,12 @@ do
--with-cuda-options*)
KOKKOS_CUDA_OPTIONS="--with-cuda-options=${key#*=}"
;;
--cxxflags-extra*)
CXX_FLAGS_EXTRA="${key#*=}"
;;
--ldflags-extra*)
LD_FLAGS_EXTRA="${key#*=}"
;;
--help*)
PRINT_HELP=True
;;
@ -150,20 +159,18 @@ if [ "$MACHINE" = "sems" ]; then
if [ "$SPOT_CHECK" = "True" ]; then
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
COMPILERS=("gcc/5.3.0 $BASE_MODULE_LIST "OpenMP" g++ $GCC_WARNING_FLAGS"
"gcc/6.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
"intel/17.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
"cuda/8.0.44 $CUDA8_MODULE_LIST "Cuda_OpenMP" $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
else
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
COMPILERS=("gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.9.3 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/5.3.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/6.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/16.0.3 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
@ -184,6 +191,7 @@ elif [ "$MACHINE" = "white" ]; then
BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
IBM_MODULE_LIST="<COMPILER_NAME>/xl/<COMPILER_VERSION>"
CUDA_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/5.4.0"
CUDA_MODULE_LIST2="<COMPILER_NAME>/<COMPILER_VERSION>,gcc/6.3.0,ibm/xl/13.1.6-BETA"
# Don't do pthread on white.
GCC_BUILD_LIST="OpenMP,Serial,OpenMP_Serial"
@ -192,6 +200,7 @@ elif [ "$MACHINE" = "white" ]; then
COMPILERS=("gcc/5.4.0 $BASE_MODULE_LIST $IBM_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"ibm/13.1.3 $IBM_MODULE_LIST $IBM_BUILD_LIST xlC $IBM_WARNING_FLAGS"
"cuda/8.0.44 $CUDA_MODULE_LIST $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
"cuda/9.0.103 $CUDA_MODULE_LIST2 $CUDA_IBM_BUILD_LIST ${KOKKOS_PATH}/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
)
if [ -z "$ARCH_FLAG" ]; then
@ -210,8 +219,9 @@ elif [ "$MACHINE" = "bowman" ]; then
OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
COMPILERS=("intel/16.4.258 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/17.2.174 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/18.0.128 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
)
if [ -z "$ARCH_FLAG" ]; then
@ -241,13 +251,13 @@ elif [ "$MACHINE" = "shepard" ]; then
SKIP_HWLOC=True
export SLURM_TASKS_PER_NODE=32
BASE_MODULE_LIST="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
OLD_INTEL_BUILD_LIST="Pthread,Serial,Pthread_Serial"
BASE_MODULE_LIST="<COMPILER_NAME>/<COMPILER_VERSION>"
BASE_MODULE_LIST_INTEL="<COMPILER_NAME>/compilers/<COMPILER_VERSION>"
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("intel/16.2.181 $BASE_MODULE_LIST $OLD_INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/17.0.098 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
COMPILERS=("intel/17.4.196 $BASE_MODULE_LIST_INTEL $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/18.0.128 $BASE_MODULE_LIST_INTEL $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"pgi/17.10.0 $BASE_MODULE_LIST $GCC_BUILD_LIST pgc++ $PGI_WARNING_FLAGS"
)
if [ -z "$ARCH_FLAG" ]; then
@ -280,7 +290,7 @@ elif [ "$MACHINE" = "apollo" ]; then
if [ "$SPOT_CHECK" = "True" ]; then
# Format: (compiler module-list build-list exe-name warning-flag)
COMPILERS=("gcc/4.7.2 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
COMPILERS=("gcc/4.8.4 $BASE_MODULE_LIST "OpenMP,Pthread" g++ $GCC_WARNING_FLAGS"
"gcc/5.1.0 $BASE_MODULE_LIST "Serial" g++ $GCC_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST "OpenMP" icpc $INTEL_WARNING_FLAGS"
"clang/3.9.0 $BASE_MODULE_LIST "Pthread_Serial" clang++ $CLANG_WARNING_FLAGS"
@ -292,14 +302,13 @@ elif [ "$MACHINE" = "apollo" ]; then
COMPILERS=("cuda/8.0.44 $CUDA8_MODULE_LIST $BUILD_LIST_CUDA_NVCC $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
"clang/4.0.0 $CLANG_MODULE_LIST $BUILD_LIST_CUDA_CLANG clang++ $CUDA_WARNING_FLAGS"
"clang/3.9.0 $CLANG_MODULE_LIST $BUILD_LIST_CLANG clang++ $CLANG_WARNING_FLAGS"
"gcc/4.7.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.8.4 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.9.2 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/4.9.3 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/5.3.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"gcc/6.1.0 $BASE_MODULE_LIST $GCC_BUILD_LIST g++ $GCC_WARNING_FLAGS"
"intel/14.0.4 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/15.0.2 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/16.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"intel/17.0.1 $BASE_MODULE_LIST $INTEL_BUILD_LIST icpc $INTEL_WARNING_FLAGS"
"clang/3.5.2 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"clang/3.6.1 $BASE_MODULE_LIST $CLANG_BUILD_LIST clang++ $CLANG_WARNING_FLAGS"
"cuda/7.0.28 $CUDA_MODULE_LIST $CUDA_BUILD_LIST $KOKKOS_PATH/bin/nvcc_wrapper $CUDA_WARNING_FLAGS"
@ -336,6 +345,8 @@ if [ "$PRINT_HELP" = "True" ]; then
echo "--dry-run: Just print what would be executed"
echo "--build-only: Just do builds, don't run anything"
echo "--opt-flag=FLAG: Optimization flag (default: -O3)"
echo "--cxxflags-extra=FLAGS: Extra flags to be added to CXX_FLAGS"
echo "--ldflags-extra=FLAGS: Extra flags to be added to LD_FLAGS"
echo "--arch=ARCHITECTURE: overwrite architecture flags"
echo "--with-cuda-options=OPT: set KOKKOS_CUDA_OPTIONS"
echo "--build-list=BUILD,BUILD,BUILD..."
@ -361,14 +372,14 @@ if [ "$PRINT_HELP" = "True" ]; then
echo " Run all gcc tests"
echo " % test_all_sandia gcc"
echo ""
echo " Run all gcc/4.7.2 and all intel tests"
echo " % test_all_sandia gcc/4.7.2 intel"
echo " Run all gcc/4.8.4 and all intel tests"
echo " % test_all_sandia gcc/4.8.4 intel"
echo ""
echo " Run all tests in debug"
echo " % test_all_sandia --debug"
echo ""
echo " Run gcc/4.7.2 and only do OpenMP and OpenMP_Serial builds"
echo " % test_all_sandia gcc/4.7.2 --build-list=OpenMP,OpenMP_Serial"
echo " Run gcc/4.8.4 and only do OpenMP and OpenMP_Serial builds"
echo " % test_all_sandia gcc/4.8.4 --build-list=OpenMP,OpenMP_Serial"
echo ""
echo "If you want to kill the tests, do:"
echo " hit ctrl-z"
@ -566,10 +577,15 @@ single_build_and_test() {
if [[ "$build_type" = *debug* ]]; then
local extra_args="$extra_args --debug"
local cxxflags="-g $compiler_warning_flags"
local ldflags="-g"
else
local cxxflags="$OPT_FLAG $compiler_warning_flags"
local ldflags="${OPT_FLAG}"
fi
local cxxflags="${cxxflags} ${CXX_FLAGS_EXTRA}"
local ldflags="${ldflags} ${LD_FLAGS_EXTRA}"
if [[ "$KOKKOS_CUDA_OPTIONS" != "" ]]; then
local extra_args="$extra_args $KOKKOS_CUDA_OPTIONS"
fi
@ -586,7 +602,7 @@ single_build_and_test() {
run_cmd ls fake_problem >& ${desc}.configure.log || { report_and_log_test_result 1 $desc configure && return 0; }
fi
else
run_cmd ${KOKKOS_PATH}/generate_makefile.bash --with-devices=$build $ARCH_FLAG --compiler=$(which $compiler_exe) --cxxflags=\"$cxxflags\" $extra_args &>> ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
run_cmd ${KOKKOS_PATH}/generate_makefile.bash --with-devices=$build $ARCH_FLAG --compiler=$(which $compiler_exe) --cxxflags=\"$cxxflags\" --ldflags=\"$ldflags\" $extra_args &>> ${desc}.configure.log || { report_and_log_test_result 1 ${desc} configure && return 0; }
local -i build_start_time=$(date +%s)
run_cmd make -j 32 build-test >& ${desc}.build.log || { report_and_log_test_result 1 ${desc} build && return 0; }
local -i build_end_time=$(date +%s)

View File

@ -1,6 +1,6 @@
#!/bin/bash -el
ulimit -c 0
module load devpack/openmpi/1.10.0/intel/16.1.056/cuda/none
module load devpack/openmpi/2.1.1/intel/17.4.196/cuda/none
KOKKOS_BRANCH=$1
TRILINOS_UPDATE_BRANCH=$2

View File

@ -1,6 +1,6 @@
#!/bin/bash -el
ulimit -c 0
module load devpack/openmpi/1.10.0/intel/16.1.056/cuda/none
module load devpack/openmpi/2.1.1/intel/17.4.196/cuda/none
KOKKOS_BRANCH=$1
TRILINOS_UPDATE_BRANCH=$2

View File

@ -2,7 +2,10 @@
TRIBITS_SUBPACKAGE(Containers)
ADD_SUBDIRECTORY(src)
IF(KOKKOS_HAS_TRILINOS)
ADD_SUBDIRECTORY(src)
ENDIF()
TRIBITS_ADD_TEST_DIRECTORIES(unit_tests)
TRIBITS_ADD_TEST_DIRECTORIES(performance_tests)

View File

@ -3,6 +3,14 @@ INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
IF(NOT KOKKOS_HAS_TRILINOS)
IF(KOKKOS_SEPARATE_LIBS)
set(TEST_LINK_TARGETS kokkoscore)
ELSE()
set(TEST_LINK_TARGETS kokkos)
ENDIF()
ENDIF()
SET(SOURCES
TestMain.cpp
TestCuda.cpp
@ -24,7 +32,7 @@ TRIBITS_ADD_EXECUTABLE(
PerfTestExec
SOURCES ${SOURCES}
COMM serial mpi
TESTONLYLIBS kokkos_gtest
TESTONLYLIBS kokkos_gtest ${TEST_LINK_TARGETS}
)
TRIBITS_ADD_TEST(

View File

@ -15,7 +15,8 @@ endif
CXXFLAGS = -O3
LINK ?= $(CXX)
LDFLAGS ?= -lpthread
LDFLAGS ?=
override LDFLAGS += -lpthread
include $(KOKKOS_PATH)/Makefile.kokkos
@ -30,6 +31,12 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
TEST_TARGETS += test-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
OBJ_ROCM = TestROCm.o TestMain.o gtest-all.o
TARGETS += KokkosContainers_PerformanceTest_ROCm
TEST_TARGETS += test-rocm
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
OBJ_THREADS = TestThreads.o TestMain.o gtest-all.o
TARGETS += KokkosContainers_PerformanceTest_Threads
@ -45,6 +52,9 @@ endif
KokkosContainers_PerformanceTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_Cuda
KokkosContainers_PerformanceTest_ROCm: $(OBJ_ROCM) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_ROCM) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_ROCm
KokkosContainers_PerformanceTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(KOKKOS_LDFLAGS) $(LDFLAGS) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) -o KokkosContainers_PerformanceTest_Threads
@ -54,6 +64,9 @@ KokkosContainers_PerformanceTest_OpenMP: $(OBJ_OPENMP) $(KOKKOS_LINK_DEPENDS)
test-cuda: KokkosContainers_PerformanceTest_Cuda
./KokkosContainers_PerformanceTest_Cuda
test-rocm: KokkosContainers_PerformanceTest_ROCm
./KokkosContainers_PerformanceTest_ROCm
test-threads: KokkosContainers_PerformanceTest_Threads
./KokkosContainers_PerformanceTest_Threads

View File

@ -180,8 +180,8 @@ void test_dynrankview_op_perf( const int par_size )
typedef DeviceType execution_space;
typedef typename execution_space::size_type size_type;
const size_type dim2 = 90;
const size_type dim3 = 30;
const size_type dim_2 = 90;
const size_type dim_3 = 30;
double elapsed_time_view = 0;
double elapsed_time_compview = 0;
@ -191,7 +191,7 @@ void test_dynrankview_op_perf( const int par_size )
double elapsed_time_compdrview = 0;
Kokkos::Timer timer;
{
Kokkos::View<double***,DeviceType> testview("testview",par_size,dim2,dim3);
Kokkos::View<double***,DeviceType> testview("testview",par_size,dim_2,dim_3);
typedef InitViewFunctor<DeviceType> FunctorType;
timer.reset();
@ -220,7 +220,7 @@ void test_dynrankview_op_perf( const int par_size )
std::cout << " Strided View time (init only): " << elapsed_time_strideview << std::endl;
}
{
Kokkos::View<double*******,DeviceType> testview("testview",par_size,dim2,dim3,1,1,1,1);
Kokkos::View<double*******,DeviceType> testview("testview",par_size,dim_2,dim_3,1,1,1,1);
typedef InitViewRank7Functor<DeviceType> FunctorType;
timer.reset();
@ -231,7 +231,7 @@ void test_dynrankview_op_perf( const int par_size )
std::cout << " View Rank7 time (init only): " << elapsed_time_view_rank7 << std::endl;
}
{
Kokkos::DynRankView<double,DeviceType> testdrview("testdrview",par_size,dim2,dim3);
Kokkos::DynRankView<double,DeviceType> testdrview("testdrview",par_size,dim_2,dim_3);
typedef InitDynRankViewFunctor<DeviceType> FunctorType;
timer.reset();

View File

@ -54,6 +54,7 @@
#include <TestUnorderedMapPerformance.hpp>
#include <TestDynRankView.hpp>
#include <TestScatterView.hpp>
#include <iomanip>
#include <sstream>
@ -122,6 +123,18 @@ TEST_F( openmp, unordered_map_performance_far)
Perf::run_performance_tests<Kokkos::OpenMP,false>(base_file_name.str());
}
TEST_F( openmp, scatter_view)
{
std::cout << "ScatterView data-duplicated test:\n";
Perf::test_scatter_view<Kokkos::OpenMP, Kokkos::LayoutRight,
Kokkos::Experimental::ScatterDuplicated,
Kokkos::Experimental::ScatterNonAtomic>(10, 1000 * 1000);
//std::cout << "ScatterView atomics test:\n";
//Perf::test_scatter_view<Kokkos::OpenMP, Kokkos::LayoutRight,
// Kokkos::Experimental::ScatterNonDuplicated,
// Kokkos::Experimental::ScatterAtomic>(10, 1000 * 1000);
}
} // namespace test
#else
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTOPENMP_PREVENT_EMPTY_LINK_ERROR() {}

View File

@ -0,0 +1,113 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#if defined( KOKKOS_ENABLE_ROCM )
#include <cstdint>
#include <string>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <fstream>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <TestDynRankView.hpp>
#include <Kokkos_UnorderedMap.hpp>
#include <TestGlobal2LocalIds.hpp>
#include <TestUnorderedMapPerformance.hpp>
namespace Performance {
class rocm : public ::testing::Test {
protected:
static void SetUpTestCase()
{
std::cout << std::setprecision(5) << std::scientific;
Kokkos::HostSpace::execution_space::initialize();
Kokkos::Experimental::ROCm::initialize( Kokkos::Experimental::ROCm::SelectDevice(0) );
}
static void TearDownTestCase()
{
Kokkos::Experimental::ROCm::finalize();
Kokkos::HostSpace::execution_space::finalize();
}
};
#if 0
// issue 1089
TEST_F( rocm, dynrankview_perf )
{
std::cout << "ROCm" << std::endl;
std::cout << " DynRankView vs View: Initialization Only " << std::endl;
test_dynrankview_op_perf<Kokkos::Experimental::ROCm>( 40960 );
}
TEST_F( rocm, global_2_local)
{
std::cout << "ROCm" << std::endl;
std::cout << "size, create, generate, fill, find" << std::endl;
for (unsigned i=Performance::begin_id_size; i<=Performance::end_id_size; i *= Performance::id_step)
test_global_to_local_ids<Kokkos::Experimental::ROCm>(i);
}
#endif
TEST_F( rocm, unordered_map_performance_near)
{
Perf::run_performance_tests<Kokkos::Experimental::ROCm,true>("rocm-near");
}
TEST_F( rocm, unordered_map_performance_far)
{
Perf::run_performance_tests<Kokkos::Experimental::ROCm,false>("rocm-far");
}
}
#else
void KOKKOS_CONTAINERS_PERFORMANCE_TESTS_TESTROCM_PREVENT_EMPTY_LINK_ERROR() {}
#endif /* #if defined( KOKKOS_ENABLE_ROCM ) */

View File

@ -0,0 +1,113 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_SCATTER_VIEW_HPP
#define KOKKOS_TEST_SCATTER_VIEW_HPP
#include <Kokkos_ScatterView.hpp>
#include <impl/Kokkos_Timer.hpp>
namespace Perf {
template <typename ExecSpace, typename Layout, int duplication, int contribution>
void test_scatter_view(int m, int n)
{
Kokkos::View<double *[3], Layout, ExecSpace> original_view("original_view", n);
{
auto scatter_view = Kokkos::Experimental::create_scatter_view
< Kokkos::Experimental::ScatterSum
, duplication
, contribution
> (original_view);
Kokkos::Experimental::UniqueToken<
ExecSpace, Kokkos::Experimental::UniqueTokenScope::Global>
unique_token{ExecSpace()};
//auto internal_view = scatter_view.internal_view;
auto policy = Kokkos::RangePolicy<ExecSpace, int>(0, n);
for (int foo = 0; foo < 5; ++foo) {
{
auto num_threads = unique_token.size();
std::cout << "num_threads " << num_threads << '\n';
Kokkos::View<double **[3], Layout, ExecSpace> hand_coded_duplicate_view("hand_coded_duplicate", num_threads, n);
auto f2 = KOKKOS_LAMBDA(int i) {
auto thread_id = unique_token.acquire();
for (int j = 0; j < 10; ++j) {
auto k = (i + j) % n;
hand_coded_duplicate_view(thread_id, k, 0) += 4.2;
hand_coded_duplicate_view(thread_id, k, 1) += 2.0;
hand_coded_duplicate_view(thread_id, k, 2) += 1.0;
}
unique_token.release(thread_id); // return the slot so later iterates can reuse it
};
Kokkos::Timer timer;
timer.reset();
for (int k = 0; k < m; ++k) {
Kokkos::parallel_for(policy, f2, "hand_coded_duplicate_scatter_view_test");
}
auto t = timer.seconds();
std::cout << "hand-coded test took " << t << " seconds\n";
}
{
auto f = KOKKOS_LAMBDA(int i) {
auto scatter_access = scatter_view.access();
for (int j = 0; j < 10; ++j) {
auto k = (i + j) % n;
scatter_access(k, 0) += 4.2;
scatter_access(k, 1) += 2.0;
scatter_access(k, 2) += 1.0;
}
};
Kokkos::Timer timer;
timer.reset();
for (int k = 0; k < m; ++k) {
Kokkos::parallel_for(policy, f, "scatter_view_test");
}
auto t = timer.seconds();
std::cout << "test took " << t << " seconds\n";
}
}
}
}
}
#endif

View File

@ -6,26 +6,42 @@ INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
#-----------------------------------------------------------------------------
SET(HEADERS "")
SET(SOURCES "")
SET(HEADERS_IMPL "")
FILE(GLOB HEADERS *.hpp)
FILE(GLOB HEADERS_IMPL impl/*.hpp)
FILE(GLOB SOURCES impl/*.cpp)
SET(TRILINOS_INCDIR ${CMAKE_INSTALL_PREFIX}/${${PROJECT_NAME}_INSTALL_INCLUDE_DIR})
INSTALL(FILES ${HEADERS_IMPL} DESTINATION ${TRILINOS_INCDIR}/impl/)
if(KOKKOS_LEGACY_TRIBITS)
TRIBITS_ADD_LIBRARY(
kokkoscontainers
HEADERS ${HEADERS}
NOINSTALLHEADERS ${HEADERS_IMPL}
SOURCES ${SOURCES}
DEPLIBS
)
SET(HEADERS "")
SET(SOURCES "")
SET(HEADERS_IMPL "")
FILE(GLOB HEADERS *.hpp)
FILE(GLOB HEADERS_IMPL impl/*.hpp)
FILE(GLOB SOURCES impl/*.cpp)
INSTALL(FILES ${HEADERS_IMPL} DESTINATION ${TRILINOS_INCDIR}/impl/)
TRIBITS_ADD_LIBRARY(
kokkoscontainers
HEADERS ${HEADERS}
NOINSTALLHEADERS ${HEADERS_IMPL}
SOURCES ${SOURCES}
DEPLIBS
)
else()
INSTALL (
DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/"
DESTINATION ${TRILINOS_INCDIR}
FILES_MATCHING PATTERN "*.hpp"
)
TRIBITS_ADD_LIBRARY(
kokkoscontainers
SOURCES ${KOKKOS_CONTAINERS_SRCS}
DEPLIBS
)
endif()
#-----------------------------------------------------------------------------

View File

@ -72,8 +72,10 @@ private:
, "DynamicView must be rank-one" );
static_assert( std::is_trivial< typename traits::value_type >::value &&
std::is_same< typename traits::specialize , void >::value &&
Kokkos::Impl::is_power_of_two
<sizeof(typename traits::value_type)>::value
, "DynamicView must have trivial value_type and sizeof(value_type) is a power-of-two");
template< class Space , bool = Kokkos::Impl::MemorySpaceAccess< Space , typename traits::memory_space >::accessible > struct verify_space

View File

@ -0,0 +1,999 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
/// \file Kokkos_ScatterView.hpp
/// \brief Declaration and definition of Kokkos::ScatterView.
///
/// This header file declares and defines Kokkos::ScatterView and its
/// related nonmember functions.
#ifndef KOKKOS_SCATTER_VIEW_HPP
#define KOKKOS_SCATTER_VIEW_HPP
#include <Kokkos_Core.hpp>
#include <utility>
namespace Kokkos {
namespace Experimental {
//TODO: replace this enum with the Kokkos::Sum, etc reducers for parallel_reduce
enum : int {
ScatterSum,
};
enum : int {
ScatterNonDuplicated = 0,
ScatterDuplicated = 1
};
enum : int {
ScatterNonAtomic = 0,
ScatterAtomic = 1
};
}} // Kokkos::Experimental
namespace Kokkos {
namespace Impl {
namespace Experimental {
template <typename ExecSpace>
struct DefaultDuplication;
template <typename ExecSpace, int duplication>
struct DefaultContribution;
#ifdef KOKKOS_ENABLE_SERIAL
template <>
struct DefaultDuplication<Kokkos::Serial> {
enum : int { value = Kokkos::Experimental::ScatterNonDuplicated };
};
template <>
struct DefaultContribution<Kokkos::Serial, Kokkos::Experimental::ScatterNonDuplicated> {
enum : int { value = Kokkos::Experimental::ScatterNonAtomic };
};
template <>
struct DefaultContribution<Kokkos::Serial, Kokkos::Experimental::ScatterDuplicated> {
enum : int { value = Kokkos::Experimental::ScatterNonAtomic };
};
#endif
#ifdef KOKKOS_ENABLE_OPENMP
template <>
struct DefaultDuplication<Kokkos::OpenMP> {
enum : int { value = Kokkos::Experimental::ScatterDuplicated };
};
template <>
struct DefaultContribution<Kokkos::OpenMP, Kokkos::Experimental::ScatterNonDuplicated> {
enum : int { value = Kokkos::Experimental::ScatterAtomic };
};
template <>
struct DefaultContribution<Kokkos::OpenMP, Kokkos::Experimental::ScatterDuplicated> {
enum : int { value = Kokkos::Experimental::ScatterNonAtomic };
};
#endif
#ifdef KOKKOS_ENABLE_THREADS
template <>
struct DefaultDuplication<Kokkos::Threads> {
enum : int { value = Kokkos::Experimental::ScatterDuplicated };
};
template <>
struct DefaultContribution<Kokkos::Threads, Kokkos::Experimental::ScatterNonDuplicated> {
enum : int { value = Kokkos::Experimental::ScatterAtomic };
};
template <>
struct DefaultContribution<Kokkos::Threads, Kokkos::Experimental::ScatterDuplicated> {
enum : int { value = Kokkos::Experimental::ScatterNonAtomic };
};
#endif
#ifdef KOKKOS_ENABLE_CUDA
template <>
struct DefaultDuplication<Kokkos::Cuda> {
enum : int { value = Kokkos::Experimental::ScatterNonDuplicated };
};
template <>
struct DefaultContribution<Kokkos::Cuda, Kokkos::Experimental::ScatterNonDuplicated> {
enum : int { value = Kokkos::Experimental::ScatterAtomic };
};
template <>
struct DefaultContribution<Kokkos::Cuda, Kokkos::Experimental::ScatterDuplicated> {
enum : int { value = Kokkos::Experimental::ScatterAtomic };
};
#endif
/* ScatterValue is the object returned by the access operator() of ScatterAccess.
Much like the reference returned by an atomic View, it wraps Kokkos::atomic_add
in convenient operator+=, operator-=, etc. */
template <typename ValueType, int Op, int contribution>
struct ScatterValue;
template <typename ValueType>
struct ScatterValue<ValueType, Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterNonAtomic> {
public:
KOKKOS_FORCEINLINE_FUNCTION ScatterValue(ValueType& value_in) : value( value_in ) {}
KOKKOS_FORCEINLINE_FUNCTION ScatterValue(ScatterValue&& other) : value( other.value ) {}
KOKKOS_FORCEINLINE_FUNCTION void operator+=(ValueType const& rhs) {
value += rhs;
}
KOKKOS_FORCEINLINE_FUNCTION void operator-=(ValueType const& rhs) {
value -= rhs;
}
private:
ValueType& value;
};
template <typename ValueType>
struct ScatterValue<ValueType, Kokkos::Experimental::ScatterSum, Kokkos::Experimental::ScatterAtomic> {
public:
KOKKOS_FORCEINLINE_FUNCTION ScatterValue(ValueType& value_in) : value( value_in ) {}
KOKKOS_FORCEINLINE_FUNCTION void operator+=(ValueType const& rhs) {
Kokkos::atomic_add(&value, rhs);
}
KOKKOS_FORCEINLINE_FUNCTION void operator-=(ValueType const& rhs) {
Kokkos::atomic_add(&value, -rhs);
}
private:
ValueType& value;
};
/* DuplicatedDataType, given a View DataType, creates a new DataType
that has one extra runtime dimension, which becomes the largest-stride dimension.
For LayoutLeft this new dimension is the last one, and since the DataType grammar
requires runtime dimensions to precede compile-time ones, any existing
compile-time dimensions must also be converted to runtime dimensions. */
template <typename T, typename Layout>
struct DuplicatedDataType;
template <typename T>
struct DuplicatedDataType<T, Kokkos::LayoutRight> {
typedef T* value_type; // For LayoutRight, add a star all the way on the left
};
template <typename T, size_t N>
struct DuplicatedDataType<T[N], Kokkos::LayoutRight> {
typedef typename DuplicatedDataType<T, Kokkos::LayoutRight>::value_type value_type[N];
};
template <typename T>
struct DuplicatedDataType<T[], Kokkos::LayoutRight> {
typedef typename DuplicatedDataType<T, Kokkos::LayoutRight>::value_type value_type[];
};
template <typename T>
struct DuplicatedDataType<T*, Kokkos::LayoutRight> {
typedef typename DuplicatedDataType<T, Kokkos::LayoutRight>::value_type* value_type;
};
template <typename T>
struct DuplicatedDataType<T, Kokkos::LayoutLeft> {
typedef T* value_type;
};
template <typename T, size_t N>
struct DuplicatedDataType<T[N], Kokkos::LayoutLeft> {
typedef typename DuplicatedDataType<T, Kokkos::LayoutLeft>::value_type* value_type;
};
template <typename T>
struct DuplicatedDataType<T[], Kokkos::LayoutLeft> {
typedef typename DuplicatedDataType<T, Kokkos::LayoutLeft>::value_type* value_type;
};
template <typename T>
struct DuplicatedDataType<T*, Kokkos::LayoutLeft> {
typedef typename DuplicatedDataType<T, Kokkos::LayoutLeft>::value_type* value_type;
};
/* Slice is just responsible for stuffing the correct number of Kokkos::ALL
arguments on the correct side of the index in a call to subview() to get a
subview where the index specified is the largest-stride one. */
template <typename Layout, int rank, typename V, typename ... Args>
struct Slice {
typedef Slice<Layout, rank - 1, V, Kokkos::Impl::ALL_t, Args...> next;
typedef typename next::value_type value_type;
static
value_type get(V const& src, const size_t i, Args ... args) {
return next::get(src, i, Kokkos::ALL, args...);
}
};
template <typename V, typename ... Args>
struct Slice<Kokkos::LayoutRight, 1, V, Args...> {
typedef typename Kokkos::Impl::ViewMapping
< void
, V
, const size_t
, Args ...
>::type value_type;
static
value_type get(V const& src, const size_t i, Args ... args) {
return Kokkos::subview(src, i, args...);
}
};
template <typename V, typename ... Args>
struct Slice<Kokkos::LayoutLeft, 1, V, Args...> {
typedef typename Kokkos::Impl::ViewMapping
< void
, V
, Args ...
, const size_t
>::type value_type;
static
value_type get(V const& src, const size_t i, Args ... args) {
return Kokkos::subview(src, args..., i);
}
};
template <typename ExecSpace, typename ValueType, int Op>
struct ReduceDuplicates;
template <typename ExecSpace, typename ValueType, int Op>
struct ReduceDuplicatesBase {
typedef ReduceDuplicates<ExecSpace, ValueType, Op> Derived;
ValueType const* src;
ValueType* dst;
size_t stride;
size_t start;
size_t n;
ReduceDuplicatesBase(ValueType const* src_in, ValueType* dest_in, size_t stride_in, size_t start_in, size_t n_in, std::string const& name)
: src(src_in)
, dst(dest_in)
, stride(stride_in)
, start(start_in)
, n(n_in)
{
#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelFor(std::string("reduce_") + name, 0, &kpID);
}
#endif
typedef RangePolicy<ExecSpace, size_t> policy_type;
typedef Kokkos::Impl::ParallelFor<Derived, policy_type> closure_type;
const closure_type closure(*(static_cast<Derived*>(this)), policy_type(0, stride));
closure.execute();
#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelFor(kpID);
}
#endif
}
};
template <typename ExecSpace, typename ValueType>
struct ReduceDuplicates<ExecSpace, ValueType, Kokkos::Experimental::ScatterSum> :
public ReduceDuplicatesBase<ExecSpace, ValueType, Kokkos::Experimental::ScatterSum>
{
typedef ReduceDuplicatesBase<ExecSpace, ValueType, Kokkos::Experimental::ScatterSum> Base;
ReduceDuplicates(ValueType const* src_in, ValueType* dst_in, size_t stride_in, size_t start_in, size_t n_in, std::string const& name):
Base(src_in, dst_in, stride_in, start_in, n_in, name)
{}
KOKKOS_FORCEINLINE_FUNCTION void operator()(size_t i) const {
for (size_t j = Base::start; j < Base::n; ++j) {
Base::dst[i] += Base::src[i + Base::stride * j];
}
}
};
template <typename ExecSpace, typename ValueType, int Op>
struct ResetDuplicates;
template <typename ExecSpace, typename ValueType, int Op>
struct ResetDuplicatesBase {
typedef ResetDuplicates<ExecSpace, ValueType, Op> Derived;
ValueType* data;
ResetDuplicatesBase(ValueType* data_in, size_t size_in, std::string const& name)
: data(data_in)
{
#if defined(KOKKOS_ENABLE_PROFILING)
uint64_t kpID = 0;
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::beginParallelFor(std::string("reset_") + name, 0, &kpID);
}
#endif
typedef RangePolicy<ExecSpace, size_t> policy_type;
typedef Kokkos::Impl::ParallelFor<Derived, policy_type> closure_type;
const closure_type closure(*(static_cast<Derived*>(this)), policy_type(0, size_in));
closure.execute();
#if defined(KOKKOS_ENABLE_PROFILING)
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::endParallelFor(kpID);
}
#endif
}
};
template <typename ExecSpace, typename ValueType>
struct ResetDuplicates<ExecSpace, ValueType, Kokkos::Experimental::ScatterSum> :
public ResetDuplicatesBase<ExecSpace, ValueType, Kokkos::Experimental::ScatterSum>
{
typedef ResetDuplicatesBase<ExecSpace, ValueType, Kokkos::Experimental::ScatterSum> Base;
ResetDuplicates(ValueType* data_in, size_t size_in, std::string const& name):
Base(data_in, size_in, name)
{}
KOKKOS_FORCEINLINE_FUNCTION void operator()(size_t i) const {
Base::data[i] = Kokkos::reduction_identity<ValueType>::sum();
}
};
}}} // Kokkos::Impl::Experimental
namespace Kokkos {
namespace Experimental {
template <typename DataType
,typename Layout = Kokkos::DefaultExecutionSpace::array_layout
,typename ExecSpace = Kokkos::DefaultExecutionSpace
,int Op = ScatterSum
,int duplication = Kokkos::Impl::Experimental::DefaultDuplication<ExecSpace>::value
,int contribution = Kokkos::Impl::Experimental::DefaultContribution<ExecSpace, duplication>::value
>
class ScatterView;
template <typename DataType
,int Op
,typename ExecSpace
,typename Layout
,int duplication
,int contribution
,int override_contribution
>
class ScatterAccess;
// non-duplicated implementation
template <typename DataType
,int Op
,typename ExecSpace
,typename Layout
,int contribution
>
class ScatterView<DataType
,Layout
,ExecSpace
,Op
,ScatterNonDuplicated
,contribution>
{
public:
typedef Kokkos::View<DataType, Layout, ExecSpace> original_view_type;
typedef typename original_view_type::value_type original_value_type;
typedef typename original_view_type::reference_type original_reference_type;
friend class ScatterAccess<DataType, Op, ExecSpace, Layout, ScatterNonDuplicated, contribution, ScatterNonAtomic>;
friend class ScatterAccess<DataType, Op, ExecSpace, Layout, ScatterNonDuplicated, contribution, ScatterAtomic>;
ScatterView()
{
}
template <typename RT, typename ... RP>
ScatterView(View<RT, RP...> const& original_view)
: internal_view(original_view)
{
}
template <typename ... Dims>
ScatterView(std::string const& name, Dims ... dims)
: internal_view(name, dims ...)
{
}
template <int override_contrib = contribution>
KOKKOS_FORCEINLINE_FUNCTION
ScatterAccess<DataType, Op, ExecSpace, Layout, ScatterNonDuplicated, contribution, override_contrib>
access() const {
return ScatterAccess<DataType, Op, ExecSpace, Layout, ScatterNonDuplicated, contribution, override_contrib>{*this};
}
original_view_type subview() const {
return internal_view;
}
template <typename DT, typename ... RP>
void contribute_into(View<DT, RP...> const& dest) const
{
typedef View<DT, RP...> dest_type;
static_assert(std::is_same<
typename dest_type::array_layout,
Layout>::value,
"ScatterView contribute destination has different layout");
static_assert(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
typename ExecSpace::memory_space,
typename dest_type::memory_space>::value,
"ScatterView contribute destination memory space not accessible");
if (dest.data() == internal_view.data()) return;
Kokkos::Impl::Experimental::ReduceDuplicates<ExecSpace, original_value_type, Op>(
internal_view.data(),
dest.data(),
internal_view.size(), // stride: visit every entry of the single copy once
0,
1,
internal_view.label());
}
void reset() {
Kokkos::Impl::Experimental::ResetDuplicates<ExecSpace, original_value_type, Op>(
internal_view.data(),
internal_view.size(),
internal_view.label());
}
template <typename DT, typename ... RP>
void reset_except(View<DT, RP...> const& view) {
if (view.data() != internal_view.data()) reset();
}
void resize(const size_t n0 = 0,
const size_t n1 = 0,
const size_t n2 = 0,
const size_t n3 = 0,
const size_t n4 = 0,
const size_t n5 = 0,
const size_t n6 = 0,
const size_t n7 = 0) {
::Kokkos::resize(internal_view,n0,n1,n2,n3,n4,n5,n6,n7);
}
void realloc(const size_t n0 = 0,
const size_t n1 = 0,
const size_t n2 = 0,
const size_t n3 = 0,
const size_t n4 = 0,
const size_t n5 = 0,
const size_t n6 = 0,
const size_t n7 = 0) {
::Kokkos::realloc(internal_view,n0,n1,n2,n3,n4,n5,n6,n7);
}
protected:
template <typename ... Args>
KOKKOS_FORCEINLINE_FUNCTION
original_reference_type at(Args ... args) const {
return internal_view(args...);
}
private:
typedef original_view_type internal_view_type;
internal_view_type internal_view;
};
template <typename DataType
,int Op
,typename ExecSpace
,typename Layout
,int contribution
,int override_contribution
>
class ScatterAccess<DataType
,Op
,ExecSpace
,Layout
,ScatterNonDuplicated
,contribution
,override_contribution>
{
public:
typedef ScatterView<DataType, Layout, ExecSpace, Op, ScatterNonDuplicated, contribution> view_type;
typedef typename view_type::original_value_type original_value_type;
typedef Kokkos::Impl::Experimental::ScatterValue<
original_value_type, Op, override_contribution> value_type;
KOKKOS_INLINE_FUNCTION
ScatterAccess(view_type const& view_in)
: view(view_in)
{
}
template <typename ... Args>
KOKKOS_FORCEINLINE_FUNCTION
value_type operator()(Args ... args) const {
return view.at(args...);
}
template <typename Arg>
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<view_type::original_view_type::rank == 1 &&
std::is_integral<Arg>::value, value_type>::type
operator[](Arg arg) const {
return view.at(arg);
}
private:
view_type const& view;
};
// duplicated implementation
// LayoutLeft and LayoutRight are different enough that we'll just specialize each
template <typename DataType
,int Op
,typename ExecSpace
,int contribution
>
class ScatterView<DataType
,Kokkos::LayoutRight
,ExecSpace
,Op
,ScatterDuplicated
,contribution>
{
public:
typedef Kokkos::View<DataType, Kokkos::LayoutRight, ExecSpace> original_view_type;
typedef typename original_view_type::value_type original_value_type;
typedef typename original_view_type::reference_type original_reference_type;
friend class ScatterAccess<DataType, Op, ExecSpace, Kokkos::LayoutRight, ScatterDuplicated, contribution, ScatterNonAtomic>;
friend class ScatterAccess<DataType, Op, ExecSpace, Kokkos::LayoutRight, ScatterDuplicated, contribution, ScatterAtomic>;
typedef typename Kokkos::Impl::Experimental::DuplicatedDataType<DataType, Kokkos::LayoutRight> data_type_info;
typedef typename data_type_info::value_type internal_data_type;
typedef Kokkos::View<internal_data_type, Kokkos::LayoutRight, ExecSpace> internal_view_type;
ScatterView()
{
}
template <typename RT, typename ... RP >
ScatterView(View<RT, RP...> const& original_view)
: unique_token()
, internal_view(Kokkos::ViewAllocateWithoutInitializing(
std::string("duplicated_") + original_view.label()),
unique_token.size(),
original_view.extent(0),
original_view.extent(1),
original_view.extent(2),
original_view.extent(3),
original_view.extent(4),
original_view.extent(5),
original_view.extent(6))
{
reset();
}
template <typename ... Dims>
ScatterView(std::string const& name, Dims ... dims)
: internal_view(Kokkos::ViewAllocateWithoutInitializing(name), unique_token.size(), dims ...)
{
reset();
}
template <int override_contribution = contribution>
inline
ScatterAccess<DataType, Op, ExecSpace, Kokkos::LayoutRight, ScatterDuplicated, contribution, override_contribution>
access() const {
return ScatterAccess<DataType, Op, ExecSpace, Kokkos::LayoutRight, ScatterDuplicated, contribution, override_contribution>{*this};
}
typename Kokkos::Impl::Experimental::Slice<
Kokkos::LayoutRight, internal_view_type::rank, internal_view_type>::value_type
subview() const
{
return Kokkos::Impl::Experimental::Slice<
Kokkos::LayoutRight, internal_view_type::rank, internal_view_type>::get(internal_view, 0);
}
template <typename DT, typename ... RP>
void contribute_into(View<DT, RP...> const& dest) const
{
typedef View<DT, RP...> dest_type;
static_assert(std::is_same<
typename dest_type::array_layout,
Kokkos::LayoutRight>::value,
"ScatterView contribute destination has different layout");
static_assert(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
typename ExecSpace::memory_space,
typename dest_type::memory_space>::value,
"ScatterView contribute destination memory space not accessible");
size_t strides[8];
internal_view.stride(strides);
bool is_equal = (dest.data() == internal_view.data());
size_t start = is_equal ? 1 : 0;
Kokkos::Impl::Experimental::ReduceDuplicates<ExecSpace, original_value_type, Op>(
internal_view.data(),
dest.data(),
strides[0],
start,
internal_view.extent(0),
internal_view.label());
}
void reset() {
Kokkos::Impl::Experimental::ResetDuplicates<ExecSpace, original_value_type, Op>(
internal_view.data(),
internal_view.size(),
internal_view.label());
}
template <typename DT, typename ... RP>
void reset_except(View<DT, RP...> const& view) {
if (view.data() != internal_view.data()) {
reset();
return;
}
Kokkos::Impl::Experimental::ResetDuplicates<ExecSpace, original_value_type, Op>(
internal_view.data() + view.size(),
internal_view.size() - view.size(),
internal_view.label());
}
void resize(const size_t n0 = 0,
const size_t n1 = 0,
const size_t n2 = 0,
const size_t n3 = 0,
const size_t n4 = 0,
const size_t n5 = 0,
const size_t n6 = 0) {
::Kokkos::resize(internal_view,unique_token.size(),n0,n1,n2,n3,n4,n5,n6);
}
void realloc(const size_t n0 = 0,
const size_t n1 = 0,
const size_t n2 = 0,
const size_t n3 = 0,
const size_t n4 = 0,
const size_t n5 = 0,
const size_t n6 = 0) {
::Kokkos::realloc(internal_view,unique_token.size(),n0,n1,n2,n3,n4,n5,n6);
}
protected:
template <typename ... Args>
KOKKOS_FORCEINLINE_FUNCTION
original_reference_type at(int rank, Args ... args) const {
return internal_view(rank, args...);
}
protected:
typedef Kokkos::Experimental::UniqueToken<
ExecSpace, Kokkos::Experimental::UniqueTokenScope::Global> unique_token_type;
unique_token_type unique_token;
internal_view_type internal_view;
};
template <typename DataType
,int Op
,typename ExecSpace
,int contribution
>
class ScatterView<DataType
,Kokkos::LayoutLeft
,ExecSpace
,Op
,ScatterDuplicated
,contribution>
{
public:
typedef Kokkos::View<DataType, Kokkos::LayoutLeft, ExecSpace> original_view_type;
typedef typename original_view_type::value_type original_value_type;
typedef typename original_view_type::reference_type original_reference_type;
friend class ScatterAccess<DataType, Op, ExecSpace, Kokkos::LayoutLeft, ScatterDuplicated, contribution, ScatterNonAtomic>;
friend class ScatterAccess<DataType, Op, ExecSpace, Kokkos::LayoutLeft, ScatterDuplicated, contribution, ScatterAtomic>;
typedef typename Kokkos::Impl::Experimental::DuplicatedDataType<DataType, Kokkos::LayoutLeft> data_type_info;
typedef typename data_type_info::value_type internal_data_type;
typedef Kokkos::View<internal_data_type, Kokkos::LayoutLeft, ExecSpace> internal_view_type;
ScatterView()
{
}
template <typename RT, typename ... RP >
ScatterView(View<RT, RP...> const& original_view)
: unique_token()
{
size_t arg_N[8] = {
original_view.extent(0),
original_view.extent(1),
original_view.extent(2),
original_view.extent(3),
original_view.extent(4),
original_view.extent(5),
original_view.extent(6),
0
};
arg_N[internal_view_type::rank - 1] = unique_token.size();
internal_view = internal_view_type(
Kokkos::ViewAllocateWithoutInitializing(
std::string("duplicated_") + original_view.label()),
arg_N[0], arg_N[1], arg_N[2], arg_N[3],
arg_N[4], arg_N[5], arg_N[6], arg_N[7]);
reset();
}
template <typename ... Dims>
ScatterView(std::string const& name, Dims ... dims)
: internal_view(Kokkos::ViewAllocateWithoutInitializing(name), dims ..., unique_token.size())
{
reset();
}
template <int override_contribution = contribution>
inline
ScatterAccess<DataType, Op, ExecSpace, Kokkos::LayoutLeft, ScatterDuplicated, contribution, override_contribution>
access() const {
return ScatterAccess<DataType, Op, ExecSpace, Kokkos::LayoutLeft, ScatterDuplicated, contribution, override_contribution>{*this};
}
typename Kokkos::Impl::Experimental::Slice<
Kokkos::LayoutLeft, internal_view_type::rank, internal_view_type>::value_type
subview() const
{
return Kokkos::Impl::Experimental::Slice<
Kokkos::LayoutLeft, internal_view_type::rank, internal_view_type>::get(internal_view, 0);
}
template <typename ... RP>
void contribute_into(View<DataType, RP...> const& dest) const
{
typedef View<DataType, RP...> dest_type;
static_assert(std::is_same<
typename dest_type::array_layout,
Kokkos::LayoutLeft>::value,
"ScatterView contribute destination has different layout");
static_assert(Kokkos::Impl::VerifyExecutionCanAccessMemorySpace<
typename ExecSpace::memory_space,
typename dest_type::memory_space>::value,
"ScatterView contribute destination memory space not accessible");
size_t strides[8];
internal_view.stride(strides);
size_t stride = strides[internal_view_type::rank - 1];
auto extent = internal_view.extent(
internal_view_type::rank - 1);
bool is_equal = (dest.data() == internal_view.data());
size_t start = is_equal ? 1 : 0;
Kokkos::Impl::Experimental::ReduceDuplicates<ExecSpace, original_value_type, Op>(
internal_view.data(),
dest.data(),
stride,
start,
extent,
internal_view.label());
}
void reset() {
Kokkos::Impl::Experimental::ResetDuplicates<ExecSpace, original_value_type, Op>(
internal_view.data(),
internal_view.size(),
internal_view.label());
}
template <typename DT, typename ... RP>
void reset_except(View<DT, RP...> const& view) {
if (view.data() != internal_view.data()) {
reset();
return;
}
Kokkos::Impl::Experimental::ResetDuplicates<ExecSpace, original_value_type, Op>(
internal_view.data() + view.size(),
internal_view.size() - view.size(),
internal_view.label());
}
void resize(const size_t n0 = 0,
const size_t n1 = 0,
const size_t n2 = 0,
const size_t n3 = 0,
const size_t n4 = 0,
const size_t n5 = 0,
const size_t n6 = 0) {
size_t arg_N[8] = {n0,n1,n2,n3,n4,n5,n6,0};
const int i = internal_view.rank-1;
arg_N[i] = unique_token.size();
::Kokkos::resize(internal_view,
arg_N[0], arg_N[1], arg_N[2], arg_N[3],
arg_N[4], arg_N[5], arg_N[6], arg_N[7]);
}
void realloc(const size_t n0 = 0,
const size_t n1 = 0,
const size_t n2 = 0,
const size_t n3 = 0,
const size_t n4 = 0,
const size_t n5 = 0,
const size_t n6 = 0) {
size_t arg_N[8] = {n0,n1,n2,n3,n4,n5,n6,0};
const int i = internal_view.rank-1;
arg_N[i] = unique_token.size();
::Kokkos::realloc(internal_view,
arg_N[0], arg_N[1], arg_N[2], arg_N[3],
arg_N[4], arg_N[5], arg_N[6], arg_N[7]);
}
protected:
template <typename ... Args>
inline original_reference_type at(int thread_id, Args ... args) const {
return internal_view(args..., thread_id);
}
protected:
typedef Kokkos::Experimental::UniqueToken<
ExecSpace, Kokkos::Experimental::UniqueTokenScope::Global> unique_token_type;
unique_token_type unique_token;
internal_view_type internal_view;
};
/* This object has to be separate in order to store the thread ID, which cannot
be obtained until one is inside a parallel construct, and may be relatively
expensive to obtain at every contribution
(calls a non-inlined function, looks up a thread-local variable).
Due to the expense, it is sensible to query it at most once per parallel iterate
(ideally once per thread, but parallel_for doesn't expose that)
and then store it in a stack variable.
ScatterAccess serves as a non-const object on the stack which can store the thread ID */
template <typename DataType
,int Op
,typename ExecSpace
,typename Layout
,int contribution
,int override_contribution
>
class ScatterAccess<DataType
,Op
,ExecSpace
,Layout
,ScatterDuplicated
,contribution
,override_contribution>
{
public:
typedef ScatterView<DataType, Layout, ExecSpace, Op, ScatterDuplicated, contribution> view_type;
typedef typename view_type::original_value_type original_value_type;
typedef Kokkos::Impl::Experimental::ScatterValue<
original_value_type, Op, override_contribution> value_type;
inline ScatterAccess(view_type const& view_in)
: view(view_in)
, thread_id(view_in.unique_token.acquire()) {
}
inline ~ScatterAccess() {
if (thread_id != ~thread_id_type(0)) view.unique_token.release(thread_id);
}
template <typename ... Args>
KOKKOS_FORCEINLINE_FUNCTION
value_type operator()(Args ... args) const {
return view.at(thread_id, args...);
}
template <typename Arg>
KOKKOS_FORCEINLINE_FUNCTION
typename std::enable_if<view_type::original_view_type::rank == 1 &&
std::is_integral<Arg>::value, value_type>::type
operator[](Arg arg) const {
return view.at(thread_id, arg);
}
private:
view_type const& view;
// simplify RAII by disallowing copies
ScatterAccess(ScatterAccess const& other) = delete;
ScatterAccess& operator=(ScatterAccess const& other) = delete;
ScatterAccess& operator=(ScatterAccess&& other) = delete;
public:
// moves do need to be allowed, though, for the common
//   auto b = a.access();
// pattern: that assignment turns into a move-constructor call
inline ScatterAccess(ScatterAccess&& other)
: view(other.view)
, thread_id(other.thread_id)
{
other.thread_id = ~thread_id_type(0);
}
private:
typedef typename view_type::unique_token_type unique_token_type;
typedef typename unique_token_type::size_type thread_id_type;
thread_id_type thread_id;
};
template <int Op = Kokkos::Experimental::ScatterSum,
int duplication = -1,
int contribution = -1,
typename RT, typename ... RP>
ScatterView
< RT
, typename ViewTraits<RT, RP...>::array_layout
, typename ViewTraits<RT, RP...>::execution_space
, Op
/* The expressions below only compute defaults when none were specified; they are
   awkward because the view type comes after the duplication/contribution
   settings in the template parameter list and so cannot be used in their defaults */
, duplication == -1 ? Kokkos::Impl::Experimental::DefaultDuplication<typename ViewTraits<RT, RP...>::execution_space>::value : duplication
, contribution == -1 ?
Kokkos::Impl::Experimental::DefaultContribution<
typename ViewTraits<RT, RP...>::execution_space,
(duplication == -1 ?
Kokkos::Impl::Experimental::DefaultDuplication<
typename ViewTraits<RT, RP...>::execution_space
>::value
: duplication
)
>::value
: contribution
>
create_scatter_view(View<RT, RP...> const& original_view) {
return original_view; // implicit ScatterView constructor call
}
}} // namespace Kokkos::Experimental
namespace Kokkos {
namespace Experimental {
template <typename DT1, typename DT2, typename LY, typename ES, int OP, int CT, int DP, typename ... VP>
void
contribute(View<DT1, VP...>& dest, Kokkos::Experimental::ScatterView<DT2, LY, ES, OP, CT, DP> const& src)
{
src.contribute_into(dest);
}
}} // namespace Kokkos::Experimental
namespace Kokkos {
template <typename DT, typename LY, typename ES, int OP, int CT, int DP, typename ... IS>
void
realloc(Kokkos::Experimental::ScatterView<DT, LY, ES, OP, CT, DP>& scatter_view, IS ... is)
{
scatter_view.realloc(is ...);
}
template <typename DT, typename LY, typename ES, int OP, int CT, int DP, typename ... IS>
void
resize(Kokkos::Experimental::ScatterView<DT, LY, ES, OP, CT, DP>& scatter_view, IS ... is)
{
scatter_view.resize(is ...);
}
} // namespace Kokkos
#endif

View File

@@ -517,7 +517,7 @@ public:
size_type find_attempts = 0;
enum { bounded_find_attempts = 32u };
enum : unsigned { bounded_find_attempts = 32u };
const size_type max_attempts = (m_bounded_insert && (bounded_find_attempts < m_available_indexes.max_hint()) ) ?
bounded_find_attempts :
m_available_indexes.max_hint();

View File

@@ -56,11 +56,12 @@
template< class Scalar, class Arg1Type = void>
class vector : public DualView<Scalar*,LayoutLeft,Arg1Type> {
public:
typedef Scalar value_type;
typedef Scalar* pointer;
typedef const Scalar* const_pointer;
typedef Scalar* reference;
typedef const Scalar* const_reference;
typedef Scalar& reference;
typedef const Scalar& const_reference;
typedef Scalar* iterator;
typedef const Scalar* const_iterator;
@@ -73,11 +74,11 @@ private:
public:
#ifdef KOKKOS_ENABLE_CUDA_UVM
KOKKOS_INLINE_FUNCTION Scalar& operator() (int i) const {return DV::h_view(i);};
KOKKOS_INLINE_FUNCTION Scalar& operator[] (int i) const {return DV::h_view(i);};
KOKKOS_INLINE_FUNCTION reference operator() (int i) const {return DV::h_view(i);};
KOKKOS_INLINE_FUNCTION reference operator[] (int i) const {return DV::h_view(i);};
#else
inline Scalar& operator() (int i) const {return DV::h_view(i);};
inline Scalar& operator[] (int i) const {return DV::h_view(i);};
inline reference operator() (int i) const {return DV::h_view(i);};
inline reference operator[] (int i) const {return DV::h_view(i);};
#endif
/* Member functions which behave like std::vector functions */
@@ -86,7 +87,7 @@ public:
_size = 0;
_extra_storage = 1.1;
DV::modified_host() = 1;
};
}
vector(int n, Scalar val=Scalar()):DualView<Scalar*,LayoutLeft,Arg1Type>("Vector",size_t(n*(1.1))) {
@@ -146,25 +147,32 @@ public:
DV::h_view(_size) = val;
_size++;
};
}
void pop_back() {
_size--;
};
}
void clear() {
_size = 0;
}
size_type size() const {return _size;};
size_type size() const {return _size;}
size_type max_size() const {return 2000000000;}
size_type capacity() const {return DV::capacity();};
bool empty() const {return _size==0;};
size_type capacity() const {return DV::capacity();}
bool empty() const {return _size==0;}
iterator begin() const {return &DV::h_view(0);};
iterator begin() const {return &DV::h_view(0);}
iterator end() const {return &DV::h_view(_size);};
iterator end() const {return &DV::h_view(_size);}
reference front() {return DV::h_view(0);}
reference back() {return DV::h_view(_size - 1);}
const_reference front() const {return DV::h_view(0);}
const_reference back() const {return DV::h_view(_size - 1);}
/* std::algorithms which originally work with iterators; here they are implemented as member functions */

View File

@@ -3,7 +3,13 @@ INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/../src )
SET(LIBRARIES kokkoscore)
IF(NOT KOKKOS_HAS_TRILINOS)
IF(KOKKOS_SEPARATE_LIBS)
set(TEST_LINK_TARGETS kokkoscore)
ELSE()
set(TEST_LINK_TARGETS kokkos)
ENDIF()
ENDIF()
IF(Kokkos_ENABLE_Pthread)
TRIBITS_ADD_EXECUTABLE_AND_TEST(
@@ -12,7 +18,7 @@ TRIBITS_ADD_EXECUTABLE_AND_TEST(
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
TESTONLYLIBS kokkos_gtest ${TEST_LINK_TARGETS}
)
ENDIF()
@@ -23,7 +29,7 @@ TRIBITS_ADD_EXECUTABLE_AND_TEST(
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
TESTONLYLIBS kokkos_gtest ${TEST_LINK_TARGETS}
)
ENDIF()
@@ -34,7 +40,7 @@ TRIBITS_ADD_EXECUTABLE_AND_TEST(
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
TESTONLYLIBS kokkos_gtest ${TEST_LINK_TARGETS}
)
ENDIF()
@@ -45,7 +51,7 @@ TRIBITS_ADD_EXECUTABLE_AND_TEST(
COMM serial mpi
NUM_MPI_PROCS 1
FAIL_REGULAR_EXPRESSION " FAILED "
TESTONLYLIBS kokkos_gtest
TESTONLYLIBS kokkos_gtest ${TEST_LINK_TARGETS}
)
ENDIF()

View File

@@ -15,7 +15,8 @@ endif
CXXFLAGS = -O3
LINK ?= $(CXX)
LDFLAGS ?= -lpthread
LDFLAGS ?=
override LDFLAGS += -lpthread
include $(KOKKOS_PATH)/Makefile.kokkos
@@ -30,6 +31,12 @@ ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
TEST_TARGETS += test-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
OBJ_ROCM = TestROCm.o UnitTestMain.o gtest-all.o
TARGETS += KokkosContainers_UnitTest_ROCm
TEST_TARGETS += test-rocm
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
OBJ_THREADS = TestThreads.o UnitTestMain.o gtest-all.o
TARGETS += KokkosContainers_UnitTest_Threads
@@ -51,6 +58,9 @@ endif
KokkosContainers_UnitTest_Cuda: $(OBJ_CUDA) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(EXTRA_PATH) $(OBJ_CUDA) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_Cuda
KokkosContainers_UnitTest_ROCm: $(OBJ_ROCM) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(EXTRA_PATH) $(OBJ_ROCM) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_ROCm
KokkosContainers_UnitTest_Threads: $(OBJ_THREADS) $(KOKKOS_LINK_DEPENDS)
$(LINK) $(EXTRA_PATH) $(OBJ_THREADS) $(KOKKOS_LIBS) $(LIB) $(KOKKOS_LDFLAGS) $(LDFLAGS) -o KokkosContainers_UnitTest_Threads
@@ -63,6 +73,9 @@ KokkosContainers_UnitTest_Serial: $(OBJ_SERIAL) $(KOKKOS_LINK_DEPENDS)
test-cuda: KokkosContainers_UnitTest_Cuda
./KokkosContainers_UnitTest_Cuda
test-rocm: KokkosContainers_UnitTest_ROCm
./KokkosContainers_UnitTest_ROCm
test-threads: KokkosContainers_UnitTest_Threads
./KokkosContainers_UnitTest_Threads

View File

@@ -62,6 +62,7 @@
#include <TestVector.hpp>
#include <TestDualView.hpp>
#include <TestDynamicView.hpp>
#include <TestScatterView.hpp>
#include <Kokkos_DynRankView.hpp>
#include <TestDynViewAPI.hpp>
@@ -201,10 +202,18 @@ void cuda_test_bitset()
cuda_test_dualview_combinations(size); \
}
#define CUDA_SCATTERVIEW_TEST( size ) \
TEST_F( cuda, scatterview_##size##x) { \
test_scatter_view<Kokkos::Cuda>(size); \
}
CUDA_DUALVIEW_COMBINE_TEST( 10 )
CUDA_VECTOR_COMBINE_TEST( 10 )
CUDA_VECTOR_COMBINE_TEST( 3057 )
CUDA_SCATTERVIEW_TEST( 10 )
CUDA_SCATTERVIEW_TEST( 1000000 )
CUDA_INSERT_TEST(close, 100000, 90000, 100, 500)
CUDA_INSERT_TEST(far, 100000, 90000, 100, 500)

View File

@@ -131,11 +131,14 @@ struct TestDynamicView
// printf("TestDynamicView::run(%d) construct memory pool\n",arg_total_size);
const size_t total_alloc_size = arg_total_size * sizeof(Scalar) * 1.2 ;
const size_t superblock = std::min( total_alloc_size , size_t(1000000) );
memory_pool_type pool( memory_space()
, arg_total_size * sizeof(Scalar) * 1.2
, total_alloc_size
, 500 /* min block size in bytes */
, 30000 /* max block size in bytes */
, 1000000 /* min superblock size in bytes */
, superblock
);
// printf("TestDynamicView::run(%d) construct dynamic view\n",arg_total_size);

View File

@@ -63,6 +63,8 @@
#include <Kokkos_DynRankView.hpp>
#include <TestDynViewAPI.hpp>
#include <TestScatterView.hpp>
#include <Kokkos_ErrorReporter.hpp>
#include <TestErrorReporter.hpp>
@@ -152,6 +154,11 @@ TEST_F( openmp , staticcrsgraph )
test_dualview_combinations<int,Kokkos::OpenMP>(size); \
}
#define OPENMP_SCATTERVIEW_TEST( size ) \
TEST_F( openmp, scatterview_##size##x) { \
test_scatter_view<Kokkos::OpenMP>(size); \
}
OPENMP_INSERT_TEST(close, 100000, 90000, 100, 500, true)
OPENMP_INSERT_TEST(far, 100000, 90000, 100, 500, false)
OPENMP_FAILED_INSERT_TEST( 10000, 1000 )
@@ -161,6 +168,10 @@ OPENMP_VECTOR_COMBINE_TEST( 10 )
OPENMP_VECTOR_COMBINE_TEST( 3057 )
OPENMP_DUALVIEW_COMBINE_TEST( 10 )
OPENMP_SCATTERVIEW_TEST( 10 )
OPENMP_SCATTERVIEW_TEST( 1000000 )
#undef OPENMP_INSERT_TEST
#undef OPENMP_FAILED_INSERT_TEST
#undef OPENMP_ASSIGNEMENT_TEST

View File

@@ -0,0 +1,263 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#include <Kokkos_Macros.hpp>
#ifdef KOKKOS_ENABLE_ROCM
#include <iostream>
#include <iomanip>
#include <cstdint>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
#include <Kokkos_Bitset.hpp>
#include <Kokkos_UnorderedMap.hpp>
#include <Kokkos_Vector.hpp>
#include <TestBitset.hpp>
#include <TestUnorderedMap.hpp>
#include <TestStaticCrsGraph.hpp>
#include <TestVector.hpp>
#include <TestDualView.hpp>
#include <TestDynamicView.hpp>
#include <Kokkos_DynRankView.hpp>
#include <TestDynViewAPI.hpp>
#include <Kokkos_ErrorReporter.hpp>
#include <TestErrorReporter.hpp>
#include <TestViewCtorPropEmbeddedDim.hpp>
//----------------------------------------------------------------------------
namespace Test {
class rocm : public ::testing::Test {
protected:
static void SetUpTestCase()
{
std::cout << std::setprecision(5) << std::scientific;
Kokkos::HostSpace::execution_space::initialize();
Kokkos::Experimental::ROCm::initialize( Kokkos::Experimental::ROCm::SelectDevice(0) );
}
static void TearDownTestCase()
{
Kokkos::Experimental::ROCm::finalize();
Kokkos::HostSpace::execution_space::finalize();
}
};
#if !defined(KOKKOS_ENABLE_ROCM)
//issue 964
TEST_F( rocm , dyn_view_api) {
TestDynViewAPI< double , Kokkos::Experimental::ROCm >();
}
#endif
TEST_F( rocm, viewctorprop_embedded_dim ) {
TestViewCtorProp_EmbeddedDim< Kokkos::Experimental::ROCm >::test_vcpt( 2, 3 );
}
TEST_F( rocm , staticcrsgraph )
{
TestStaticCrsGraph::run_test_graph< Kokkos::Experimental::ROCm >();
TestStaticCrsGraph::run_test_graph2< Kokkos::Experimental::ROCm >();
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(1, 0);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(1, 1000);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(1, 10000);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(1, 100000);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(3, 0);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(3, 1000);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(3, 10000);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(3, 100000);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(75, 0);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(75, 1000);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(75, 10000);
TestStaticCrsGraph::run_test_graph3< Kokkos::Experimental::ROCm >(75, 100000);
}
#if !defined(KOKKOS_ENABLE_ROCM)
// issue 1089
// same as 130203 (MemPool, static member function link issue)
void rocm_test_insert_close( uint32_t num_nodes
, uint32_t num_inserts
, uint32_t num_duplicates
)
{
test_insert< Kokkos::Experimental::ROCm >( num_nodes, num_inserts, num_duplicates, true);
}
// hcc link error , Referencing function in another module!
void rocm_test_insert_far( uint32_t num_nodes
, uint32_t num_inserts
, uint32_t num_duplicates
)
{
test_insert< Kokkos::Experimental::ROCm >( num_nodes, num_inserts, num_duplicates, false);
}
void rocm_test_failed_insert( uint32_t num_nodes )
{
test_failed_insert< Kokkos::Experimental::ROCm >( num_nodes );
}
void rocm_test_deep_copy( uint32_t num_nodes )
{
test_deep_copy< Kokkos::Experimental::ROCm >( num_nodes );
}
void rocm_test_vector_combinations(unsigned int size)
{
test_vector_combinations<int,Kokkos::Experimental::ROCm>(size);
}
void rocm_test_dualview_combinations(unsigned int size)
{
test_dualview_combinations<int,Kokkos::Experimental::ROCm>(size);
}
void rocm_test_bitset()
{
test_bitset<Kokkos::Experimental::ROCm>();
}
/*TEST_F( rocm, bitset )
{
rocm_test_bitset();
}*/
#define ROCM_INSERT_TEST( name, num_nodes, num_inserts, num_duplicates, repeat ) \
TEST_F( rocm, UnorderedMap_insert_##name##_##num_nodes##_##num_inserts##_##num_duplicates##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
rocm_test_insert_##name(num_nodes,num_inserts,num_duplicates); \
}
#define ROCM_FAILED_INSERT_TEST( num_nodes, repeat ) \
TEST_F( rocm, UnorderedMap_failed_insert_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
rocm_test_failed_insert(num_nodes); \
}
#define ROCM_ASSIGNEMENT_TEST( num_nodes, repeat ) \
TEST_F( rocm, UnorderedMap_assignment_operators_##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
rocm_test_assignment_operators(num_nodes); \
}
#define ROCM_DEEP_COPY( num_nodes, repeat ) \
TEST_F( rocm, UnorderedMap_deep_copy##num_nodes##_##repeat##x) { \
for (int i=0; i<repeat; ++i) \
rocm_test_deep_copy(num_nodes); \
}
#define ROCM_VECTOR_COMBINE_TEST( size ) \
TEST_F( rocm, vector_combination##size##x) { \
rocm_test_vector_combinations(size); \
}
#define ROCM_DUALVIEW_COMBINE_TEST( size ) \
TEST_F( rocm, dualview_combination##size##x) { \
rocm_test_dualview_combinations(size); \
}
//ROCM_DUALVIEW_COMBINE_TEST( 10 )
//ROCM_VECTOR_COMBINE_TEST( 10 )
//ROCM_VECTOR_COMBINE_TEST( 3057 )
//ROCM_INSERT_TEST(close, 100000, 90000, 100, 500)
//ROCM_INSERT_TEST(far, 100000, 90000, 100, 500)
//ROCM_DEEP_COPY( 10000, 1 )
//ROCM_FAILED_INSERT_TEST( 10000, 1000 )
#undef ROCM_INSERT_TEST
#undef ROCM_FAILED_INSERT_TEST
#undef ROCM_ASSIGNEMENT_TEST
#undef ROCM_DEEP_COPY
#undef ROCM_VECTOR_COMBINE_TEST
#undef ROCM_DUALVIEW_COMBINE_TEST
#endif
#if !defined(KOKKOS_ENABLE_ROCM)
//static member function issue
TEST_F( rocm , dynamic_view )
{
// typedef TestDynamicView< double , Kokkos::ROCmUVMSpace >
typedef TestDynamicView< double , Kokkos::Experimental::ROCmSpace >
TestDynView ;
for ( int i = 0 ; i < 10 ; ++i ) {
TestDynView::run( 100000 + 100 * i );
}
}
#endif
#if defined(KOKKOS_CLASS_LAMBDA)
TEST_F(rocm, ErrorReporterViaLambda)
{
TestErrorReporter<ErrorReporterDriverUseLambda<Kokkos::Experimental::ROCm>>();
}
#endif
TEST_F(rocm, ErrorReporter)
{
TestErrorReporter<ErrorReporterDriver<Kokkos::Experimental::ROCm>>();
}
}
#else
void KOKKOS_CONTAINERS_UNIT_TESTS_TESTROCM_PREVENT_EMPTY_LINK_ERROR() {}
#endif /* #ifdef KOKKOS_ENABLE_ROCM */

View File

@@ -0,0 +1,156 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOS_TEST_SCATTER_VIEW_HPP
#define KOKKOS_TEST_SCATTER_VIEW_HPP
#include <Kokkos_ScatterView.hpp>
namespace Test {
template <typename ExecSpace, typename Layout, int duplication, int contribution>
void test_scatter_view_config(int n)
{
Kokkos::View<double *[3], Layout, ExecSpace> original_view("original_view", n);
{
auto scatter_view = Kokkos::Experimental::create_scatter_view
< Kokkos::Experimental::ScatterSum
, duplication
, contribution
> (original_view);
#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
auto policy = Kokkos::RangePolicy<ExecSpace, int>(0, n);
auto f = KOKKOS_LAMBDA(int i) {
auto scatter_access = scatter_view.access();
auto scatter_access_atomic = scatter_view.template access<Kokkos::Experimental::ScatterAtomic>();
for (int j = 0; j < 10; ++j) {
auto k = (i + j) % n;
scatter_access(k, 0) += 4.2;
scatter_access_atomic(k, 1) += 2.0;
scatter_access(k, 2) += 1.0;
}
};
Kokkos::parallel_for(policy, f, "scatter_view_test");
#endif
Kokkos::Experimental::contribute(original_view, scatter_view);
scatter_view.reset_except(original_view);
#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
Kokkos::parallel_for(policy, f, "scatter_view_test");
#endif
Kokkos::Experimental::contribute(original_view, scatter_view);
}
#if defined( KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA )
auto host_view = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), original_view);
for (typename decltype(host_view)::size_type i = 0; i < host_view.dimension_0(); ++i) {
auto val0 = host_view(i, 0);
auto val1 = host_view(i, 1);
auto val2 = host_view(i, 2);
EXPECT_TRUE(std::fabs((val0 - 84.0) / 84.0) < 1e-15);
EXPECT_TRUE(std::fabs((val1 - 40.0) / 40.0) < 1e-15);
EXPECT_TRUE(std::fabs((val2 - 20.0) / 20.0) < 1e-15);
}
#endif
{
Kokkos::Experimental::ScatterView
< double*[3]
, Layout
, ExecSpace
, Kokkos::Experimental::ScatterSum
, duplication
, contribution
>
persistent_view("persistent", n);
auto result_view = persistent_view.subview();
contribute(result_view, persistent_view);
}
}
template <typename ExecSpace>
struct TestDuplicatedScatterView {
TestDuplicatedScatterView(int n) {
test_scatter_view_config<ExecSpace, Kokkos::LayoutRight,
Kokkos::Experimental::ScatterDuplicated,
Kokkos::Experimental::ScatterNonAtomic>(n);
test_scatter_view_config<ExecSpace, Kokkos::LayoutRight,
Kokkos::Experimental::ScatterDuplicated,
Kokkos::Experimental::ScatterAtomic>(n);
}
};
#ifdef KOKKOS_ENABLE_CUDA
// disable duplicated instantiation with CUDA until
// UniqueToken can support it
template <>
struct TestDuplicatedScatterView<Kokkos::Cuda> {
TestDuplicatedScatterView(int) {
}
};
#endif
template <typename ExecSpace>
void test_scatter_view(int n)
{
// all of these configurations should compile okay, but only some of them are
// correct and/or sensible in terms of memory use
Kokkos::Experimental::UniqueToken<ExecSpace> unique_token{ExecSpace()};
// using neither atomics nor duplication is only sensible if the execution
// space runs essentially serially (it doesn't have to be Kokkos::Serial,
// though; we also test OpenMP with one thread, since LAMMPS relies on that)
if (unique_token.size() == 1) {
test_scatter_view_config<ExecSpace, Kokkos::LayoutRight,
Kokkos::Experimental::ScatterNonDuplicated,
Kokkos::Experimental::ScatterNonAtomic>(n);
}
test_scatter_view_config<ExecSpace, Kokkos::LayoutRight,
Kokkos::Experimental::ScatterNonDuplicated,
Kokkos::Experimental::ScatterAtomic>(n);
TestDuplicatedScatterView<ExecSpace> duptest(n);
}
} // namespace Test
#endif // KOKKOS_TEST_SCATTER_VIEW_HPP

View File

@@ -58,6 +58,7 @@
#include <TestVector.hpp>
#include <TestDualView.hpp>
#include <TestDynamicView.hpp>
#include <TestScatterView.hpp>
#include <iomanip>
@@ -148,6 +149,11 @@ TEST_F( serial, bitset )
test_dualview_combinations<int,Kokkos::Serial>(size); \
}
#define SERIAL_SCATTERVIEW_TEST( size ) \
TEST_F( serial, scatterview_##size##x) { \
test_scatter_view<Kokkos::Serial>(size); \
}
SERIAL_INSERT_TEST(close, 100000, 90000, 100, 500, true)
SERIAL_INSERT_TEST(far, 100000, 90000, 100, 500, false)
SERIAL_FAILED_INSERT_TEST( 10000, 1000 )
@@ -157,6 +163,10 @@ SERIAL_VECTOR_COMBINE_TEST( 10 )
SERIAL_VECTOR_COMBINE_TEST( 3057 )
SERIAL_DUALVIEW_COMBINE_TEST( 10 )
SERIAL_SCATTERVIEW_TEST( 10 )
SERIAL_SCATTERVIEW_TEST( 1000000 )
#undef SERIAL_INSERT_TEST
#undef SERIAL_FAILED_INSERT_TEST
#undef SERIAL_ASSIGNEMENT_TEST

View File

@@ -71,7 +71,7 @@ void run_test_graph()
}
dx = Kokkos::create_staticcrsgraph<dView>( "dx" , graph );
hx = Kokkos::create_mirror( dx );
hx = Kokkos::create_mirror( dx );
ASSERT_EQ( hx.row_map.dimension_0() - 1 , LENGTH );
@@ -83,6 +83,16 @@ void run_test_graph()
ASSERT_EQ( (int) hx.entries( j + begin ) , graph[i][j] );
}
}
// Test row view access
for ( size_t i = 0 ; i < LENGTH ; ++i ) {
auto rowView = hx.rowConst(i);
ASSERT_EQ( rowView.length, graph[i].size() );
for ( size_t j = 0 ; j < rowView.length ; ++j ) {
ASSERT_EQ( rowView.colidx( j ) , graph[i][j] );
ASSERT_EQ( rowView( j ) , graph[i][j] );
}
}
}
template< class Space >
@@ -182,5 +192,6 @@ void run_test_graph3(size_t B, size_t N)
ASSERT_FALSE((ne>2*((hx.row_map(hx.numRows())+C*hx.numRows())/B))&&(hx.row_block_offsets(i+1)>hx.row_block_offsets(i)+1));
}
}
} /* namespace TestStaticCrsGraph */

View File

@@ -2,7 +2,9 @@
TRIBITS_SUBPACKAGE(Core)
ADD_SUBDIRECTORY(src)
IF(KOKKOS_HAS_TRILINOS)
ADD_SUBDIRECTORY(src)
ENDIF()
TRIBITS_ADD_TEST_DIRECTORIES(unit_test)
TRIBITS_ADD_TEST_DIRECTORIES(perf_test)

View File

@@ -2,6 +2,14 @@
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(REQUIRED_DURING_INSTALLATION_TESTING ${CMAKE_CURRENT_SOURCE_DIR})
IF(NOT KOKKOS_HAS_TRILINOS)
IF(KOKKOS_SEPARATE_LIBS)
set(TEST_LINK_TARGETS kokkoscore)
ELSE()
set(TEST_LINK_TARGETS kokkos)
ENDIF()
ENDIF()
# warning: PerfTest_CustomReduction.cpp uses
# ../../algorithms/src/Kokkos_Random.hpp
# we'll just allow it to be included, but note
@@ -23,7 +31,7 @@ TRIBITS_ADD_EXECUTABLE(
PerfTestExec
SOURCES ${SOURCES}
COMM serial mpi
TESTONLYLIBS kokkos_gtest
TESTONLYLIBS kokkos_gtest ${TEST_LINK_TARGETS}
)
TRIBITS_ADD_TEST(

View File

@@ -17,7 +17,8 @@ endif
CXXFLAGS = -O3
#CXXFLAGS += -DGENERIC_REDUCER
LINK ?= $(CXX)
LDFLAGS ?= -lpthread
LDFLAGS ?=
override LDFLAGS += -lpthread
include $(KOKKOS_PATH)/Makefile.kokkos
@@ -43,6 +44,7 @@ TEST_TARGETS += test-atomic
#
ifneq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
OBJ_MEMPOOL = test_mempool.o
TARGETS += KokkosCore_PerformanceTest_Mempool
TEST_TARGETS += test-mempool
@@ -52,6 +54,7 @@ TEST_TARGETS += test-mempool
OBJ_TASKDAG = test_taskdag.o
TARGETS += KokkosCore_PerformanceTest_TaskDAG
TEST_TARGETS += test-taskdag
endif
#

View File

@@ -1,15 +1,4 @@
TRIBITS_ADD_OPTION_AND_DEFINE(
Kokkos_ENABLE_Serial
KOKKOS_HAVE_SERIAL
"Whether to enable the Kokkos::Serial device. This device executes \"parallel\" kernels sequentially on a single CPU thread. It is enabled by default. If you disable this device, please enable at least one other CPU device, such as Kokkos::OpenMP or Kokkos::Threads."
ON
)
ASSERT_DEFINED(${PROJECT_NAME}_ENABLE_CXX11)
ASSERT_DEFINED(${PACKAGE_NAME}_ENABLE_CUDA)
TRIBITS_CONFIGURE_FILE(${PACKAGE_NAME}_config.h)
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
@@ -20,68 +9,90 @@ SET(TRILINOS_INCDIR ${CMAKE_INSTALL_PREFIX}/${${PROJECT_NAME}_INSTALL_INCLUDE_DI
#-----------------------------------------------------------------------------
SET(HEADERS_PUBLIC "")
SET(HEADERS_PRIVATE "")
SET(SOURCES "")
IF(KOKKOS_LEGACY_TRIBITS)
FILE(GLOB HEADERS_PUBLIC Kokkos*.hpp)
LIST( APPEND HEADERS_PUBLIC ${CMAKE_CURRENT_BINARY_DIR}/${PACKAGE_NAME}_config.h )
ASSERT_DEFINED(${PROJECT_NAME}_ENABLE_CXX11)
ASSERT_DEFINED(${PACKAGE_NAME}_ENABLE_CUDA)
SET(HEADERS_PUBLIC "")
SET(HEADERS_PRIVATE "")
SET(SOURCES "")
FILE(GLOB HEADERS_PUBLIC Kokkos*.hpp)
LIST( APPEND HEADERS_PUBLIC ${CMAKE_BINARY_DIR}/${PACKAGE_NAME}_config.h )
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_IMPL impl/*.hpp)
FILE(GLOB SOURCES_IMPL impl/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_IMPL} )
LIST(APPEND SOURCES ${SOURCES_IMPL} )
INSTALL(FILES ${HEADERS_IMPL} DESTINATION ${TRILINOS_INCDIR}/impl/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_THREADS Threads/*.hpp)
FILE(GLOB SOURCES_THREADS Threads/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_THREADS} )
LIST(APPEND SOURCES ${SOURCES_THREADS} )
INSTALL(FILES ${HEADERS_THREADS} DESTINATION ${TRILINOS_INCDIR}/Threads/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_OPENMP OpenMP/*.hpp)
FILE(GLOB SOURCES_OPENMP OpenMP/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_OPENMP} )
LIST(APPEND SOURCES ${SOURCES_OPENMP} )
INSTALL(FILES ${HEADERS_OPENMP} DESTINATION ${TRILINOS_INCDIR}/OpenMP/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_CUDA Cuda/*.hpp)
FILE(GLOB SOURCES_CUDA Cuda/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_CUDA} )
LIST(APPEND SOURCES ${SOURCES_CUDA} )
INSTALL(FILES ${HEADERS_CUDA} DESTINATION ${TRILINOS_INCDIR}/Cuda/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_QTHREADS Qthreads/*.hpp)
FILE(GLOB SOURCES_QTHREADS Qthreads/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_QTHREADS} )
LIST(APPEND SOURCES ${SOURCES_QTHREADS} )
INSTALL(FILES ${HEADERS_QTHREADS} DESTINATION ${TRILINOS_INCDIR}/Qthreads/)
TRIBITS_ADD_LIBRARY(
kokkoscore
HEADERS ${HEADERS_PUBLIC}
NOINSTALLHEADERS ${HEADERS_PRIVATE}
SOURCES ${SOURCES}
DEPLIBS
)
#-----------------------------------------------------------------------------
# In the new build system, sources are calculated by Makefile.kokkos
else()
FILE(GLOB HEADERS_IMPL impl/*.hpp)
FILE(GLOB SOURCES_IMPL impl/*.cpp)
INSTALL (DIRECTORY
"${CMAKE_CURRENT_SOURCE_DIR}/"
DESTINATION ${TRILINOS_INCDIR}
FILES_MATCHING PATTERN "*.hpp"
)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_IMPL} )
LIST(APPEND SOURCES ${SOURCES_IMPL} )
INSTALL(FILES ${HEADERS_IMPL} DESTINATION ${TRILINOS_INCDIR}/impl/)
TRIBITS_ADD_LIBRARY(
kokkoscore
SOURCES ${KOKKOS_CORE_SRCS}
DEPLIBS
)
endif()
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_THREADS Threads/*.hpp)
FILE(GLOB SOURCES_THREADS Threads/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_THREADS} )
LIST(APPEND SOURCES ${SOURCES_THREADS} )
INSTALL(FILES ${HEADERS_THREADS} DESTINATION ${TRILINOS_INCDIR}/Threads/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_OPENMP OpenMP/*.hpp)
FILE(GLOB SOURCES_OPENMP OpenMP/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_OPENMP} )
LIST(APPEND SOURCES ${SOURCES_OPENMP} )
INSTALL(FILES ${HEADERS_OPENMP} DESTINATION ${TRILINOS_INCDIR}/OpenMP/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_CUDA Cuda/*.hpp)
FILE(GLOB SOURCES_CUDA Cuda/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_CUDA} )
LIST(APPEND SOURCES ${SOURCES_CUDA} )
INSTALL(FILES ${HEADERS_CUDA} DESTINATION ${TRILINOS_INCDIR}/Cuda/)
#-----------------------------------------------------------------------------
FILE(GLOB HEADERS_QTHREADS Qthreads/*.hpp)
FILE(GLOB SOURCES_QTHREADS Qthreads/*.cpp)
LIST(APPEND HEADERS_PRIVATE ${HEADERS_QTHREADS} )
LIST(APPEND SOURCES ${SOURCES_QTHREADS} )
INSTALL(FILES ${HEADERS_QTHREADS} DESTINATION ${TRILINOS_INCDIR}/Qthreads/)
#-----------------------------------------------------------------------------
TRIBITS_ADD_LIBRARY(
kokkoscore
HEADERS ${HEADERS_PUBLIC}
NOINSTALLHEADERS ${HEADERS_PRIVATE}
SOURCES ${SOURCES}
DEPLIBS
)
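The per-backend pattern above (glob headers and sources, append to the private header and source lists, install the headers) is repeated verbatim for Threads, OpenMP, Cuda, and Qthreads. A hypothetical helper function could factor it out; `TRILINOS_INCDIR`, `HEADERS_PRIVATE`, and `SOURCES` are assumed to exist as in the listing above, and the function name is illustrative, not part of the actual build:

```cmake
# Hypothetical helper condensing the repeated per-backend block above.
function(kokkos_add_backend DIR)
  file(GLOB _hdrs ${DIR}/*.hpp)
  file(GLOB _srcs ${DIR}/*.cpp)
  list(APPEND HEADERS_PRIVATE ${_hdrs})
  list(APPEND SOURCES ${_srcs})
  # list(APPEND ...) works on a local copy; export the result to the caller.
  set(HEADERS_PRIVATE ${HEADERS_PRIVATE} PARENT_SCOPE)
  set(SOURCES ${SOURCES} PARENT_SCOPE)
  install(FILES ${_hdrs} DESTINATION ${TRILINOS_INCDIR}/${DIR}/)
endfunction()

kokkos_add_backend(Threads)
kokkos_add_backend(OpenMP)
```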


@@ -63,7 +63,7 @@
#include <typeinfo>
#endif
namespace Kokkos { namespace Experimental { namespace Impl {
namespace Kokkos { namespace Impl {
// ------------------------------------------------------------------ //
@@ -110,21 +110,12 @@ struct apply_impl<2,RP,Functor,void >
{
// LL
if (RP::inner_direction == RP::Left) {
/*
index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
m_func(i, j);
} }
*/
for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + (index_type)threadIdx.x + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
m_func(offset_0 , offset_1);
}
@@ -134,21 +125,12 @@ struct apply_impl<2,RP,Functor,void >
}
// LR
else {
/*
index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
m_func(i, j);
} }
*/
for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + (index_type)threadIdx.x + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
m_func(offset_0 , offset_1);
}
@@ -182,21 +164,12 @@ struct apply_impl<2,RP,Functor,Tag>
{
if (RP::inner_direction == RP::Left) {
// Loop over size maxnumblocks until full range covered
/*
index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
m_func(Tag(), i, j);
} }
*/
for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + (index_type)threadIdx.x + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
m_func(Tag(), offset_0 , offset_1);
}
@@ -205,21 +178,12 @@ struct apply_impl<2,RP,Functor,Tag>
}
}
else {
/*
index_type offset_1 = blockIdx.y*m_rp.m_tile[1] + threadIdx.y;
index_type offset_0 = blockIdx.x*m_rp.m_tile[0] + threadIdx.x;
for ( index_type i = offset_0; i < m_rp.m_upper[0], threadIdx.x < m_rp.m_tile[0]; i += (gridDim.x*m_rp.m_tile[0]) ) {
for ( index_type j = offset_1; j < m_rp.m_upper[1], threadIdx.y < m_rp.m_tile[1]; j += (gridDim.y*m_rp.m_tile[1]) ) {
m_func(Tag(), i, j);
} }
*/
for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + (index_type)threadIdx.x + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
m_func(Tag(), offset_0 , offset_1);
}
@@ -255,15 +219,15 @@ struct apply_impl<3,RP,Functor,void >
// LL
if (RP::inner_direction == RP::Left) {
for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + (index_type)threadIdx.x + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
m_func(offset_0 , offset_1 , offset_2);
}
@@ -276,15 +240,15 @@ struct apply_impl<3,RP,Functor,void >
// LR
else {
for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + (index_type)threadIdx.x + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
m_func(offset_0 , offset_1 , offset_2);
}
@@ -319,15 +283,15 @@ struct apply_impl<3,RP,Functor,Tag>
{
if (RP::inner_direction == RP::Left) {
for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + (index_type)threadIdx.x + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
m_func(Tag(), offset_0 , offset_1 , offset_2);
}
@@ -339,15 +303,15 @@ struct apply_impl<3,RP,Functor,Tag>
}
else {
for ( index_type tile_id0 = blockIdx.x; tile_id0 < m_rp.m_tile_end[0]; tile_id0 += gridDim.x ) {
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + threadIdx.x;
const index_type offset_0 = tile_id0*m_rp.m_tile[0] + (index_type)threadIdx.x + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && threadIdx.x < m_rp.m_tile[0] ) {
for ( index_type tile_id1 = blockIdx.y; tile_id1 < m_rp.m_tile_end[1]; tile_id1 += gridDim.y ) {
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + threadIdx.y;
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && threadIdx.y < m_rp.m_tile[1] ) {
for ( index_type tile_id2 = blockIdx.z; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.z ) {
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.z;
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && threadIdx.z < m_rp.m_tile[2] ) {
m_func(Tag(), offset_0 , offset_1 , offset_2);
}
@@ -398,19 +362,19 @@ struct apply_impl<4,RP,Functor,void >
const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
const index_type offset_3 = tile_id3*m_rp.m_tile[3] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
m_func(offset_0 , offset_1 , offset_2 , offset_3);
}
@@ -436,19 +400,19 @@ struct apply_impl<4,RP,Functor,void >
const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
const index_type offset_3 = tile_id3*m_rp.m_tile[3] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
m_func(offset_0 , offset_1 , offset_2 , offset_3);
}
@@ -498,19 +462,19 @@ struct apply_impl<4,RP,Functor,Tag>
const index_type thr_id1 = threadIdx.x / m_rp.m_tile[0];
for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
const index_type offset_3 = tile_id3*m_rp.m_tile[3] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
m_func(Tag(), offset_0 , offset_1 , offset_2 , offset_3);
}
@@ -535,19 +499,19 @@ struct apply_impl<4,RP,Functor,Tag>
const index_type thr_id1 = threadIdx.x % m_rp.m_tile[1];
for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = tile_id1*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type tile_id2 = blockIdx.y; tile_id2 < m_rp.m_tile_end[2]; tile_id2 += gridDim.y ) {
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + threadIdx.y;
const index_type offset_2 = tile_id2*m_rp.m_tile[2] + (index_type)threadIdx.y + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && threadIdx.y < m_rp.m_tile[2] ) {
for ( index_type tile_id3 = blockIdx.z; tile_id3 < m_rp.m_tile_end[3]; tile_id3 += gridDim.z ) {
const index_type offset_3 = tile_id3*m_rp.m_tile[3] + threadIdx.z;
const index_type offset_3 = tile_id3*m_rp.m_tile[3] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && threadIdx.z < m_rp.m_tile[3] ) {
m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3);
}
@@ -612,23 +576,23 @@ struct apply_impl<5,RP,Functor,void >
const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
const index_type offset_4 = tile_id4*m_rp.m_tile[4] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[4];
if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3 + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2 + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
m_func(offset_0 , offset_1 , offset_2 , offset_3, offset_4);
}
@@ -667,23 +631,23 @@ struct apply_impl<5,RP,Functor,void >
const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2 + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3 + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
const index_type offset_4 = tile_id4*m_rp.m_tile[4] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[4];
if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
m_func(offset_0 , offset_1 , offset_2 , offset_3 , offset_4);
}
@@ -747,23 +711,23 @@ struct apply_impl<5,RP,Functor,Tag>
const index_type thr_id3 = threadIdx.y / m_rp.m_tile[2];
for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
const index_type offset_4 = tile_id4*m_rp.m_tile[4] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[4];
if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3 + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2 + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3, offset_4);
}
@@ -802,23 +766,23 @@ struct apply_impl<5,RP,Functor,Tag>
const index_type thr_id3 = threadIdx.y % m_rp.m_tile[3];
for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2 + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3 + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
for ( index_type tile_id4 = blockIdx.z; tile_id4 < m_rp.m_tile_end[4]; tile_id4 += gridDim.z ) {
const index_type offset_4 = tile_id4*m_rp.m_tile[4] + threadIdx.z;
const index_type offset_4 = tile_id4*m_rp.m_tile[4] + (index_type)threadIdx.z + (index_type)m_rp.m_lower[4];
if ( offset_4 < m_rp.m_upper[4] && threadIdx.z < m_rp.m_tile[4] ) {
m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3 , offset_4);
}
@@ -895,27 +859,27 @@ struct apply_impl<6,RP,Functor,void >
const index_type thr_id5 = threadIdx.z / m_rp.m_tile[4];
for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5 + (index_type)m_rp.m_lower[5];
if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4 + (index_type)m_rp.m_lower[4];
if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3 + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2 + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
m_func(offset_0 , offset_1 , offset_2 , offset_3, offset_4, offset_5);
}
@@ -967,27 +931,27 @@ struct apply_impl<6,RP,Functor,void >
const index_type thr_id5 = threadIdx.z % m_rp.m_tile[5];
for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2 + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3 + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4 + (index_type)m_rp.m_lower[4];
if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5 + (index_type)m_rp.m_lower[5];
if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
m_func(offset_0 , offset_1 , offset_2 , offset_3 , offset_4 , offset_5);
}
@@ -1064,27 +1028,27 @@ struct apply_impl<6,RP,Functor,Tag>
const index_type thr_id5 = threadIdx.z / m_rp.m_tile[4];
for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5 + (index_type)m_rp.m_lower[5];
if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4 + (index_type)m_rp.m_lower[4];
if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3 + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2 + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
for ( index_type j = tile_id1 ; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type i = tile_id0 ; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3, offset_4, offset_5);
}
@@ -1136,27 +1100,27 @@ struct apply_impl<6,RP,Functor,Tag>
const index_type thr_id5 = threadIdx.z % m_rp.m_tile[5];
for ( index_type i = tile_id0; i < m_rp.m_tile_end[0]; i += numbl0 ) {
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0;
const index_type offset_0 = i*m_rp.m_tile[0] + thr_id0 + (index_type)m_rp.m_lower[0];
if ( offset_0 < m_rp.m_upper[0] && thr_id0 < m_rp.m_tile[0] ) {
for ( index_type j = tile_id1; j < m_rp.m_tile_end[1]; j += numbl1 ) {
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1;
const index_type offset_1 = j*m_rp.m_tile[1] + thr_id1 + (index_type)m_rp.m_lower[1];
if ( offset_1 < m_rp.m_upper[1] && thr_id1 < m_rp.m_tile[1] ) {
for ( index_type k = tile_id2; k < m_rp.m_tile_end[2]; k += numbl2 ) {
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2;
const index_type offset_2 = k*m_rp.m_tile[2] + thr_id2 + (index_type)m_rp.m_lower[2];
if ( offset_2 < m_rp.m_upper[2] && thr_id2 < m_rp.m_tile[2] ) {
for ( index_type l = tile_id3; l < m_rp.m_tile_end[3]; l += numbl3 ) {
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3;
const index_type offset_3 = l*m_rp.m_tile[3] + thr_id3 + (index_type)m_rp.m_lower[3];
if ( offset_3 < m_rp.m_upper[3] && thr_id3 < m_rp.m_tile[3] ) {
for ( index_type m = tile_id4; m < m_rp.m_tile_end[4]; m += numbl4 ) {
const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4;
const index_type offset_4 = m*m_rp.m_tile[4] + thr_id4 + (index_type)m_rp.m_lower[4];
if ( offset_4 < m_rp.m_upper[4] && thr_id4 < m_rp.m_tile[4] ) {
for ( index_type n = tile_id5; n < m_rp.m_tile_end[5]; n += numbl5 ) {
const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5;
const index_type offset_5 = n*m_rp.m_tile[5] + thr_id5 + (index_type)m_rp.m_lower[5];
if ( offset_5 < m_rp.m_upper[5] && thr_id5 < m_rp.m_tile[5] ) {
m_func(Tag() , offset_0 , offset_1 , offset_2 , offset_3 , offset_4 , offset_5);
}
@@ -1292,7 +1256,7 @@ protected:
const Functor m_func;
};
} } } //end namespace Kokkos::Experimental::Impl
} } //end namespace Kokkos::Impl
#endif
#endif


@@ -63,7 +63,7 @@
#include <typeinfo>
#endif
namespace Kokkos { namespace Experimental { namespace Impl {
namespace Kokkos { namespace Impl {
namespace Refactor {
@@ -2709,7 +2709,7 @@ private:
// ----------------------------------------------------------------------------------
} } } //end namespace Kokkos::Experimental::Impl
} } //end namespace Kokkos::Impl
#endif
#endif


@@ -164,7 +164,7 @@ static void cuda_parallel_launch_constant_memory()
template< class DriverType, unsigned int maxTperB, unsigned int minBperSM >
__global__
//__launch_bounds__(maxTperB, minBperSM)
__launch_bounds__(maxTperB, minBperSM)
static void cuda_parallel_launch_constant_memory()
{
const DriverType & driver =
@@ -182,7 +182,7 @@ static void cuda_parallel_launch_local_memory( const DriverType driver )
template< class DriverType, unsigned int maxTperB, unsigned int minBperSM >
__global__
//__launch_bounds__(maxTperB, minBperSM)
__launch_bounds__(maxTperB, minBperSM)
static void cuda_parallel_launch_local_memory( const DriverType driver )
{
driver();
@@ -193,9 +193,14 @@ template < class DriverType
, bool Large = ( CudaTraits::ConstantMemoryUseThreshold < sizeof(DriverType) ) >
struct CudaParallelLaunch ;
template < class DriverType, class LaunchBounds >
struct CudaParallelLaunch< DriverType, LaunchBounds, true > {
template < class DriverType
, unsigned int MaxThreadsPerBlock
, unsigned int MinBlocksPerSM >
struct CudaParallelLaunch< DriverType
, Kokkos::LaunchBounds< MaxThreadsPerBlock
, MinBlocksPerSM >
, true >
{
inline
CudaParallelLaunch( const DriverType & driver
, const dim3 & grid
@@ -216,21 +221,28 @@ struct CudaParallelLaunch< DriverType, LaunchBounds, true > {
if ( CudaTraits::SharedMemoryCapacity < shmem ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: shared memory request is too large") );
}
#ifndef KOKKOS_ARCH_KEPLER //On Kepler the L1 has no benefit since it doesn't cache reads
else if ( shmem ) {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_constant_memory< DriverType, LaunchBounds::maxTperB, LaunchBounds::minBperSM > , cudaFuncCachePreferShared ) );
} else {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_constant_memory< DriverType, LaunchBounds::maxTperB, LaunchBounds::minBperSM > , cudaFuncCachePreferL1 ) );
#ifndef KOKKOS_ARCH_KEPLER
// On Kepler the L1 has no benefit since it doesn't cache reads
else {
CUDA_SAFE_CALL(
cudaFuncSetCacheConfig
( cuda_parallel_launch_constant_memory
< DriverType, MaxThreadsPerBlock, MinBlocksPerSM >
, ( shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1 )
) );
}
#endif
// Copy functor to constant memory on the device
cudaMemcpyToSymbol( kokkos_impl_cuda_constant_memory_buffer , & driver , sizeof(DriverType) );
cudaMemcpyToSymbol(
kokkos_impl_cuda_constant_memory_buffer, &driver, sizeof(DriverType) );
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
// Invoke the driver function on the device
cuda_parallel_launch_constant_memory< DriverType, LaunchBounds::maxTperB, LaunchBounds::minBperSM ><<< grid , block , shmem , stream >>>();
cuda_parallel_launch_constant_memory
< DriverType, MaxThreadsPerBlock, MinBlocksPerSM >
<<< grid , block , shmem , stream >>>();
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
CUDA_SAFE_CALL( cudaGetLastError() );
@@ -240,9 +252,11 @@ struct CudaParallelLaunch< DriverType, LaunchBounds, true > {
}
};
template < class DriverType, class LaunchBounds >
struct CudaParallelLaunch< DriverType, LaunchBounds, false > {
template < class DriverType >
struct CudaParallelLaunch< DriverType
, Kokkos::LaunchBounds<>
, true >
{
inline
CudaParallelLaunch( const DriverType & driver
, const dim3 & grid
@@ -252,20 +266,136 @@ struct CudaParallelLaunch< DriverType, LaunchBounds, false > {
{
if ( grid.x && ( block.x * block.y * block.z ) ) {
if ( sizeof( Kokkos::Impl::CudaTraits::ConstantGlobalBufferType ) <
sizeof( DriverType ) ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: Functor is too large") );
}
// Fence before changing settings and copying closure
Kokkos::Cuda::fence();
if ( CudaTraits::SharedMemoryCapacity < shmem ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: shared memory request is too large") );
}
#ifndef KOKKOS_ARCH_KEPLER //On Kepler the L1 has no benefit since it doesn't cache reads
else if ( shmem ) {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_local_memory< DriverType, LaunchBounds::maxTperB, LaunchBounds::minBperSM > , cudaFuncCachePreferShared ) );
} else {
CUDA_SAFE_CALL( cudaFuncSetCacheConfig( cuda_parallel_launch_local_memory< DriverType, LaunchBounds::maxTperB, LaunchBounds::minBperSM > , cudaFuncCachePreferL1 ) );
#ifndef KOKKOS_ARCH_KEPLER
// On Kepler the L1 has no benefit since it doesn't cache reads
else {
CUDA_SAFE_CALL(
cudaFuncSetCacheConfig
( cuda_parallel_launch_constant_memory< DriverType >
, ( shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1 )
) );
}
#endif
// Copy functor to constant memory on the device
cudaMemcpyToSymbol(
kokkos_impl_cuda_constant_memory_buffer, &driver, sizeof(DriverType) );
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
// Invoke the driver function on the device
cuda_parallel_launch_constant_memory< DriverType >
<<< grid , block , shmem , stream >>>();
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
CUDA_SAFE_CALL( cudaGetLastError() );
Kokkos::Cuda::fence();
#endif
}
}
};
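Both constant-memory specializations above follow the same protocol: reject functors larger than the constant buffer, copy the closure to `kokkos_impl_cuda_constant_memory_buffer` via `cudaMemcpyToSymbol`, then launch a kernel that reads the functor back from constant memory. The size-based choice between the two launch paths can be sketched on the host as follows (the threshold value here is illustrative; Kokkos uses `CudaTraits::ConstantMemoryUseThreshold`):

```cpp
#include <cstddef>
#include <string>

// Illustrative threshold only, standing in for
// CudaTraits::ConstantMemoryUseThreshold.
constexpr std::size_t kConstantMemoryUseThreshold = 512;

// Mirrors the `Large` boolean selecting the CudaParallelLaunch
// specialization: small closures travel as kernel arguments, large
// ones are staged in the constant-memory buffer.
template <class Driver>
std::string launch_path() {
  return sizeof(Driver) > kConstantMemoryUseThreshold
             ? "constant-memory staging"
             : "kernel-argument (local memory)";
}

struct SmallFunctor { int a; };
struct BigFunctor   { char payload[4096]; };
```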
template < class DriverType
, unsigned int MaxThreadsPerBlock
, unsigned int MinBlocksPerSM >
struct CudaParallelLaunch< DriverType
, Kokkos::LaunchBounds< MaxThreadsPerBlock
, MinBlocksPerSM >
, false >
{
inline
CudaParallelLaunch( const DriverType & driver
, const dim3 & grid
, const dim3 & block
, const int shmem
, const cudaStream_t stream = 0 )
{
if ( grid.x && ( block.x * block.y * block.z ) ) {
if ( sizeof( Kokkos::Impl::CudaTraits::ConstantGlobalBufferType ) <
sizeof( DriverType ) ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: Functor is too large") );
}
if ( CudaTraits::SharedMemoryCapacity < shmem ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: shared memory request is too large") );
}
#ifndef KOKKOS_ARCH_KEPLER
// On Kepler the L1 has no benefit since it doesn't cache reads
else {
CUDA_SAFE_CALL(
cudaFuncSetCacheConfig
( cuda_parallel_launch_local_memory
< DriverType, MaxThreadsPerBlock, MinBlocksPerSM >
, ( shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1 )
) );
}
#endif
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
cuda_parallel_launch_local_memory< DriverType, LaunchBounds::maxTperB, LaunchBounds::minBperSM ><<< grid , block , shmem , stream >>>( driver );
// Invoke the driver function on the device
cuda_parallel_launch_local_memory
< DriverType, MaxThreadsPerBlock, MinBlocksPerSM >
<<< grid , block , shmem , stream >>>( driver );
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
CUDA_SAFE_CALL( cudaGetLastError() );
Kokkos::Cuda::fence();
#endif
}
}
};
template < class DriverType >
struct CudaParallelLaunch< DriverType
, Kokkos::LaunchBounds<>
, false >
{
inline
CudaParallelLaunch( const DriverType & driver
, const dim3 & grid
, const dim3 & block
, const int shmem
, const cudaStream_t stream = 0 )
{
if ( grid.x && ( block.x * block.y * block.z ) ) {
if ( sizeof( Kokkos::Impl::CudaTraits::ConstantGlobalBufferType ) <
sizeof( DriverType ) ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: Functor is too large") );
}
if ( CudaTraits::SharedMemoryCapacity < shmem ) {
Kokkos::Impl::throw_runtime_exception( std::string("CudaParallelLaunch FAILED: shared memory request is too large") );
}
#ifndef KOKKOS_ARCH_KEPLER
// On Kepler the L1 has no benefit since it doesn't cache reads
else {
CUDA_SAFE_CALL(
cudaFuncSetCacheConfig
( cuda_parallel_launch_local_memory< DriverType >
, ( shmem ? cudaFuncCachePreferShared : cudaFuncCachePreferL1 )
) );
}
#endif
KOKKOS_ENSURE_CUDA_LOCK_ARRAYS_ON_DEVICE();
// Invoke the driver function on the device
cuda_parallel_launch_local_memory< DriverType >
<<< grid , block , shmem , stream >>>( driver );
#if defined( KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK )
CUDA_SAFE_CALL( cudaGetLastError() );
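The refactor above replaces a generic `LaunchBounds` template parameter with partial specializations: one for an explicit `Kokkos::LaunchBounds<MaxThreadsPerBlock, MinBlocksPerSM>` and one for the unbounded default `Kokkos::LaunchBounds<>`, so the kernel can be annotated with `__launch_bounds__` only when bounds were actually requested. A stand-alone sketch of that dispatch (the types here are simplified stand-ins, not the Kokkos definitions):

```cpp
#include <string>

// Simplified stand-in for Kokkos::LaunchBounds: defaults mean "no bounds".
template <unsigned MaxThreadsPerBlock = 0, unsigned MinBlocksPerSM = 0>
struct LaunchBounds {};

// Primary template, specialized below on the bounds type.
template <class Driver, class Bounds>
struct Launch;

// Explicit bounds: the kernel variant would carry __launch_bounds__.
template <class Driver, unsigned MaxT, unsigned MinB>
struct Launch<Driver, LaunchBounds<MaxT, MinB>> {
  static std::string kind() { return "bounded"; }
};

// Default bounds (more specialized, so it wins for LaunchBounds<>):
// launch an unannotated kernel variant.
template <class Driver>
struct Launch<Driver, LaunchBounds<>> {
  static std::string kind() { return "default"; }
};
```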

View File

@@ -366,7 +366,7 @@ SharedAllocationRecord< Kokkos::CudaSpace , void >::
if(Kokkos::Profiling::profileLibraryLoaded()) {
SharedAllocationHeader header ;
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>::DeepCopy( & header , RecordBase::m_alloc_ptr , sizeof(SharedAllocationHeader) );
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>( & header , RecordBase::m_alloc_ptr , sizeof(SharedAllocationHeader) );
Kokkos::Profiling::deallocateData(
Kokkos::Profiling::SpaceHandle(Kokkos::CudaSpace::name()),header.m_label,
@@ -446,7 +446,7 @@ SharedAllocationRecord( const Kokkos::CudaSpace & arg_space
);
// Copy to device memory
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>::DeepCopy( RecordBase::m_alloc_ptr , & header , sizeof(SharedAllocationHeader) );
Kokkos::Impl::DeepCopy<CudaSpace,HostSpace>( RecordBase::m_alloc_ptr , & header , sizeof(SharedAllocationHeader) );
}
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
@@ -655,7 +655,7 @@ SharedAllocationRecord< Kokkos::CudaSpace , void >::get_record( void * alloc_ptr
Header const * const head_cuda = alloc_ptr ? Header::get_header( alloc_ptr ) : (Header*) 0 ;
if ( alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , head_cuda , sizeof(SharedAllocationHeader) );
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>( & head , head_cuda , sizeof(SharedAllocationHeader) );
}
RecordCuda * const record = alloc_ptr ? static_cast< RecordCuda * >( head.m_record ) : (RecordCuda *) 0 ;
@@ -713,7 +713,7 @@ SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::get_record( void *
// Iterate records to print orphaned memory ...
void
SharedAllocationRecord< Kokkos::CudaSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaSpace & space , bool detail )
print_records( std::ostream & s , const Kokkos::CudaSpace & , bool detail )
{
SharedAllocationRecord< void , void > * r = & s_root_record ;
@@ -724,7 +724,7 @@ print_records( std::ostream & s , const Kokkos::CudaSpace & space , bool detail
if ( detail ) {
do {
if ( r->m_alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
}
else {
head.m_label[0] = 0 ;
@@ -751,7 +751,7 @@ print_records( std::ostream & s , const Kokkos::CudaSpace & space , bool detail
, reinterpret_cast<uintptr_t>( r->m_dealloc )
, head.m_label
);
std::cout << buffer ;
s << buffer ;
r = r->m_next ;
} while ( r != & s_root_record );
}
@@ -759,7 +759,7 @@ print_records( std::ostream & s , const Kokkos::CudaSpace & space , bool detail
do {
if ( r->m_alloc_ptr ) {
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>::DeepCopy( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
Kokkos::Impl::DeepCopy<HostSpace,CudaSpace>( & head , r->m_alloc_ptr , sizeof(SharedAllocationHeader) );
//Formatting dependent on sizeof(uintptr_t)
const char * format_string;
@@ -781,7 +781,7 @@ print_records( std::ostream & s , const Kokkos::CudaSpace & space , bool detail
else {
snprintf( buffer , 256 , "Cuda [ 0 + 0 ]\n" );
}
std::cout << buffer ;
s << buffer ;
r = r->m_next ;
} while ( r != & s_root_record );
}
@@ -789,14 +789,14 @@ print_records( std::ostream & s , const Kokkos::CudaSpace & space , bool detail
void
SharedAllocationRecord< Kokkos::CudaUVMSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaUVMSpace & space , bool detail )
print_records( std::ostream & s , const Kokkos::CudaUVMSpace & , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "CudaUVM" , & s_root_record , detail );
}
void
SharedAllocationRecord< Kokkos::CudaHostPinnedSpace , void >::
print_records( std::ostream & s , const Kokkos::CudaHostPinnedSpace & space , bool detail )
print_records( std::ostream & s , const Kokkos::CudaHostPinnedSpace & , bool detail )
{
SharedAllocationRecord< void , void >::print_host_accessible_records( s , "CudaHostPinned" , & s_root_record , detail );
}
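Two recurring fixes in this file: streaming diagnostics to the caller's `s` instead of `std::cout`, and dropping now-unused parameter names (`space`) to silence warnings. Routing output through the supplied stream also makes the report capturable; a minimal analogue:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Write to the caller-supplied stream (as the fixed print_records does),
// rather than hard-coding std::cout.
void print_record(std::ostream& s, const std::string& label) {
  s << "record: " << label << '\n';
}

// Callers can now capture the report in memory for inspection.
std::string capture(const std::string& label) {
  std::ostringstream oss;
  print_record(oss, label);
  return oss.str();
}
```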

View File

@@ -421,7 +421,7 @@ void CudaInternal::initialize( int cuda_device_id , int stream_count )
std::string msg = ss.str();
Kokkos::abort( msg.c_str() );
}
if ( compiled_major != cudaProp.major || compiled_minor != cudaProp.minor ) {
if ( Kokkos::show_warnings() && (compiled_major != cudaProp.major || compiled_minor != cudaProp.minor) ) {
std::cerr << "Kokkos::Cuda::initialize WARNING: running kernels compiled for compute capability "
<< compiled_major << "." << compiled_minor
<< " on device with compute capability "
@@ -467,7 +467,7 @@ void CudaInternal::initialize( int cuda_device_id , int stream_count )
m_scratchUnifiedSupported = cudaProp.unifiedAddressing ;
if ( ! m_scratchUnifiedSupported ) {
if ( Kokkos::show_warnings() && ! m_scratchUnifiedSupported ) {
std::cout << "Kokkos::Cuda device "
<< cudaProp.name << " capability "
<< cudaProp.major << "." << cudaProp.minor
@@ -545,7 +545,7 @@ void CudaInternal::initialize( int cuda_device_id , int stream_count )
}
#ifdef KOKKOS_ENABLE_CUDA_UVM
if(!cuda_launch_blocking()) {
if( Kokkos::show_warnings() && !cuda_launch_blocking() ) {
std::cout << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default" << std::endl;
std::cout << " without setting CUDA_LAUNCH_BLOCKING=1." << std::endl;
std::cout << " The code must call Cuda::fence() after each kernel" << std::endl;
@@ -561,7 +561,7 @@ void CudaInternal::initialize( int cuda_device_id , int stream_count )
bool visible_devices_one=true;
if (env_visible_devices == 0) visible_devices_one=false;
if(!visible_devices_one && !force_device_alloc) {
if( Kokkos::show_warnings() && (!visible_devices_one && !force_device_alloc) ) {
std::cout << "Kokkos::Cuda::initialize WARNING: Cuda is allocating into UVMSpace by default" << std::endl;
std::cout << " without setting CUDA_MANAGED_FORCE_DEVICE_ALLOC=1 or " << std::endl;
std::cout << " setting CUDA_VISIBLE_DEVICES." << std::endl;
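Each diagnostic in `initialize()` above is now gated on `Kokkos::show_warnings()`, a process-wide toggle introduced in this release so start-up warnings can be silenced. A minimal analogue of that gating (the toggle here is a plain global, not the Kokkos implementation):

```cpp
#include <sstream>
#include <string>

namespace demo {
// Stand-in for Kokkos::show_warnings(): a process-wide toggle.
bool g_show_warnings = true;
bool show_warnings() { return g_show_warnings; }

// Emit a warning only when both the toggle and the condition hold,
// mirroring the `show_warnings() && <condition>` checks above.
std::string maybe_warn(bool condition, const std::string& msg) {
  std::ostringstream out;
  if (show_warnings() && condition) out << "WARNING: " << msg << '\n';
  return out.str();
}
}  // namespace demo
```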

View File

@@ -381,12 +381,12 @@ public:
// MDRangePolicy impl
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::Experimental::MDRangePolicy< Traits ... >
, Kokkos::MDRangePolicy< Traits ... >
, Kokkos::Cuda
>
{
private:
typedef Kokkos::Experimental::MDRangePolicy< Traits ... > Policy ;
typedef Kokkos::MDRangePolicy< Traits ... > Policy ;
using RP = Policy;
typedef typename Policy::array_index_type array_index_type;
typedef typename Policy::index_type index_type;
@@ -402,7 +402,7 @@ public:
__device__
void operator()(void) const
{
Kokkos::Experimental::Impl::Refactor::DeviceIterateTile<Policy::rank,Policy,FunctorType,typename Policy::work_tag>(m_rp,m_functor).exec_range();
Kokkos::Impl::Refactor::DeviceIterateTile<Policy::rank,Policy,FunctorType,typename Policy::work_tag>(m_rp,m_functor).exec_range();
}
@@ -648,10 +648,11 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTagFwd > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTagFwd > ValueJoin ;
public:
@@ -721,7 +722,7 @@ public:
}
// Reduce with final value at blockDim.y - 1 location.
if ( cuda_single_inter_block_reduce_scan<false,ReducerTypeFwd,WorkTag>(
if ( cuda_single_inter_block_reduce_scan<false,ReducerTypeFwd,WorkTagFwd>(
ReducerConditional::select(m_functor , m_reducer) , blockIdx.x , gridDim.x ,
kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags ) ) {
@@ -731,7 +732,7 @@ public:
size_type * const global = m_unified_space ? m_unified_space : m_scratch_space ;
if ( threadIdx.y == 0 ) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); }
@@ -766,11 +767,11 @@ public:
value_type init;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &init);
if(Impl::cuda_inter_block_reduction<ReducerTypeFwd,ValueJoin,WorkTag>
if(Impl::cuda_inter_block_reduction<ReducerTypeFwd,ValueJoin,WorkTagFwd>
(value,init,ValueJoin(ReducerConditional::select(m_functor , m_reducer)),m_scratch_space,result,m_scratch_flags,max_active_thread)) {
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
if(id==0) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
*result = value;
}
}
@@ -858,14 +859,14 @@ public:
// MDRangePolicy impl
template< class FunctorType , class ReducerType, class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::Experimental::MDRangePolicy< Traits ... >
, Kokkos::MDRangePolicy< Traits ... >
, ReducerType
, Kokkos::Cuda
>
{
private:
typedef Kokkos::Experimental::MDRangePolicy< Traits ... > Policy ;
typedef Kokkos::MDRangePolicy< Traits ... > Policy ;
typedef typename Policy::array_index_type array_index_type;
typedef typename Policy::index_type index_type;
@@ -875,10 +876,11 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTagFwd > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTagFwd > ValueJoin ;
public:
@@ -898,7 +900,7 @@ public:
size_type * m_scratch_flags ;
size_type * m_unified_space ;
typedef typename Kokkos::Experimental::Impl::Reduce::DeviceIterateTile<Policy::rank, Policy, FunctorType, typename Policy::work_tag, reference_type> DeviceIteratePattern;
typedef typename Kokkos::Impl::Reduce::DeviceIterateTile<Policy::rank, Policy, FunctorType, typename Policy::work_tag, reference_type> DeviceIteratePattern;
// Shall we use the shfl based reduction or not (only use it for static-sized types larger than 128 bits)
enum { UseShflReduction = ((sizeof(value_type)>2*sizeof(double)) && ValueTraits::StaticValueSize) };
@@ -913,7 +915,7 @@ public:
void
exec_range( reference_type update ) const
{
Kokkos::Experimental::Impl::Reduce::DeviceIterateTile<Policy::rank,Policy,FunctorType,typename Policy::work_tag, reference_type>(m_policy, m_functor, update).exec_range();
Kokkos::Impl::Reduce::DeviceIterateTile<Policy::rank,Policy,FunctorType,typename Policy::work_tag, reference_type>(m_policy, m_functor, update).exec_range();
}
inline
@@ -942,7 +944,7 @@ public:
// Reduce with final value at blockDim.y - 1 location.
// Problem: non power-of-two blockDim
if ( cuda_single_inter_block_reduce_scan<false,ReducerTypeFwd,WorkTag>(
if ( cuda_single_inter_block_reduce_scan<false,ReducerTypeFwd,WorkTagFwd>(
ReducerConditional::select(m_functor , m_reducer) , blockIdx.x , gridDim.x ,
kokkos_impl_cuda_shared_memory<size_type>() , m_scratch_space , m_scratch_flags ) ) {
@@ -951,7 +953,7 @@ public:
size_type * const global = m_unified_space ? m_unified_space : m_scratch_space ;
if ( threadIdx.y == 0 ) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); }
@@ -983,11 +985,11 @@ public:
value_type init;
ValueInit::init( ReducerConditional::select(m_functor , m_reducer) , &init);
if(Impl::cuda_inter_block_reduction<ReducerTypeFwd,ValueJoin,WorkTag>
if(Impl::cuda_inter_block_reduction<ReducerTypeFwd,ValueJoin,WorkTagFwd>
(value,init,ValueJoin(ReducerConditional::select(m_functor , m_reducer)),m_scratch_space,result,m_scratch_flags,max_active_thread)) {
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
if(id==0) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
*result = value;
}
}
@@ -1100,10 +1102,11 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTagFwd > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTagFwd > ValueJoin ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
@@ -1222,7 +1225,7 @@ public:
size_type * const global = m_unified_space ? m_unified_space : m_scratch_space ;
if ( threadIdx.y == 0 ) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , shared );
}
if ( CudaTraits::WarpSize < word_count.value ) { __syncthreads(); }
@@ -1260,7 +1263,7 @@ public:
(value,init,ValueJoin(ReducerConditional::select(m_functor , m_reducer)),m_scratch_space,result,m_scratch_flags,blockDim.y)) {
const unsigned id = threadIdx.y*blockDim.x + threadIdx.x;
if(id==0) {
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , (void*) &value );
*result = value;
}
}
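The recurring change in this file swaps `WorkTag` for `WorkTagFwd` in the `ValueTraits`/`ValueInit`/`ValueJoin`/`FunctorFinal` helpers. `WorkTagFwd` is the tag forwarded to the reducer machinery: it stays `WorkTag` when the functor reduces itself, but collapses to `void` when a separate `ReducerType` is supplied, so the reducer's untagged hooks are selected (the fix behind changelog items #1250/#1251 on tagged reducers). A self-contained sketch of that type selection, with simplified stand-ins for the Kokkos helpers:

```cpp
#include <type_traits>

struct InvalidType {};  // marker: "no separate reducer supplied"

// Simplified stand-in for Kokkos::Impl::if_c.
template <bool Cond, class T, class F>
struct if_c { using type = T; };
template <class T, class F>
struct if_c<false, T, F> { using type = F; };

// Keep the functor's WorkTag when it is its own reducer; forward void
// when a separate reducer is given, so its untagged init/join/final run.
template <class WorkTag, class ReducerType>
using WorkTagFwd =
    typename if_c<std::is_same<InvalidType, ReducerType>::value,
                  WorkTag, void>::type;

struct MyTag {};
struct MySum {};  // hypothetical separate reducer

static_assert(std::is_same<WorkTagFwd<MyTag, InvalidType>, MyTag>::value,
              "functor-as-reducer keeps its tag");
static_assert(std::is_same<WorkTagFwd<MyTag, MySum>, void>::value,
              "separate reducer gets the untagged hooks");
```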

View File

@@ -69,7 +69,7 @@ void cuda_shfl( T & out , T const & in , int lane ,
typename std::enable_if< sizeof(int) == sizeof(T) , int >::type width )
{
*reinterpret_cast<int*>(&out) =
__shfl( *reinterpret_cast<int const *>(&in) , lane , width );
KOKKOS_IMPL_CUDA_SHFL( *reinterpret_cast<int const *>(&in) , lane , width );
}
template< typename T >
@@ -83,7 +83,7 @@ void cuda_shfl( T & out , T const & in , int lane ,
for ( int i = 0 ; i < N ; ++i ) {
reinterpret_cast<int*>(&out)[i] =
__shfl( reinterpret_cast<int const *>(&in)[i] , lane , width );
KOKKOS_IMPL_CUDA_SHFL( reinterpret_cast<int const *>(&in)[i] , lane , width );
}
}
@@ -95,7 +95,7 @@ void cuda_shfl_down( T & out , T const & in , int delta ,
typename std::enable_if< sizeof(int) == sizeof(T) , int >::type width )
{
*reinterpret_cast<int*>(&out) =
__shfl_down( *reinterpret_cast<int const *>(&in) , delta , width );
KOKKOS_IMPL_CUDA_SHFL_DOWN( *reinterpret_cast<int const *>(&in) , delta , width );
}
template< typename T >
@@ -109,7 +109,7 @@ void cuda_shfl_down( T & out , T const & in , int delta ,
for ( int i = 0 ; i < N ; ++i ) {
reinterpret_cast<int*>(&out)[i] =
__shfl_down( reinterpret_cast<int const *>(&in)[i] , delta , width );
KOKKOS_IMPL_CUDA_SHFL_DOWN( reinterpret_cast<int const *>(&in)[i] , delta , width );
}
}
@@ -121,7 +121,7 @@ void cuda_shfl_up( T & out , T const & in , int delta ,
typename std::enable_if< sizeof(int) == sizeof(T) , int >::type width )
{
*reinterpret_cast<int*>(&out) =
__shfl_up( *reinterpret_cast<int const *>(&in) , delta , width );
KOKKOS_IMPL_CUDA_SHFL_UP( *reinterpret_cast<int const *>(&in) , delta , width );
}
template< typename T >
@@ -135,7 +135,7 @@ void cuda_shfl_up( T & out , T const & in , int delta ,
for ( int i = 0 ; i < N ; ++i ) {
reinterpret_cast<int*>(&out)[i] =
__shfl_up( reinterpret_cast<int const *>(&in)[i] , delta , width );
KOKKOS_IMPL_CUDA_SHFL_UP( reinterpret_cast<int const *>(&in)[i] , delta , width );
}
}
@@ -268,31 +268,31 @@ bool cuda_inter_block_reduction( typename FunctorValueTraits< FunctorType , ArgT
if( id + 1 < int(gridDim.x) )
join(value, tmp);
}
int active = __ballot(1);
int active = KOKKOS_IMPL_CUDA_BALLOT(1);
if (int(blockDim.x*blockDim.y) > 2) {
value_type tmp = Kokkos::shfl_down(value, 2,32);
if( id + 2 < int(gridDim.x) )
join(value, tmp);
}
active += __ballot(1);
active += KOKKOS_IMPL_CUDA_BALLOT(1);
if (int(blockDim.x*blockDim.y) > 4) {
value_type tmp = Kokkos::shfl_down(value, 4,32);
if( id + 4 < int(gridDim.x) )
join(value, tmp);
}
active += __ballot(1);
active += KOKKOS_IMPL_CUDA_BALLOT(1);
if (int(blockDim.x*blockDim.y) > 8) {
value_type tmp = Kokkos::shfl_down(value, 8,32);
if( id + 8 < int(gridDim.x) )
join(value, tmp);
}
active += __ballot(1);
active += KOKKOS_IMPL_CUDA_BALLOT(1);
if (int(blockDim.x*blockDim.y) > 16) {
value_type tmp = Kokkos::shfl_down(value, 16,32);
if( id + 16 < int(gridDim.x) )
join(value, tmp);
}
active += __ballot(1);
active += KOKKOS_IMPL_CUDA_BALLOT(1);
}
}
//The last block has in its thread=0 the global reduction value through "value"
@@ -432,31 +432,31 @@ cuda_inter_block_reduction( const ReducerType& reducer,
if( id + 1 < int(gridDim.x) )
reducer.join(value, tmp);
}
int active = __ballot(1);
int active = KOKKOS_IMPL_CUDA_BALLOT(1);
if (int(blockDim.x*blockDim.y) > 2) {
value_type tmp = Kokkos::shfl_down(value, 2,32);
if( id + 2 < int(gridDim.x) )
reducer.join(value, tmp);
}
active += __ballot(1);
active += KOKKOS_IMPL_CUDA_BALLOT(1);
if (int(blockDim.x*blockDim.y) > 4) {
value_type tmp = Kokkos::shfl_down(value, 4,32);
if( id + 4 < int(gridDim.x) )
reducer.join(value, tmp);
}
active += __ballot(1);
active += KOKKOS_IMPL_CUDA_BALLOT(1);
if (int(blockDim.x*blockDim.y) > 8) {
value_type tmp = Kokkos::shfl_down(value, 8,32);
if( id + 8 < int(gridDim.x) )
reducer.join(value, tmp);
}
active += __ballot(1);
active += KOKKOS_IMPL_CUDA_BALLOT(1);
if (int(blockDim.x*blockDim.y) > 16) {
value_type tmp = Kokkos::shfl_down(value, 16,32);
if( id + 16 < int(gridDim.x) )
reducer.join(value, tmp);
}
active += __ballot(1);
active += KOKKOS_IMPL_CUDA_BALLOT(1);
}
}
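The offsets 1, 2, 4, 8, 16 above implement a log2(32)-step warp tree reduction: at each step every lane folds in the value held `offset` lanes above it, with the ballot calls keeping the warp converged between steps. A host-side model of the shuffle-down reduction (a sequential stand-in for the parallel warp):

```cpp
#include <array>

// Model of a 32-lane shfl_down tree reduction: in each step, lane i
// adds the value from lane i + offset. The snapshot models the fact
// that all lanes read their neighbours' pre-step values simultaneously.
int warp_reduce_sum(std::array<int, 32> lanes) {
  for (int offset = 1; offset < 32; offset <<= 1) {
    const std::array<int, 32> snap = lanes;
    for (int i = 0; i + offset < 32; ++i)
      lanes[i] = snap[i] + snap[i + offset];
  }
  return lanes[0];  // lane 0 ends up holding the full sum
}
```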

View File

@@ -73,16 +73,16 @@ public:
KOKKOS_INLINE_FUNCTION
UniqueToken() : m_buffer(0), m_count(0) {}
KOKKOS_INLINE_FUNCTION
KOKKOS_FUNCTION_DEFAULTED
UniqueToken( const UniqueToken & ) = default;
KOKKOS_INLINE_FUNCTION
KOKKOS_FUNCTION_DEFAULTED
UniqueToken( UniqueToken && ) = default;
KOKKOS_INLINE_FUNCTION
KOKKOS_FUNCTION_DEFAULTED
UniqueToken & operator=( const UniqueToken & ) = default ;
KOKKOS_INLINE_FUNCTION
KOKKOS_FUNCTION_DEFAULTED
UniqueToken & operator=( UniqueToken && ) = default ;
/// \brief upper bound for acquired values, i.e. 0 <= value < size()

View File

@@ -47,7 +47,7 @@
#ifdef KOKKOS_ENABLE_CUDA
#include <Kokkos_Cuda.hpp>
#include <Cuda/Kokkos_Cuda_Version_9_8_Compatibility.hpp>
namespace Kokkos {
@@ -91,12 +91,12 @@ namespace Impl {
KOKKOS_INLINE_FUNCTION
int shfl(const int &val, const int& srcLane, const int& width ) {
return __shfl(val,srcLane,width);
return KOKKOS_IMPL_CUDA_SHFL(val,srcLane,width);
}
KOKKOS_INLINE_FUNCTION
float shfl(const float &val, const int& srcLane, const int& width ) {
return __shfl(val,srcLane,width);
return KOKKOS_IMPL_CUDA_SHFL(val,srcLane,width);
}
template<typename Scalar>
@@ -105,7 +105,7 @@ namespace Impl {
) {
Scalar tmp1 = val;
float tmp = *reinterpret_cast<float*>(&tmp1);
tmp = __shfl(tmp,srcLane,width);
tmp = KOKKOS_IMPL_CUDA_SHFL(tmp,srcLane,width);
return *reinterpret_cast<Scalar*>(&tmp);
}
@@ -113,8 +113,8 @@ namespace Impl {
double shfl(const double &val, const int& srcLane, const int& width) {
int lo = __double2loint(val);
int hi = __double2hiint(val);
lo = __shfl(lo,srcLane,width);
hi = __shfl(hi,srcLane,width);
lo = KOKKOS_IMPL_CUDA_SHFL(lo,srcLane,width);
hi = KOKKOS_IMPL_CUDA_SHFL(hi,srcLane,width);
return __hiloint2double(hi,lo);
}
@@ -123,8 +123,8 @@ namespace Impl {
Scalar shfl(const Scalar &val, const int& srcLane, const typename Impl::enable_if< (sizeof(Scalar) == 8) ,int>::type& width) {
int lo = __double2loint(*reinterpret_cast<const double*>(&val));
int hi = __double2hiint(*reinterpret_cast<const double*>(&val));
lo = __shfl(lo,srcLane,width);
hi = __shfl(hi,srcLane,width);
lo = KOKKOS_IMPL_CUDA_SHFL(lo,srcLane,width);
hi = KOKKOS_IMPL_CUDA_SHFL(hi,srcLane,width);
const double tmp = __hiloint2double(hi,lo);
return *(reinterpret_cast<const Scalar*>(&tmp));
}
@@ -137,18 +137,18 @@ namespace Impl {
s_val = val;
for(int i = 0; i<s_val.n; i++)
r_val.fval[i] = __shfl(s_val.fval[i],srcLane,width);
r_val.fval[i] = KOKKOS_IMPL_CUDA_SHFL(s_val.fval[i],srcLane,width);
return r_val.value();
}
KOKKOS_INLINE_FUNCTION
int shfl_down(const int &val, const int& delta, const int& width) {
return __shfl_down(val,delta,width);
return KOKKOS_IMPL_CUDA_SHFL_DOWN(val,delta,width);
}
KOKKOS_INLINE_FUNCTION
float shfl_down(const float &val, const int& delta, const int& width) {
return __shfl_down(val,delta,width);
return KOKKOS_IMPL_CUDA_SHFL_DOWN(val,delta,width);
}
template<typename Scalar>
@@ -156,7 +156,7 @@ namespace Impl {
Scalar shfl_down(const Scalar &val, const int& delta, const typename Impl::enable_if< (sizeof(Scalar) == 4) , int >::type & width) {
Scalar tmp1 = val;
float tmp = *reinterpret_cast<float*>(&tmp1);
tmp = __shfl_down(tmp,delta,width);
tmp = KOKKOS_IMPL_CUDA_SHFL_DOWN(tmp,delta,width);
return *reinterpret_cast<Scalar*>(&tmp);
}
@@ -164,8 +164,8 @@ namespace Impl {
double shfl_down(const double &val, const int& delta, const int& width) {
int lo = __double2loint(val);
int hi = __double2hiint(val);
lo = __shfl_down(lo,delta,width);
hi = __shfl_down(hi,delta,width);
lo = KOKKOS_IMPL_CUDA_SHFL_DOWN(lo,delta,width);
hi = KOKKOS_IMPL_CUDA_SHFL_DOWN(hi,delta,width);
return __hiloint2double(hi,lo);
}
@@ -174,8 +174,8 @@ namespace Impl {
Scalar shfl_down(const Scalar &val, const int& delta, const typename Impl::enable_if< (sizeof(Scalar) == 8) , int >::type & width) {
int lo = __double2loint(*reinterpret_cast<const double*>(&val));
int hi = __double2hiint(*reinterpret_cast<const double*>(&val));
lo = __shfl_down(lo,delta,width);
hi = __shfl_down(hi,delta,width);
lo = KOKKOS_IMPL_CUDA_SHFL_DOWN(lo,delta,width);
hi = KOKKOS_IMPL_CUDA_SHFL_DOWN(hi,delta,width);
const double tmp = __hiloint2double(hi,lo);
return *(reinterpret_cast<const Scalar*>(&tmp));
}
@@ -188,18 +188,18 @@ namespace Impl {
s_val = val;
for(int i = 0; i<s_val.n; i++)
r_val.fval[i] = __shfl_down(s_val.fval[i],delta,width);
r_val.fval[i] = KOKKOS_IMPL_CUDA_SHFL_DOWN(s_val.fval[i],delta,width);
return r_val.value();
}
KOKKOS_INLINE_FUNCTION
int shfl_up(const int &val, const int& delta, const int& width ) {
return __shfl_up(val,delta,width);
return KOKKOS_IMPL_CUDA_SHFL_UP(val,delta,width);
}
KOKKOS_INLINE_FUNCTION
float shfl_up(const float &val, const int& delta, const int& width ) {
return __shfl_up(val,delta,width);
return KOKKOS_IMPL_CUDA_SHFL_UP(val,delta,width);
}
template<typename Scalar>
@@ -207,7 +207,7 @@ namespace Impl {
Scalar shfl_up(const Scalar &val, const int& delta, const typename Impl::enable_if< (sizeof(Scalar) == 4) , int >::type & width) {
Scalar tmp1 = val;
float tmp = *reinterpret_cast<float*>(&tmp1);
tmp = __shfl_up(tmp,delta,width);
tmp = KOKKOS_IMPL_CUDA_SHFL_UP(tmp,delta,width);
return *reinterpret_cast<Scalar*>(&tmp);
}
@@ -215,8 +215,8 @@ namespace Impl {
double shfl_up(const double &val, const int& delta, const int& width ) {
int lo = __double2loint(val);
int hi = __double2hiint(val);
lo = __shfl_up(lo,delta,width);
hi = __shfl_up(hi,delta,width);
lo = KOKKOS_IMPL_CUDA_SHFL_UP(lo,delta,width);
hi = KOKKOS_IMPL_CUDA_SHFL_UP(hi,delta,width);
return __hiloint2double(hi,lo);
}
@@ -225,8 +225,8 @@ namespace Impl {
Scalar shfl_up(const Scalar &val, const int& delta, const typename Impl::enable_if< (sizeof(Scalar) == 8) , int >::type & width) {
int lo = __double2loint(*reinterpret_cast<const double*>(&val));
int hi = __double2hiint(*reinterpret_cast<const double*>(&val));
lo = __shfl_up(lo,delta,width);
hi = __shfl_up(hi,delta,width);
lo = KOKKOS_IMPL_CUDA_SHFL_UP(lo,delta,width);
hi = KOKKOS_IMPL_CUDA_SHFL_UP(hi,delta,width);
const double tmp = __hiloint2double(hi,lo);
return *(reinterpret_cast<const Scalar*>(&tmp));
}
@ -239,7 +239,7 @@ namespace Impl {
s_val = val;
for(int i = 0; i<s_val.n; i++)
r_val.fval[i] = __shfl_up(s_val.fval[i],delta,width);
r_val.fval[i] = KOKKOS_IMPL_CUDA_SHFL_UP(s_val.fval[i],delta,width);
return r_val.value();
}


@ -0,0 +1,12 @@
#include<Kokkos_Macros.hpp>
#if ( CUDA_VERSION < 9000 )
#define KOKKOS_IMPL_CUDA_BALLOT(x) __ballot(x)
#define KOKKOS_IMPL_CUDA_SHFL(x,y,z) __shfl(x,y,z)
#define KOKKOS_IMPL_CUDA_SHFL_UP(x,y,z) __shfl_up(x,y,z)
#define KOKKOS_IMPL_CUDA_SHFL_DOWN(x,y,z) __shfl_down(x,y,z)
#else
#define KOKKOS_IMPL_CUDA_BALLOT(x) __ballot_sync(0xffffffff,x)
#define KOKKOS_IMPL_CUDA_SHFL(x,y,z) __shfl_sync(0xffffffff,x,y,z)
#define KOKKOS_IMPL_CUDA_SHFL_UP(x,y,z) __shfl_up_sync(0xffffffff,x,y,z)
#define KOKKOS_IMPL_CUDA_SHFL_DOWN(x,y,z) __shfl_down_sync(0xffffffff,x,y,z)
#endif
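The macros above exist because CUDA 9 deprecated the unsynchronized warp intrinsics in favor of `*_sync` variants that take an explicit participation mask (`0xffffffff` names all 32 lanes). The data movement itself is unchanged; a host-side model of `__shfl_down` semantics (illustrative only, the real intrinsic exchanges registers between threads of a warp):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of __shfl_down within a warp: lane i reads the value held by lane
// i + delta, restricted to groups of `width` lanes; a lane whose source
// falls outside its group keeps its own value.
std::vector<int> model_shfl_down(const std::vector<int>& lanes,
                                 unsigned delta, unsigned width) {
  std::vector<int> out(lanes);
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    std::size_t group = i / width;      // lanes are partitioned by width
    std::size_t src = i + delta;
    if (src < (group + 1) * width && src < lanes.size()) out[i] = lanes[src];
  }
  return out;
}
```

With `width = 4` and `delta = 1`, lanes 3 and 7 sit at a group boundary and keep their own values, which is exactly the edge case reduction loops rely on.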


@ -127,11 +127,11 @@ struct CudaTextureFetch {
template< class CudaMemorySpace >
inline explicit
CudaTextureFetch( const ValueType * const arg_ptr
, Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > & record
, Kokkos::Impl::SharedAllocationRecord< CudaMemorySpace , void > * record
)
: m_obj( record.template attach_texture_object< AliasType >() )
: m_obj( record->template attach_texture_object< AliasType >() )
, m_ptr( arg_ptr )
, m_offset( record.attach_texture_object_offset( reinterpret_cast<const AliasType*>( arg_ptr ) ) )
, m_offset( record->attach_texture_object_offset( reinterpret_cast<const AliasType*>( arg_ptr ) ) )
{}
// Texture object spans the entire allocation.
@ -199,8 +199,8 @@ struct CudaLDGFetch {
template< class CudaMemorySpace >
inline explicit
CudaLDGFetch( const ValueType * const arg_ptr
, Kokkos::Experimental::Impl::SharedAllocationRecord< CudaMemorySpace , void > const &
)
, Kokkos::Impl::SharedAllocationRecord<CudaMemorySpace,void>*
)
: m_ptr( arg_ptr )
{}
@ -285,7 +285,21 @@ public:
// Assignment of texture = non-texture requires creation of a texture object
// which can only occur on the host. In addition, 'get_record' is only valid
// if called in a host execution space
return handle_type( arg_data_ptr , arg_tracker.template get_record< typename Traits::memory_space >() );
typedef typename Traits::memory_space memory_space ;
typedef typename Impl::SharedAllocationRecord<memory_space,void> record ;
record * const r = arg_tracker.template get_record< memory_space >();
#if ! defined( KOKKOS_ENABLE_CUDA_LDG_INTRINSIC )
if ( 0 == r ) {
Kokkos::abort("Cuda const random access View using Cuda texture memory requires Kokkos to allocate the View's memory");
}
#endif
return handle_type( arg_data_ptr , r );
#else
Kokkos::Impl::cuda_abort("Cannot create Cuda texture object from within a Cuda kernel");
return handle_type();


@ -48,50 +48,52 @@ namespace Kokkos {
namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType ,
Kokkos::Experimental::WorkGraphPolicy< Traits ... > ,
Kokkos::Cuda
class ParallelFor< FunctorType
, Kokkos::WorkGraphPolicy< Traits ... >
, Kokkos::Cuda
>
: public Kokkos::Impl::Experimental::
WorkGraphExec< FunctorType,
Kokkos::Cuda,
Traits ...
>
{
public:
typedef Kokkos::Experimental::WorkGraphPolicy< Traits ... > Policy ;
typedef Kokkos::Impl::Experimental::
WorkGraphExec<FunctorType, Kokkos::Cuda, Traits ... > Base ;
typedef Kokkos::WorkGraphPolicy< Traits ... > Policy ;
typedef ParallelFor<FunctorType, Policy, Kokkos::Cuda> Self ;
private:
template< class TagType >
__device__
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_one(const typename Policy::member_type& i) const {
Base::m_functor( i );
}
Policy m_policy ;
FunctorType m_functor ;
template< class TagType >
__device__
__device__ inline
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_one( const std::int32_t w ) const noexcept
{ m_functor( w ); }
template< class TagType >
__device__ inline
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_one(const typename Policy::member_type& i) const {
const TagType t{} ;
Base::m_functor( t , i );
}
exec_one( const std::int32_t w ) const noexcept
{ const TagType t{} ; m_functor( t , w ); }
public:
__device__
inline
void operator()() const {
for (std::int32_t i; (-1 != (i = Base::before_work())); ) {
exec_one< typename Policy::work_tag >( i );
Base::after_work(i);
__device__ inline
void operator()() const noexcept
{
if ( 0 == ( threadIdx.y % 16 ) ) {
// Spin until COMPLETED_TOKEN.
// END_TOKEN indicates no work is currently available.
for ( std::int32_t w = Policy::END_TOKEN ;
Policy::COMPLETED_TOKEN != ( w = m_policy.pop_work() ) ; ) {
if ( Policy::END_TOKEN != w ) {
exec_one< typename Policy::work_tag >( w );
m_policy.completed_work(w);
}
}
}
}
}
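The new `operator()` above drives a pop/complete loop around two sentinel tokens: `END_TOKEN` means "no work ready right now, keep spinning", and `COMPLETED_TOKEN` means "the whole graph is done, exit". A single-threaded sketch of that driver (the queue type and names are simplified stand-ins for WorkGraphPolicy's internal state, not the real implementation):

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct ToyWorkQueue {
  static constexpr int END_TOKEN = -1;       // nothing ready yet
  static constexpr int COMPLETED_TOKEN = -2; // all work finished
  std::deque<int> ready;                     // work items whose deps are met
  int remaining;                             // items not yet completed

  int pop_work() {
    if (remaining == 0) return COMPLETED_TOKEN;
    if (ready.empty()) return END_TOKEN;     // caller spins and retries
    int w = ready.front();
    ready.pop_front();
    return w;
  }
  void completed_work(int) { --remaining; }
};

// The same loop shape as the ParallelFor body above.
template <class F>
void drive(ToyWorkQueue& q, F&& exec_one) {
  for (int w = ToyWorkQueue::END_TOKEN;
       ToyWorkQueue::COMPLETED_TOKEN != (w = q.pop_work()); ) {
    if (ToyWorkQueue::END_TOKEN != w) {
      exec_one(w);
      q.completed_work(w);
    }
  }
}
```

In the real kernel only one lane per 16 (`threadIdx.y % 16 == 0`) runs this loop, and completing an item may make successors in the graph become ready.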
inline
void execute()
@ -108,9 +110,9 @@ public:
inline
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy )
: Base( arg_functor, arg_policy )
{
}
: m_policy( arg_policy )
, m_functor( arg_functor )
{}
};
} // namespace Impl


@ -55,7 +55,7 @@
#include <Cuda/KokkosExp_Cuda_IterateTile_Refactor.hpp>
#endif
namespace Kokkos { namespace Experimental {
namespace Kokkos {
// ------------------------------------------------------------------ //
@ -331,11 +331,23 @@ struct MDRangePolicy
}
};
} // namespace Kokkos
// For backward compatibility
namespace Kokkos { namespace Experimental {
using Kokkos::MDRangePolicy;
using Kokkos::Rank;
using Kokkos::Iterate;
} } // end Kokkos::Experimental
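The backward-compatibility shim above follows a standard pattern: after a type moves from an `Experimental` namespace to its parent, using-declarations keep the old qualified spelling valid and referring to the same type. A minimal replica (`Lib`/`Widget` are illustrative names):

```cpp
#include <cassert>
#include <type_traits>

namespace Lib {
  template <int N> struct Widget { static constexpr int rank = N; };
}

// Old spelling Lib::Experimental::Widget still compiles and is the
// *same* type, not a copy, so overloads and specializations keep working.
namespace Lib { namespace Experimental {
  using Lib::Widget;
}}
```

Because a using-declaration introduces an alias to the original entity, `std::is_same` on the two spellings holds, which is what lets existing user code compile unchanged.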
// ------------------------------------------------------------------ //
// ------------------------------------------------------------------ //
//md_parallel_for - deprecated use parallel_for
// ------------------------------------------------------------------ //
namespace Kokkos { namespace Experimental {
template <typename MDRange, typename Functor, typename Enable = void>
void md_parallel_for( MDRange const& range
, Functor const& f
@ -347,7 +359,7 @@ void md_parallel_for( MDRange const& range
) >::type* = 0
)
{
Impl::MDFunctor<MDRange, Functor, void> g(range, f);
Kokkos::Impl::Experimental::MDFunctor<MDRange, Functor, void> g(range, f);
using range_policy = typename MDRange::impl_range_policy;
@ -365,7 +377,7 @@ void md_parallel_for( const std::string& str
) >::type* = 0
)
{
Impl::MDFunctor<MDRange, Functor, void> g(range, f);
Kokkos::Impl::Experimental::MDFunctor<MDRange, Functor, void> g(range, f);
using range_policy = typename MDRange::impl_range_policy;
@ -385,7 +397,7 @@ void md_parallel_for( const std::string& str
) >::type* = 0
)
{
Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f);
Kokkos::Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f);
closure.execute();
}
@ -400,7 +412,7 @@ void md_parallel_for( MDRange const& range
) >::type* = 0
)
{
Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f);
Kokkos::Impl::DeviceIterateTile<MDRange, Functor, typename MDRange::work_tag> closure(range, f);
closure.execute();
}
#endif
@ -421,7 +433,7 @@ void md_parallel_reduce( MDRange const& range
) >::type* = 0
)
{
Impl::MDFunctor<MDRange, Functor, ValueType> g(range, f);
Kokkos::Impl::Experimental::MDFunctor<MDRange, Functor, ValueType> g(range, f);
using range_policy = typename MDRange::impl_range_policy;
Kokkos::parallel_reduce( str, range_policy(0, range.m_num_tiles).set_chunk_size(1), g, v );
@ -439,7 +451,7 @@ void md_parallel_reduce( const std::string& str
) >::type* = 0
)
{
Impl::MDFunctor<MDRange, Functor, ValueType> g(range, f);
Kokkos::Impl::Experimental::MDFunctor<MDRange, Functor, ValueType> g(range, f);
using range_policy = typename MDRange::impl_range_policy;
@ -448,7 +460,7 @@ void md_parallel_reduce( const std::string& str
// Cuda - md_parallel_reduce not implemented - use parallel_reduce
}} // namespace Kokkos::Experimental
} } // namespace Kokkos::Experimental
#endif //KOKKOS_CORE_EXP_MD_RANGE_POLICY_HPP


@ -81,10 +81,10 @@ struct IndexType
/**\brief Specify Launch Bounds for CUDA execution.
*
* The "best" defaults may be architecture specific.
* If no launch bounds specified then do not set launch bounds.
*/
template< unsigned int maxT = 1024 /* Max threads per block */
, unsigned int minB = 1 /* Min blocks per SM */
template< unsigned int maxT = 0 /* Max threads per block */
, unsigned int minB = 0 /* Min blocks per SM */
>
struct LaunchBounds
{
@ -280,6 +280,9 @@ struct MemorySpaceAccess {
enum { deepcopy = assignable };
};
}} // namespace Kokkos::Impl
namespace Kokkos {
/**\brief Can AccessSpace access MemorySpace ?
*
@ -358,6 +361,13 @@ public:
>::type space ;
};
} // namespace Kokkos
namespace Kokkos {
namespace Impl {
using Kokkos::SpaceAccessibility ; // For backward compatibility
}} // namespace Kokkos::Impl
//----------------------------------------------------------------------------


@ -99,13 +99,17 @@ struct InitArguments {
int num_threads;
int num_numa;
int device_id;
bool disable_warnings;
InitArguments( int nt = -1
, int nn = -1
, int dv = -1)
: num_threads( nt )
, num_numa( nn )
, device_id( dv )
, int dv = -1
, bool dw = false
)
: num_threads{ nt }
, num_numa{ nn }
, device_id{ dv }
, disable_warnings{ dw }
{}
};
@ -113,6 +117,10 @@ void initialize(int& narg, char* arg[]);
void initialize(const InitArguments& args = InitArguments());
bool is_initialized() noexcept;
bool show_warnings() noexcept;
/** \brief Finalize the spaces that were initialized via Kokkos::initialize */
void finalize();
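The `InitArguments` change above adds a trailing `disable_warnings` field with a default, so every existing call site stays valid. A standalone replica of the shape (not the real Kokkos type):

```cpp
#include <cassert>

// Adding a defaulted trailing parameter is source-compatible: old
// constructor calls with 0-3 arguments continue to compile unchanged.
struct Args {
  int num_threads;
  int num_numa;
  int device_id;
  bool disable_warnings;

  Args(int nt = -1, int nn = -1, int dv = -1, bool dw = false)
    : num_threads{nt}
    , num_numa{nn}
    , device_id{dv}
    , disable_warnings{dw}
  {}
};
```

The brace-initializers in the member list (switched from parentheses in the diff) also reject narrowing conversions at compile time.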


@ -45,7 +45,6 @@
#define KOKKOS_CRS_HPP
namespace Kokkos {
namespace Experimental {
/// \class Crs
/// \brief Compressed row storage array.
@ -164,7 +163,7 @@ void transpose_crs(
Crs<DataType, Arg1Type, Arg2Type, SizeType>& out,
Crs<DataType, Arg1Type, Arg2Type, SizeType> const& in);
}} // namespace Kokkos::Experimental
} // namespace Kokkos
/*--------------------------------------------------------------------------*/
@ -172,7 +171,6 @@ void transpose_crs(
namespace Kokkos {
namespace Impl {
namespace Experimental {
template <class InCrs, class OutCounts>
class GetCrsTransposeCounts {
@ -277,14 +275,13 @@ class FillCrsTransposeEntries {
}
};
}}} // namespace Kokkos::Impl::Experimental
}} // namespace Kokkos::Impl
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Experimental {
template< class OutCounts,
class DataType,
@ -297,8 +294,7 @@ void get_crs_transpose_counts(
std::string const& name) {
using InCrs = Crs<DataType, Arg1Type, Arg2Type, SizeType>;
out = OutCounts(name, in.numRows());
Kokkos::Impl::Experimental::
GetCrsTransposeCounts<InCrs, OutCounts> functor(in, out);
Kokkos::Impl::GetCrsTransposeCounts<InCrs, OutCounts> functor(in, out);
}
template< class OutRowMap,
@ -308,8 +304,7 @@ typename OutRowMap::value_type get_crs_row_map_from_counts(
InCounts const& in,
std::string const& name) {
out = OutRowMap(ViewAllocateWithoutInitializing(name), in.size() + 1);
Kokkos::Impl::Experimental::
CrsRowMapFromCounts<InCounts, OutRowMap> functor(in, out);
Kokkos::Impl::CrsRowMapFromCounts<InCounts, OutRowMap> functor(in, out);
return functor.execute();
}
@ -326,32 +321,37 @@ void transpose_crs(
typedef View<SizeType*, memory_space> counts_type ;
{
counts_type counts;
Kokkos::Experimental::get_crs_transpose_counts(counts, in);
Kokkos::Experimental::get_crs_row_map_from_counts(out.row_map, counts,
Kokkos::get_crs_transpose_counts(counts, in);
Kokkos::get_crs_row_map_from_counts(out.row_map, counts,
"tranpose_row_map");
}
out.entries = decltype(out.entries)("transpose_entries", in.entries.size());
Kokkos::Impl::Experimental::
Kokkos::Impl::
FillCrsTransposeEntries<crs_type, crs_type> entries_functor(in, out);
}
template< class CrsType,
class Functor>
struct CountAndFill {
class Functor,
class ExecutionSpace = typename CrsType::execution_space>
struct CountAndFillBase;
template< class CrsType,
class Functor,
class ExecutionSpace>
struct CountAndFillBase {
using data_type = typename CrsType::size_type;
using size_type = typename CrsType::size_type;
using row_map_type = typename CrsType::row_map_type;
using entries_type = typename CrsType::entries_type;
using counts_type = row_map_type;
CrsType m_crs;
Functor m_functor;
counts_type m_counts;
struct Count {};
KOKKOS_INLINE_FUNCTION void operator()(Count, size_type i) const {
inline void operator()(Count, size_type i) const {
m_counts(i) = m_functor(i, nullptr);
}
struct Fill {};
KOKKOS_INLINE_FUNCTION void operator()(Fill, size_type i) const {
inline void operator()(Fill, size_type i) const {
auto j = m_crs.row_map(i);
/* we don't want to access entries(entries.size()), even if it's just to get its
address and never use it.
@ -363,13 +363,63 @@ struct CountAndFill {
nullptr : (&(m_crs.entries(j)));
m_functor(i, fill);
}
using self_type = CountAndFill<CrsType, Functor>;
CountAndFill(CrsType& crs, size_type nrows, Functor const& f):
CountAndFillBase(CrsType& crs, Functor const& f):
m_crs(crs),
m_functor(f)
{}
};
#if defined( KOKKOS_ENABLE_CUDA )
template< class CrsType,
class Functor>
struct CountAndFillBase<CrsType, Functor, Kokkos::Cuda> {
using data_type = typename CrsType::size_type;
using size_type = typename CrsType::size_type;
using row_map_type = typename CrsType::row_map_type;
using counts_type = row_map_type;
CrsType m_crs;
Functor m_functor;
counts_type m_counts;
struct Count {};
__device__ inline void operator()(Count, size_type i) const {
m_counts(i) = m_functor(i, nullptr);
}
struct Fill {};
__device__ inline void operator()(Fill, size_type i) const {
auto j = m_crs.row_map(i);
/* we don't want to access entries(entries.size()), even if it's just to get its
address and never use it.
this can happen when row (i) is empty and all rows after it are also empty.
we could compare to row_map(i + 1), but that is a read from global memory,
whereas dimension_0() should be part of the View in registers (or constant memory) */
data_type* fill =
(j == static_cast<decltype(j)>(m_crs.entries.dimension_0())) ?
nullptr : (&(m_crs.entries(j)));
m_functor(i, fill);
}
CountAndFillBase(CrsType& crs, Functor const& f):
m_crs(crs),
m_functor(f)
{}
};
#endif
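The `CountAndFillBase` refactor above writes the shared logic once in a derived class and selects the `operator()` body through a partial specialization of the base on the execution space, so the Cuda build can use `__device__`-qualified operators. A simplified host-only sketch of that dispatch (tags and names are illustrative; a different return string stands in for the differently-qualified body):

```cpp
#include <cassert>
#include <string>

struct Host {};
struct Device {};

// Primary base: the "default" operator body.
template <class ExecSpace>
struct WorkBase {
  std::string run() const { return "host path"; }
};

// Specialized base: swapped-in body for the Device space (in the real
// code this is where the __device__ qualifier appears).
template <>
struct WorkBase<Device> {
  std::string run() const { return "device path"; }
};

// Derived class holds the shared logic and inherits the right body.
template <class ExecSpace>
struct Work : WorkBase<ExecSpace> {
  std::string describe() const { return "via " + this->run(); }
};
```

Note `this->run()`: in a template, members of a dependent base must be reached through `this->` (or a using-declaration), which is also why the diff switches `m_counts` to `this->m_counts` in `CountAndFill`.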
template< class CrsType,
class Functor>
struct CountAndFill : public CountAndFillBase<CrsType, Functor> {
using base_type = CountAndFillBase<CrsType, Functor>;
using typename base_type::data_type;
using typename base_type::size_type;
using typename base_type::counts_type;
using typename base_type::Count;
using typename base_type::Fill;
using entries_type = typename CrsType::entries_type;
using self_type = CountAndFill<CrsType, Functor>;
CountAndFill(CrsType& crs, size_type nrows, Functor const& f):
base_type(crs, f)
{
using execution_space = typename CrsType::execution_space;
m_counts = counts_type("counts", nrows);
this->m_counts = counts_type("counts", nrows);
{
using count_policy_type = RangePolicy<size_type, execution_space, Count>;
using count_closure_type =
@ -377,10 +427,10 @@ struct CountAndFill {
const count_closure_type closure(*this, count_policy_type(0, nrows));
closure.execute();
}
auto nentries = Kokkos::Experimental::
get_crs_row_map_from_counts(m_crs.row_map, m_counts);
m_counts = counts_type();
m_crs.entries = entries_type("entries", nentries);
auto nentries = Kokkos::
get_crs_row_map_from_counts(this->m_crs.row_map, this->m_counts);
this->m_counts = counts_type();
this->m_crs.entries = entries_type("entries", nentries);
{
using fill_policy_type = RangePolicy<size_type, execution_space, Fill>;
using fill_closure_type =
@ -388,7 +438,7 @@ struct CountAndFill {
const fill_closure_type closure(*this, fill_policy_type(0, nrows));
closure.execute();
}
crs = m_crs;
crs = this->m_crs;
}
};
@ -398,9 +448,9 @@ void count_and_fill_crs(
CrsType& crs,
typename CrsType::size_type nrows,
Functor const& f) {
Kokkos::Experimental::CountAndFill<CrsType, Functor>(crs, nrows, f);
Kokkos::CountAndFill<CrsType, Functor>(crs, nrows, f);
}
}} // namespace Kokkos::Experimental
} // namespace Kokkos
#endif /* #define KOKKOS_CRS_HPP */


@ -379,12 +379,13 @@ Impl::PerThreadValue PerThread(const int& arg);
* uses variadic templates. Each and any of the template arguments can
* be omitted.
*
* Possible Template arguments and there default values:
* Possible Template arguments and their default values:
* ExecutionSpace (DefaultExecutionSpace): where to execute code. Must be enabled.
* WorkTag (none): Tag which is used as the first argument for the functor operator.
* Schedule<Type> (Schedule<Static>): Scheduling Policy (Dynamic, or Static).
* IndexType<Type> (IndexType<ExecutionSpace::size_type>: Integer Index type used to iterate over the Index space.
* LaunchBounds<int,int> (LaunchBounds<1024,1>: Launch Bounds for CUDA compilation.
* LaunchBounds<unsigned,unsigned> Launch Bounds for CUDA compilation,
* default of LaunchBounds<0,0> indicates no launch bounds specified.
*/
template< class ... Properties>
class TeamPolicy: public
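The `LaunchBounds` change above replaces defaults of `<1024, 1>` with `<0, 0>`, with zero acting as a sentinel for "no bound requested" so the backend can skip emitting `__launch_bounds__` entirely. A minimal replica of that sentinel-default idiom (names are illustrative):

```cpp
#include <cassert>

// Zero defaults mean "unspecified"; any nonzero value is an explicit
// request the backend should honor.
template <unsigned MaxT = 0 /* max threads per block */,
          unsigned MinB = 0 /* min blocks per SM */>
struct Bounds {
  static constexpr unsigned max_threads = MaxT;
  static constexpr unsigned min_blocks  = MinB;
  static constexpr bool specified = (MaxT != 0) || (MinB != 0);
};
```

Checking `specified` at compile time lets the launch machinery distinguish "the user asked for 1024 threads" from "the user said nothing", which the old `<1024, 1>` default could not.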


@ -251,7 +251,7 @@
#endif
#endif
#if defined( __PGIC__ ) && !defined( __GNUC__ )
#if defined( __PGIC__ )
#define KOKKOS_COMPILER_PGI __PGIC__*100+__PGIC_MINOR__*10+__PGIC_PATCHLEVEL__
#if ( 1540 > KOKKOS_COMPILER_PGI )
@ -268,24 +268,22 @@
#define KOKKOS_ENABLE_PRAGMA_UNROLL 1
#define KOKKOS_ENABLE_PRAGMA_LOOPCOUNT 1
#define KOKKOS_ENABLE_PRAGMA_VECTOR 1
#define KOKKOS_ENABLE_PRAGMA_SIMD 1
#if ( 1800 > KOKKOS_COMPILER_INTEL )
#define KOKKOS_ENABLE_PRAGMA_SIMD 1
#endif
#if ( __INTEL_COMPILER > 1400 )
#define KOKKOS_ENABLE_PRAGMA_IVDEP 1
#endif
#if ! defined( KOKKOS_MEMORY_ALIGNMENT )
#define KOKKOS_MEMORY_ALIGNMENT 64
#endif
#define KOKKOS_RESTRICT __restrict__
#ifndef KOKKOS_ALIGN
#define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
#endif
#ifndef KOKKOS_ALIGN_PTR
#define KOKKOS_ALIGN_PTR(size) __attribute__((align_value(size)))
#endif
#ifndef KOKKOS_ALIGN_SIZE
#define KOKKOS_ALIGN_SIZE 64
#ifndef KOKKOS_IMPL_ALIGN_PTR
#define KOKKOS_IMPL_ALIGN_PTR(size) __attribute__((align_value(size)))
#endif
#if ( 1400 > KOKKOS_COMPILER_INTEL )
@ -351,6 +349,11 @@
#if !defined( KOKKOS_FORCEINLINE_FUNCTION )
#define KOKKOS_FORCEINLINE_FUNCTION inline __attribute__((always_inline))
#endif
#if !defined( KOKKOS_IMPL_ALIGN_PTR )
#define KOKKOS_IMPL_ALIGN_PTR(size) __attribute__((aligned(size)))
#endif
#endif
//----------------------------------------------------------------------------
@ -426,16 +429,16 @@
//----------------------------------------------------------------------------
// Define Macro for alignment:
#if !defined KOKKOS_ALIGN_SIZE
#define KOKKOS_ALIGN_SIZE 16
#if ! defined( KOKKOS_MEMORY_ALIGNMENT )
#define KOKKOS_MEMORY_ALIGNMENT 16
#endif
#if !defined( KOKKOS_ALIGN )
#define KOKKOS_ALIGN(size) __attribute__((aligned(size)))
#if ! defined( KOKKOS_MEMORY_ALIGNMENT_THRESHOLD )
#define KOKKOS_MEMORY_ALIGNMENT_THRESHOLD 4
#endif
#if !defined( KOKKOS_ALIGN_PTR )
#define KOKKOS_ALIGN_PTR(size) __attribute__((aligned(size)))
#if !defined( KOKKOS_IMPL_ALIGN_PTR )
#define KOKKOS_IMPL_ALIGN_PTR(size) /* */
#endif
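What the `KOKKOS_MEMORY_ALIGNMENT`-style macros above ultimately request is storage whose address is a multiple of the alignment; `__attribute__((aligned(n)))` and C++11 `alignas` express the same thing. A small sketch (64 matches the Intel branch's default above; the struct name is illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// alignas(64) over-aligns the type to a cache-line-sized boundary,
// which is what vectorized loads and stores want.
struct alignas(64) AlignedBlock {
  double data[8];
};

// Check whether a pointer is aligned to `a` bytes.
inline bool is_aligned(const void* p, std::size_t a) {
  return reinterpret_cast<std::uintptr_t>(p) % a == 0;
}
```

`alignof(AlignedBlock)` reports the requested alignment, and the compiler must honor it even for stack objects.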
//----------------------------------------------------------------------------
@ -510,5 +513,11 @@
#define KOKKOS_ENABLE_TASKDAG
#endif
#if defined ( KOKKOS_ENABLE_CUDA )
#if ( 9000 <= CUDA_VERSION )
#define KOKKOS_IMPL_CUDA_VERSION_9_WORKAROUND
#endif
#endif
#endif // #ifndef KOKKOS_MACROS_HPP


@ -51,6 +51,27 @@
#include <impl/Kokkos_Error.hpp>
#include <impl/Kokkos_SharedAlloc.hpp>
namespace Kokkos {
namespace Impl {
/* Report violation of size constraints:
* min_block_alloc_size <= max_block_alloc_size
* max_block_alloc_size <= min_superblock_size
* min_superblock_size <= max_superblock_size
* min_superblock_size <= min_total_alloc_size
* min_superblock_size <= min_block_alloc_size *
* max_block_per_superblock
*/
void memory_pool_bounds_verification
( size_t min_block_alloc_size
, size_t max_block_alloc_size
, size_t min_superblock_size
, size_t max_superblock_size
, size_t max_block_per_superblock
, size_t min_total_alloc_size
);
}
}
namespace Kokkos {
template< typename DeviceType >
@ -111,6 +132,10 @@ private:
public:
/**\brief The maximum size of a superblock and block */
enum : uint32_t { max_superblock_size = 1LU << 31 /* 2 gigabytes */ };
enum : uint32_t { max_block_per_superblock = max_bit_count };
//--------------------------------------------------------------------------
KOKKOS_INLINE_FUNCTION
@ -206,7 +231,7 @@ public:
const uint32_t * sb_state_ptr = sb_state_array ;
s << "pool_size(" << ( size_t(m_sb_count) << m_sb_size_lg2 ) << ")"
<< " superblock_size(" << ( 1 << m_sb_size_lg2 ) << ")" << std::endl ;
<< " superblock_size(" << ( 1LU << m_sb_size_lg2 ) << ")" << std::endl ;
for ( int32_t i = 0 ; i < m_sb_count
; ++i , sb_state_ptr += m_sb_state_size ) {
@ -215,7 +240,7 @@ public:
const uint32_t block_count_lg2 = (*sb_state_ptr) >> state_shift ;
const uint32_t block_size_lg2 = m_sb_size_lg2 - block_count_lg2 ;
const uint32_t block_count = 1 << block_count_lg2 ;
const uint32_t block_count = 1u << block_count_lg2 ;
const uint32_t block_used = (*sb_state_ptr) & state_used_mask ;
s << "Superblock[ " << i << " / " << m_sb_count << " ] {"
@ -284,43 +309,71 @@ public:
{
const uint32_t int_align_lg2 = 3 ; /* align as int[8] */
const uint32_t int_align_mask = ( 1u << int_align_lg2 ) - 1 ;
const uint32_t default_min_block_size = 1u << 6 ; /* 64 bytes */
const uint32_t default_max_block_size = 1u << 12 ;/* 4k bytes */
const uint32_t default_min_superblock_size = 1u << 20 ;/* 1M bytes */
// Constraints and defaults:
// min_block_alloc_size <= max_block_alloc_size
// max_block_alloc_size <= min_superblock_size
// min_superblock_size <= min_total_alloc_size
//--------------------------------------------------
// Default block and superblock sizes:
const uint32_t MIN_BLOCK_SIZE = 1u << 6 /* 64 bytes */ ;
const uint32_t MAX_BLOCK_SIZE = 1u << 12 /* 4k bytes */ ;
if ( 0 == min_block_alloc_size ) {
// Default all sizes:
if ( 0 == min_block_alloc_size ) min_block_alloc_size = MIN_BLOCK_SIZE ;
min_superblock_size =
std::min( size_t(default_min_superblock_size)
, min_total_alloc_size );
min_block_alloc_size =
std::min( size_t(default_min_block_size)
, min_superblock_size );
max_block_alloc_size =
std::min( size_t(default_max_block_size)
, min_superblock_size );
}
else if ( 0 == min_superblock_size ) {
// Choose superblock size as minimum of:
// max_block_per_superblock * min_block_size
// max_superblock_size
// min_total_alloc_size
const size_t max_superblock =
min_block_alloc_size * max_block_per_superblock ;
min_superblock_size =
std::min( max_superblock ,
std::min( size_t(max_superblock_size)
, min_total_alloc_size ) );
}
if ( 0 == max_block_alloc_size ) {
max_block_alloc_size = MAX_BLOCK_SIZE ;
// Upper bound of total allocation size
max_block_alloc_size = std::min( size_t(max_block_alloc_size)
, min_total_alloc_size );
// Lower bound of minimum block size
max_block_alloc_size = std::max( max_block_alloc_size
, min_block_alloc_size );
max_block_alloc_size = min_superblock_size ;
}
if ( 0 == min_superblock_size ) {
min_superblock_size = max_block_alloc_size ;
//--------------------------------------------------
// Upper bound of total allocation size
min_superblock_size = std::min( size_t(min_superblock_size)
, min_total_alloc_size );
/* Enforce size constraints:
* min_block_alloc_size <= max_block_alloc_size
* max_block_alloc_size <= min_superblock_size
* min_superblock_size <= max_superblock_size
* min_superblock_size <= min_total_alloc_size
* min_superblock_size <= min_block_alloc_size *
* max_block_per_superblock
*/
// Lower bound of maximum block size
min_superblock_size = std::max( min_superblock_size
, max_block_alloc_size );
}
Kokkos::Impl::memory_pool_bounds_verification
( min_block_alloc_size
, max_block_alloc_size
, min_superblock_size
, max_superblock_size
, max_block_per_superblock
, min_total_alloc_size
);
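The call above delegates the size-constraint check to `memory_pool_bounds_verification`, whose contract is the comment block's five inequalities. A sketch of that check (the real function reports a Kokkos error and aborts; here a throw stands in, and the parameter names follow the declaration above):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Enforce the memory pool's size ordering:
//   min_block <= max_block <= min_superblock <= max_superblock
//   min_superblock <= min_total_alloc
//   min_superblock <= min_block * max_block_per_superblock
void verify_pool_bounds(std::size_t min_block, std::size_t max_block,
                        std::size_t min_sb, std::size_t max_sb,
                        std::size_t max_block_per_sb,
                        std::size_t min_total) {
  const bool ok = min_block <= max_block &&
                  max_block <= min_sb &&
                  min_sb    <= max_sb &&
                  min_sb    <= min_total &&
                  min_sb    <= min_block * max_block_per_sb;
  if (!ok) throw std::invalid_argument("memory pool size constraints violated");
}
```

The last inequality is the subtle one: a superblock's block bitset can track at most `max_block_per_superblock` blocks, so the superblock cannot be larger than that many minimum-size blocks.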
//--------------------------------------------------
// Block and superblock size is power of two:
// Maximum value is 'max_superblock_size'
m_min_block_size_lg2 =
Kokkos::Impl::integral_power_of_two_that_contains(min_block_alloc_size);
@ -331,45 +384,26 @@ public:
m_sb_size_lg2 =
Kokkos::Impl::integral_power_of_two_that_contains(min_superblock_size);
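`integral_power_of_two_that_contains(n)` above returns the smallest exponent `k` with `n <= 2^k`, which is how the pool rounds requested block and superblock sizes up to powers of two. A plain version of that behavior (a sketch; the Kokkos implementation uses bit tricks rather than a loop):

```cpp
#include <cassert>
#include <cstdint>

// Smallest k such that n <= 2^k. For n = 0 or 1 the answer is 0.
uint32_t pow2_that_contains(uint64_t n) {
  uint32_t k = 0;
  while ((uint64_t(1) << k) < n) ++k;
  return k;
}
```

So a request for a 65-byte block lands in the 128-byte (`lg2 = 7`) size class, while 64 bytes stays at `lg2 = 6`.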
// Constraints:
// m_min_block_size_lg2 <= m_max_block_size_lg2 <= m_sb_size_lg2
// m_sb_size_lg2 <= m_min_block_size + max_bit_count_lg2
{
// number of superblocks is multiple of superblock size that
// can hold min_total_alloc_size.
if ( m_min_block_size_lg2 + max_bit_count_lg2 < m_sb_size_lg2 ) {
m_min_block_size_lg2 = m_sb_size_lg2 - max_bit_count_lg2 ;
}
if ( m_min_block_size_lg2 + max_bit_count_lg2 < m_max_block_size_lg2 ) {
m_min_block_size_lg2 = m_max_block_size_lg2 - max_bit_count_lg2 ;
}
if ( m_max_block_size_lg2 < m_min_block_size_lg2 ) {
m_max_block_size_lg2 = m_min_block_size_lg2 ;
}
if ( m_sb_size_lg2 < m_max_block_size_lg2 ) {
m_sb_size_lg2 = m_max_block_size_lg2 ;
const uint64_t sb_size_mask = ( 1LU << m_sb_size_lg2 ) - 1 ;
m_sb_count = ( min_total_alloc_size + sb_size_mask ) >> m_sb_size_lg2 ;
}
// At least 32 minimum size blocks in a superblock
{
// Any superblock can be assigned to the smallest size block
// Size the block bitset to maximum number of blocks
if ( m_sb_size_lg2 < m_min_block_size_lg2 + 5 ) {
m_sb_size_lg2 = m_min_block_size_lg2 + 5 ;
const uint32_t max_block_count_lg2 =
m_sb_size_lg2 - m_min_block_size_lg2 ;
m_sb_state_size =
( CB::buffer_bound_lg2( max_block_count_lg2 ) + int_align_mask ) & ~int_align_mask ;
}
// number of superblocks is multiple of superblock size that
// can hold min_total_alloc_size.
const uint32_t sb_size_mask = ( 1u << m_sb_size_lg2 ) - 1 ;
m_sb_count = ( min_total_alloc_size + sb_size_mask ) >> m_sb_size_lg2 ;
// Any superblock can be assigned to the smallest size block
// Size the block bitset to maximum number of blocks
const uint32_t max_block_count_lg2 =
m_sb_size_lg2 - m_min_block_size_lg2 ;
m_sb_state_size =
( CB::buffer_bound_lg2( max_block_count_lg2 ) + int_align_mask ) & ~int_align_mask ;
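Two bit idioms recur in the superblock sizing above: rounding a size up to a power-of-two multiple with `(x + mask) & ~mask` (the `int_align_mask` line), and a ceiling divide by `2^k` written `(x + mask) >> k` (the `m_sb_count` line). Both rely on `mask = 2^k - 1`; a small standalone version:

```cpp
#include <cassert>
#include <cstdint>

// Round x up to the next multiple of 2^k.
inline uint64_t round_up(uint64_t x, uint32_t k) {
  const uint64_t mask = (uint64_t(1) << k) - 1;
  return (x + mask) & ~mask;
}

// Ceiling of x / 2^k, i.e. how many 2^k-sized chunks cover x bytes.
inline uint64_t ceil_div_pow2(uint64_t x, uint32_t k) {
  const uint64_t mask = (uint64_t(1) << k) - 1;
  return (x + mask) >> k;
}
```

Adding the mask before truncating pushes any partial remainder over the boundary, so exact multiples are unchanged and everything else rounds up.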
// Array of all superblock states
const size_t all_sb_state_size =
@ -454,7 +488,7 @@ private:
* Restrict lower bound to minimum block size.
*/
KOKKOS_FORCEINLINE_FUNCTION
unsigned get_block_size_lg2( unsigned n ) const noexcept
uint32_t get_block_size_lg2( uint32_t n ) const noexcept
{
const unsigned i = Kokkos::Impl::integral_power_of_two_that_contains( n );
@ -463,11 +497,12 @@ private:
public:
/* Return 0 for invalid block size */
KOKKOS_INLINE_FUNCTION
uint32_t allocate_block_size( uint32_t alloc_size ) const noexcept
uint32_t allocate_block_size( uint64_t alloc_size ) const noexcept
{
return alloc_size <= (1UL << m_max_block_size_lg2)
? ( 1u << get_block_size_lg2( alloc_size ) )
? ( 1UL << get_block_size_lg2( uint32_t(alloc_size) ) )
: 0 ;
}
@ -485,246 +520,253 @@ public:
void * allocate( size_t alloc_size
, int32_t attempt_limit = 1 ) const noexcept
{
if ( size_t(1LU << m_max_block_size_lg2) < alloc_size ) {
Kokkos::abort("Kokkos MemoryPool allocation request exceeded specified maximum allocation size");
}
if ( 0 == alloc_size ) return (void*) 0 ;
void * p = 0 ;
const uint32_t block_size_lg2 = get_block_size_lg2( alloc_size );
if ( block_size_lg2 <= m_max_block_size_lg2 ) {
// Allocation will fit within a superblock
// that has block sizes ( 1 << block_size_lg2 )
// Allocation will fit within a superblock
// that has block sizes ( 1 << block_size_lg2 )
const uint32_t block_count_lg2 = m_sb_size_lg2 - block_size_lg2 ;
const uint32_t block_state = block_count_lg2 << state_shift ;
const uint32_t block_count = 1u << block_count_lg2 ;
const uint32_t block_count_lg2 = m_sb_size_lg2 - block_size_lg2 ;
const uint32_t block_state = block_count_lg2 << state_shift ;
const uint32_t block_count = 1u << block_count_lg2 ;
// Superblock hints for this block size:
// hint_sb_id_ptr[0] is the dynamically changing hint
// hint_sb_id_ptr[1] is the static start point
// Superblock hints for this block size:
// hint_sb_id_ptr[0] is the dynamically changing hint
// hint_sb_id_ptr[1] is the static start point
volatile uint32_t * const hint_sb_id_ptr
= m_sb_state_array /* memory pool state array */
+ m_hint_offset /* offset to hint portion of array */
+ HINT_PER_BLOCK_SIZE /* number of hints per block size */
* ( block_size_lg2 - m_min_block_size_lg2 ); /* block size id */
volatile uint32_t * const hint_sb_id_ptr
= m_sb_state_array /* memory pool state array */
+ m_hint_offset /* offset to hint portion of array */
+ HINT_PER_BLOCK_SIZE /* number of hints per block size */
* ( block_size_lg2 - m_min_block_size_lg2 ); /* block size id */
const int32_t sb_id_begin = int32_t( hint_sb_id_ptr[1] );
const int32_t sb_id_begin = int32_t( hint_sb_id_ptr[1] );
// Fast query clock register 'tic' to pseudo-randomize
// the guess for which block within a superblock should
// be claimed. If not available then a search occurs.
// Fast query clock register 'tic' to pseudo-randomize
// the guess for which block within a superblock should
// be claimed. If not available then a search occurs.
const uint32_t block_id_hint =
(uint32_t)( Kokkos::Impl::clock_tic()
const uint32_t block_id_hint =
(uint32_t)( Kokkos::Impl::clock_tic()
#if defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
// Spread out potentially concurrent access
// by threads within a warp or thread block.
+ ( threadIdx.x + blockDim.x * threadIdx.y )
// Spread out potentially concurrent access
// by threads within a warp or thread block.
+ ( threadIdx.x + blockDim.x * threadIdx.y )
#endif
);
);
// expected state of superblock for allocation
uint32_t sb_state = block_state ;
// expected state of superblock for allocation
uint32_t sb_state = block_state ;
int32_t sb_id = -1 ;
int32_t sb_id = -1 ;
volatile uint32_t * sb_state_array = 0 ;
volatile uint32_t * sb_state_array = 0 ;
while ( attempt_limit ) {
while ( attempt_limit ) {
int32_t hint_sb_id = -1 ;
int32_t hint_sb_id = -1 ;
if ( sb_id < 0 ) {
if ( sb_id < 0 ) {
// No superblock specified, try the hint for this block size
// No superblock specified, try the hint for this block size
sb_id = hint_sb_id = int32_t( *hint_sb_id_ptr );
sb_id = hint_sb_id = int32_t( *hint_sb_id_ptr );
sb_state_array = m_sb_state_array + ( sb_id * m_sb_state_size );
}
// Require:
// 0 <= sb_id
// sb_state_array == m_sb_state_array + m_sb_state_size * sb_id
if ( sb_state == ( state_header_mask & *sb_state_array ) ) {
// This superblock state is as expected, for the moment.
// Attempt to claim a bit. The attempt updates the state
// so have already made sure the state header is as expected.
const uint32_t count_lg2 = sb_state >> state_shift ;
const uint32_t mask = ( 1u << count_lg2 ) - 1 ;
const Kokkos::pair<int,int> result =
CB::acquire_bounded_lg2( sb_state_array
, count_lg2
, block_id_hint & mask
, sb_state
);
// If result.first < 0 then failed to acquire
// due to either full or buffer was wrong state.
// Could be wrong state if a deallocation raced the
// superblock to empty before the acquire could succeed.
if ( 0 <= result.first ) { // acquired a bit
const uint32_t size_lg2 = m_sb_size_lg2 - count_lg2 ;
// Set the allocated block pointer
p = ((char*)( m_sb_state_array + m_data_offset ))
+ ( uint64_t(sb_id) << m_sb_size_lg2 ) // superblock memory
+ ( uint64_t(result.first) << size_lg2 ); // block memory
#if 0
printf( " MemoryPool(0x%lx) pointer(0x%lx) allocate(%lu) sb_id(%d) sb_state(0x%x) block_size(%d) block_capacity(%d) block_id(%d) block_claimed(%d)\n"
, (uintptr_t)m_sb_state_array
, (uintptr_t)p
, alloc_size
, sb_id
, sb_state
, (1u << size_lg2)
, (1u << count_lg2)
, result.first
, result.second );
#endif
break ; // Success
}
}
//------------------------------------------------------------------
// Arrive here if failed to acquire a block.
// Must find a new superblock.
// Start searching at designated index for this block size.
// Look for superblock that, in preferential order,
// 1) part-full superblock of this block size
// 2) empty superblock to claim for this block size
// 3) part-full superblock of the next larger block size
sb_state = block_state ; // Expect to find the desired state
sb_id = -1 ;
bool update_hint = false ;
int32_t sb_id_empty = -1 ;
int32_t sb_id_large = -1 ;
uint32_t sb_state_large = 0 ;
sb_state_array = m_sb_state_array + sb_id_begin * m_sb_state_size ;
for ( int32_t i = 0 , id = sb_id_begin ; i < m_sb_count ; ++i ) {
// Query state of the candidate superblock.
// Note that the state may change at any moment
// as concurrent allocations and deallocations occur.
const uint32_t full_state = *sb_state_array ;
const uint32_t used = full_state & state_used_mask ;
const uint32_t state = full_state & state_header_mask ;
if ( state == block_state ) {
// Superblock is assigned to this block size
if ( used < block_count ) {
// There is room to allocate one block
sb_id = id ;
// Is there room to allocate more than one block?
update_hint = used + 1 < block_count ;
break ;
}
}
else if ( 0 == used ) {
// Superblock is empty
if ( -1 == sb_id_empty ) {
// Superblock is not assigned to this block size
// and is the first empty superblock encountered.
// Save this id to use if a partfull superblock is not found.
sb_id_empty = id ;
}
}
else if ( ( -1 == sb_id_empty /* have not found an empty */ ) &&
( -1 == sb_id_large /* have not found a larger */ ) &&
( state < block_state /* a larger block */ ) &&
// is not full:
( used < ( 1u << ( state >> state_shift ) ) ) ) {
// First superblock encountered that is
// larger than this block size and
// has room for an allocation.
// Save this id to use if a partfull or empty superblock is not found
sb_id_large = id ;
sb_state_large = state ;
}
// Iterate around the superblock array:
if ( ++id < m_sb_count ) {
sb_state_array += m_sb_state_size ;
}
else {
id = 0 ;
sb_state_array = m_sb_state_array ;
}
}
// printf(" search m_sb_count(%d) sb_id(%d) sb_id_empty(%d) sb_id_large(%d)\n" , m_sb_count , sb_id , sb_id_empty , sb_id_large);
if ( sb_id < 0 ) {
// Did not find a partfull superblock for this block size.
if ( 0 <= sb_id_empty ) {
// Found first empty superblock following designated superblock
// Attempt to claim it for this block size.
// If the claim fails assume that another thread claimed it
// for this block size and try to use it anyway,
// but do not update hint.
sb_id = sb_id_empty ;
sb_state_array = m_sb_state_array + ( sb_id * m_sb_state_size );
// If successfully changed assignment of empty superblock 'sb_id'
// to this block_size then update the hint.
const uint32_t state_empty = state_header_mask & *sb_state_array ;
// If this thread claims the empty block then update the hint
update_hint =
state_empty ==
Kokkos::atomic_compare_exchange
(sb_state_array,state_empty,block_state);
}
else if ( 0 <= sb_id_large ) {
// Found a larger superblock with space available
sb_id = sb_id_large ;
sb_state = sb_state_large ;
sb_state_array = m_sb_state_array + ( sb_id * m_sb_state_size );
}
// Require:
// 0 <= sb_id
// sb_state_array == m_sb_state_array + m_sb_state_size * sb_id
if ( sb_state == ( state_header_mask & *sb_state_array ) ) {
// This superblock state is as expected, for the moment.
// Attempt to claim a bit. The attempt updates the state
// so we have already made sure the state header is as expected.
const uint32_t count_lg2 = sb_state >> state_shift ;
const uint32_t mask = ( 1u << count_lg2 ) - 1 ;
const Kokkos::pair<int,int> result =
CB::acquire_bounded_lg2( sb_state_array
, count_lg2
, block_id_hint & mask
, sb_state
);
// If result.first < 0 then the acquire failed because the
// superblock was either full or in the wrong state.
// Could be wrong state if a deallocation raced the
// superblock to empty before the acquire could succeed.
if ( 0 <= result.first ) { // acquired a bit
const uint32_t size_lg2 = m_sb_size_lg2 - count_lg2 ;
// Set the allocated block pointer
p = ((char*)( m_sb_state_array + m_data_offset ))
+ ( uint32_t(sb_id) << m_sb_size_lg2 ) // superblock memory
+ ( result.first << size_lg2 ); // block memory
break ; // Success
}
// printf(" acquire count_lg2(%d) sb_state(0x%x) sb_id(%d) result(%d,%d)\n" , count_lg2 , sb_state , sb_id , result.first , result.second );
else {
// Did not find a potentially usable superblock
--attempt_limit ;
}
//------------------------------------------------------------------
// Arrive here if failed to acquire a block.
// Must find a new superblock.
}
// Start searching at designated index for this block size.
// Look for superblock that, in preferential order,
// 1) part-full superblock of this block size
// 2) empty superblock to claim for this block size
// 3) part-full superblock of the next larger block size
sb_state = block_state ; // Expect to find the desired state
sb_id = -1 ;
bool update_hint = false ;
int32_t sb_id_empty = -1 ;
int32_t sb_id_large = -1 ;
uint32_t sb_state_large = 0 ;
sb_state_array = m_sb_state_array + sb_id_begin * m_sb_state_size ;
for ( int32_t i = 0 , id = sb_id_begin ; i < m_sb_count ; ++i ) {
// Query state of the candidate superblock.
// Note that the state may change at any moment
// as concurrent allocations and deallocations occur.
const uint32_t full_state = *sb_state_array ;
const uint32_t used = full_state & state_used_mask ;
const uint32_t state = full_state & state_header_mask ;
if ( state == block_state ) {
// Superblock is assigned to this block size
if ( used < block_count ) {
// There is room to allocate one block
sb_id = id ;
// Is there room to allocate more than one block?
update_hint = used + 1 < block_count ;
break ;
}
}
else if ( 0 == used ) {
// Superblock is empty
if ( -1 == sb_id_empty ) {
// Superblock is not assigned to this block size
// and is the first empty superblock encountered.
// Save this id to use if a partfull superblock is not found.
sb_id_empty = id ;
}
}
else if ( ( -1 == sb_id_empty /* have not found an empty */ ) &&
( -1 == sb_id_large /* have not found a larger */ ) &&
( state < block_state /* a larger block */ ) &&
// is not full:
( used < ( 1u << ( state >> state_shift ) ) ) ) {
// First superblock encountered that is
// larger than this block size and
// has room for an allocation.
// Save this id to use if a partfull or empty superblock is not found
sb_id_large = id ;
sb_state_large = state ;
}
// Iterate around the superblock array:
if ( ++id < m_sb_count ) {
sb_state_array += m_sb_state_size ;
}
else {
id = 0 ;
sb_state_array = m_sb_state_array ;
}
}
// printf(" search m_sb_count(%d) sb_id(%d) sb_id_empty(%d) sb_id_large(%d)\n" , m_sb_count , sb_id , sb_id_empty , sb_id_large);
if ( sb_id < 0 ) {
// Did not find a partfull superblock for this block size.
if ( 0 <= sb_id_empty ) {
// Found first empty superblock following designated superblock
// Attempt to claim it for this block size.
// If the claim fails assume that another thread claimed it
// for this block size and try to use it anyway,
// but do not update hint.
sb_id = sb_id_empty ;
sb_state_array = m_sb_state_array + ( sb_id * m_sb_state_size );
// If successfully changed assignment of empty superblock 'sb_id'
// to this block_size then update the hint.
const uint32_t state_empty = state_header_mask & *sb_state_array ;
// If this thread claims the empty block then update the hint
update_hint =
state_empty ==
Kokkos::atomic_compare_exchange
(sb_state_array,state_empty,block_state);
}
else if ( 0 <= sb_id_large ) {
// Found a larger superblock with space available
sb_id = sb_id_large ;
sb_state = sb_state_large ;
sb_state_array = m_sb_state_array + ( sb_id * m_sb_state_size );
}
else {
// Did not find a potentially usable superblock
--attempt_limit ;
}
}
if ( update_hint ) {
Kokkos::atomic_compare_exchange
( hint_sb_id_ptr , uint32_t(hint_sb_id) , uint32_t(sb_id) );
}
} // end allocation attempt loop
//--------------------------------------------------------------------
}
else {
Kokkos::abort("Kokkos MemoryPool allocation request exceeded specified maximum allocation size");
}
if ( update_hint ) {
Kokkos::atomic_compare_exchange
( hint_sb_id_ptr , uint32_t(hint_sb_id) , uint32_t(sb_id) );
}
} // end allocation attempt loop
//--------------------------------------------------------------------
return p ;
}
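The allocate path above searches the superblock array in a fixed preference order: a part-full superblock already assigned to this block size, else the first empty superblock, else a part-full superblock of the next larger block size. The following is a minimal single-threaded model of that preference (hypothetical `SbState` layout and names, not the actual Kokkos state encoding; a larger block size corresponds to a smaller `count_lg2`):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a superblock's state: log2 of its block capacity,
// and the number of blocks currently claimed.
struct SbState { uint32_t count_lg2; uint32_t used; };

// Return the superblock index to allocate from, following the
// preference order described in the comments above, or -1.
int find_superblock(const std::vector<SbState>& sb, uint32_t want_count_lg2)
{
  int id_empty = -1;  // first empty superblock seen
  int id_large = -1;  // first part-full superblock of a larger block size
  for (int id = 0; id < (int)sb.size(); ++id) {
    const SbState& s = sb[id];
    if (s.count_lg2 == want_count_lg2) {           // assigned to this size
      if (s.used < (1u << s.count_lg2)) return id; // 1) part-full, same size
    }
    else if (s.used == 0) {                        // 2) empty, claimable
      if (id_empty < 0) id_empty = id;
    }
    else if (id_empty < 0 && id_large < 0 &&
             s.count_lg2 < want_count_lg2 &&       // larger blocks, fewer of them
             s.used < (1u << s.count_lg2)) {       // 3) part-full, larger size
      id_large = id;
    }
  }
  return id_empty >= 0 ? id_empty : id_large;
}
```

The real code performs the same scan circularly from a per-size hint and then races an atomic claim; the model only captures the selection priority.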
@@ -765,7 +807,7 @@ public:
const uint32_t block_size_lg2 =
m_sb_size_lg2 - ( block_state >> state_shift );
ok_block_aligned = 0 == ( d & ( ( 1 << block_size_lg2 ) - 1 ) );
ok_block_aligned = 0 == ( d & ( ( 1UL << block_size_lg2 ) - 1 ) );
if ( ok_block_aligned ) {
@@ -773,31 +815,70 @@ public:
// mask into superblock and then shift down for block index
const uint32_t bit =
( d & ( ptrdiff_t( 1 << m_sb_size_lg2 ) - 1 ) ) >> block_size_lg2 ;
( d & ( ptrdiff_t( 1LU << m_sb_size_lg2 ) - 1 ) ) >> block_size_lg2 ;
const int result =
CB::release( sb_state_array , bit , block_state );
ok_dealloc_once = 0 <= result ;
// printf(" deallocate from sb_id(%d) result(%d) bit(%d) state(0x%x)\n"
// , sb_id
// , result
// , uint32_t(d >> block_size_lg2)
// , *sb_state_array );
#if 0
printf( " MemoryPool(0x%lx) pointer(0x%lx) deallocate sb_id(%d) block_size(%d) block_capacity(%d) block_id(%d) block_claimed(%d)\n"
, (uintptr_t)m_sb_state_array
, (uintptr_t)p
, sb_id
, (1u << block_size_lg2)
, (1u << (m_sb_size_lg2 - block_size_lg2))
, bit
, result );
#endif
}
}
if ( ! ok_contains || ! ok_block_aligned || ! ok_dealloc_once ) {
#if 0
printf("Kokkos MemoryPool deallocate(0x%lx) contains(%d) block_aligned(%d) dealloc_once(%d)\n",(uintptr_t)p,ok_contains,ok_block_aligned,ok_dealloc_once);
printf( " MemoryPool(0x%lx) pointer(0x%lx) deallocate ok_contains(%d) ok_block_aligned(%d) ok_dealloc_once(%d)\n"
, (uintptr_t)m_sb_state_array
, (uintptr_t)p
, int(ok_contains)
, int(ok_block_aligned)
, int(ok_dealloc_once) );
#endif
Kokkos::abort("Kokkos MemoryPool::deallocate given erroneous pointer");
}
}
// end deallocate
//--------------------------------------------------------------------------
KOKKOS_INLINE_FUNCTION
int number_of_superblocks() const noexcept { return m_sb_count ; }
KOKKOS_INLINE_FUNCTION
void superblock_state( int sb_id
, int & block_size
, int & block_count_capacity
, int & block_count_used ) const noexcept
{
block_size = 0 ;
block_count_capacity = 0 ;
block_count_used = 0 ;
if ( Kokkos::Impl::MemorySpaceAccess
< Kokkos::Impl::ActiveExecutionMemorySpace
, base_memory_space >::accessible ) {
// Can access the state array
const uint32_t state =
((uint32_t volatile *)m_sb_state_array)[sb_id*m_sb_state_size];
const uint32_t block_count_lg2 = state >> state_shift ;
const uint32_t block_used = state & state_used_mask ;
block_size = 1LU << ( m_sb_size_lg2 - block_count_lg2 );
block_count_capacity = 1LU << block_count_lg2 ;
block_count_used = block_used ;
}
}
};
} // namespace Kokkos
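The new `superblock_state` query unpacks one 32-bit state word into block size, capacity, and use count. A self-contained sketch of that decoding (the shift value and field layout are stand-ins, not the pool's actual constants):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout mirroring the decode in superblock_state:
// high bits = log2 of the superblock's block capacity,
// low bits  = number of blocks currently claimed.
constexpr uint32_t state_shift     = 26;
constexpr uint32_t state_used_mask = (1u << state_shift) - 1;

struct Decoded { uint32_t block_size, capacity, used; };

Decoded decode(uint32_t state, uint32_t sb_size_lg2)
{
  const uint32_t count_lg2 = state >> state_shift;
  return { 1u << (sb_size_lg2 - count_lg2),  // bytes per block
           1u << count_lg2,                  // blocks per superblock
           state & state_used_mask };        // blocks currently claimed
}
```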

View File

@@ -97,26 +97,22 @@ typedef Kokkos::MemoryTraits< Kokkos::Unmanaged | Kokkos::RandomAccess > MemoryR
namespace Kokkos {
namespace Impl {
static_assert(
( 0 < int(KOKKOS_MEMORY_ALIGNMENT) ) &&
( 0 == ( int(KOKKOS_MEMORY_ALIGNMENT) & (int(KOKKOS_MEMORY_ALIGNMENT)-1))) ,
"KOKKOS_MEMORY_ALIGNMENT must be a power of two" );
/** \brief Memory alignment settings
*
* Sets the global value for memory alignment. Must be a power of two!
* Enables compatibility of views from different devices with static stride.
* Use a compiler flag to override the default.
*/
enum { MEMORY_ALIGNMENT =
#if defined( KOKKOS_MEMORY_ALIGNMENT )
( 1 << Kokkos::Impl::integral_power_of_two( KOKKOS_MEMORY_ALIGNMENT ) )
#else
( 1 << Kokkos::Impl::integral_power_of_two( 128 ) )
#endif
#if defined( KOKKOS_MEMORY_ALIGNMENT_THRESHOLD )
enum : unsigned
{ MEMORY_ALIGNMENT = KOKKOS_MEMORY_ALIGNMENT
, MEMORY_ALIGNMENT_THRESHOLD = KOKKOS_MEMORY_ALIGNMENT_THRESHOLD
#else
, MEMORY_ALIGNMENT_THRESHOLD = 4
#endif
};
} //namespace Impl
} // namespace Kokkos
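The new `static_assert` guards the alignment with the standard bit trick for a power-of-two test: a value `x` is a positive power of two exactly when `x > 0` and `(x & (x - 1)) == 0`. A minimal illustration of the same predicate:

```cpp
#include <cassert>

// True exactly when x is a positive power of two -- the same predicate
// the KOKKOS_MEMORY_ALIGNMENT static_assert relies on. Subtracting 1
// flips the lowest set bit and everything below it, so the AND is zero
// only when there was a single set bit.
constexpr bool is_pow2(unsigned x) { return x > 0 && (x & (x - 1)) == 0; }
```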

View File

@@ -204,6 +204,7 @@ struct reduction_identity<double> {
KOKKOS_FORCEINLINE_FUNCTION constexpr static double min() {return DBL_MAX;}
};
#if !defined( KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_CUDA )
template<>
struct reduction_identity<long double> {
KOKKOS_FORCEINLINE_FUNCTION constexpr static long double sum() {return static_cast<long double>(0.0);}
@@ -211,6 +212,7 @@ struct reduction_identity<long double> {
KOKKOS_FORCEINLINE_FUNCTION constexpr static long double max() {return -LDBL_MAX;}
KOKKOS_FORCEINLINE_FUNCTION constexpr static long double min() {return LDBL_MAX;}
};
#endif
}
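`reduction_identity` supplies the neutral starting value for each reduction kind; the `long double` specialization is now compiled out for CUDA because device code does not support `long double`. A small stand-alone analogue of the pattern for `double` (toy names, not the Kokkos traits):

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>

// Same idea as reduction_identity<double>: the identity is the value
// that leaves the reduction unchanged (0 for sum, -DBL_MAX for max,
// DBL_MAX for min).
struct identity {
  static constexpr double sum() { return 0.0; }
  static constexpr double max() { return -DBL_MAX; }
  static constexpr double min() { return  DBL_MAX; }
};

// A reduction seeded with the identity gives a well-defined answer
// even for an empty range.
double reduce_max(const double* v, int n)
{
  double r = identity::max();
  for (int i = 0; i < n; ++i) r = std::max(r, v[i]);
  return r;
}
```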

View File

@@ -78,7 +78,7 @@ struct pair
/// This calls the default constructors of T1 and T2. It won't
/// compile if those default constructors are not defined and
/// public.
KOKKOS_FORCEINLINE_FUNCTION constexpr
KOKKOS_FUNCTION_DEFAULTED constexpr
pair() = default ;
/// \brief Constructor that takes both elements of the pair.
@@ -458,7 +458,7 @@ struct pair<T1,void>
first_type first;
enum { second = 0 };
KOKKOS_FORCEINLINE_FUNCTION constexpr
KOKKOS_FUNCTION_DEFAULTED constexpr
pair() = default ;
KOKKOS_FORCEINLINE_FUNCTION constexpr

View File

@@ -241,7 +241,7 @@ void parallel_for( const std::string & str
std::cout << "KOKKOS_DEBUG Start parallel_for kernel: " << str << std::endl;
#endif
parallel_for(policy,functor,str);
::Kokkos::parallel_for(policy,functor,str);
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
@@ -487,7 +487,7 @@ void parallel_scan( const std::string& str
std::cout << "KOKKOS_DEBUG Start parallel_scan kernel: " << str << std::endl;
#endif
parallel_scan(policy,functor,str);
::Kokkos::parallel_scan(policy,functor,str);
#if KOKKOS_ENABLE_DEBUG_PRINT_KERNEL_NAMES
Kokkos::fence();
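The change from `parallel_for(...)` to `::Kokkos::parallel_for(...)` (and likewise for `parallel_scan`) pins the forwarded call to the library's own overload instead of letting unqualified lookup, including argument-dependent lookup, pick up an unintended function. A minimal stand-alone reproduction of the hazard (toy names, not the Kokkos code):

```cpp
#include <cassert>

namespace lib {
  int lib_calls = 0;
  template <class T> void process(const T&) { ++lib_calls; }

  // Forwarding wrapper. Written as an unqualified call, process(v) would
  // let ADL find user::process for user-defined argument types; fully
  // qualifying ::lib::process makes the intended target explicit.
  template <class T> void process_labeled(const T& v, const char* /*label*/) {
    ::lib::process(v);
  }
}

namespace user {
  struct Vec {};
  int user_calls = 0;
  // A non-template exact match: ADL would prefer this over lib::process
  // if the wrapper's call were unqualified.
  void process(const Vec&) { ++user_calls; }
}
```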

View File

@@ -0,0 +1,111 @@
/*
//@HEADER
// ************************************************************************
//
// Kokkos v. 2.0
// Copyright (2014) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact H. Carter Edwards (hcedwar@sandia.gov)
//
// ************************************************************************
//@HEADER
*/
#ifndef KOKKOSP_PROFILE_SECTION_HPP
#define KOKKOSP_PROFILE_SECTION_HPP
#include <Kokkos_Macros.hpp>
#include <impl/Kokkos_Profiling_Interface.hpp>
#include <string>
namespace Kokkos {
namespace Profiling {
class ProfilingSection {
public:
ProfilingSection(const std::string& sectionName) :
secName(sectionName) {
#if defined( KOKKOS_ENABLE_PROFILING )
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::createProfileSection(secName, &secID);
}
#else
secID = 0;
#endif
}
void start() {
#if defined( KOKKOS_ENABLE_PROFILING )
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::startSection(secID);
}
#endif
}
void stop() {
#if defined( KOKKOS_ENABLE_PROFILING )
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::stopSection(secID);
}
#endif
}
~ProfilingSection() {
#if defined( KOKKOS_ENABLE_PROFILING )
if(Kokkos::Profiling::profileLibraryLoaded()) {
Kokkos::Profiling::destroyProfileSection(secID);
}
#endif
}
std::string getName() {
return secName;
}
uint32_t getSectionID() {
return secID;
}
protected:
const std::string secName;
uint32_t secID;
};
}
}
#endif
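`ProfilingSection` is constructed with a name, can be started and stopped repeatedly, and unregisters itself on destruction; every hook is a no-op when no profiling library is loaded. Since the real class needs a Kokkos build, here is a toy stand-in with the same `start`/`stop`/`getName` shape to show the intended call pattern (the event counter is purely illustrative):

```cpp
#include <cassert>
#include <string>

// Toy stand-in for Kokkos::Profiling::ProfilingSection: same usage
// shape, no profiling backend. Counts start/stop events so the call
// pattern can be checked.
class Section {
public:
  explicit Section(const std::string& name) : name_(name), events_(0) {}
  void start() { ++events_; }  // real class: startSection(secID)
  void stop()  { ++events_; }  // real class: stopSection(secID)
  std::string getName() const { return name_; }
  int events() const { return events_; }
private:
  const std::string name_;
  int events_;
};
```

Typical use mirrors the real API: construct once, then bracket each timed region with `start()`/`stop()` before the section object goes out of scope.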

View File

@@ -204,8 +204,8 @@ struct VerifyExecutionCanAccessMemorySpace
>
{
enum { value = false };
inline static void verify( void ) { Experimental::ROCmSpace::access_error(); }
inline static void verify( const void * p ) { Experimental::ROCmSpace::access_error(p); }
inline static void verify( void ) { Kokkos::Experimental::ROCmSpace::access_error(); }
inline static void verify( const void * p ) { Kokkos::Experimental::ROCmSpace::access_error(p); }
};
} // namespace Experimental
} // namespace Kokkos

View File

@@ -145,7 +145,7 @@ public:
unsigned use_cores_per_numa = 0 ,
bool allow_asynchronous_threadpool = false);
static int is_initialized();
static bool is_initialized();
/** \brief Return the maximum amount of concurrency. */
static int concurrency() {return 1;};
@@ -424,11 +424,13 @@ private:
typedef typename Policy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef typename Analysis::pointer_type pointer_type ;
typedef typename Analysis::reference_type reference_type ;
@@ -488,7 +490,7 @@ public:
this-> template exec< WorkTag >( update );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::
final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
@@ -619,16 +621,16 @@ namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType ,
Kokkos::Experimental::MDRangePolicy< Traits ... > ,
Kokkos::MDRangePolicy< Traits ... > ,
Kokkos::Serial
>
{
private:
typedef Kokkos::Experimental::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef Kokkos::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef typename MDRangePolicy::impl_range_policy Policy ;
typedef typename Kokkos::Experimental::Impl::HostIterateTile< MDRangePolicy, FunctorType, typename MDRangePolicy::work_tag, void > iterate_type;
typedef typename Kokkos::Impl::HostIterateTile< MDRangePolicy, FunctorType, typename MDRangePolicy::work_tag, void > iterate_type;
const FunctorType m_functor ;
const MDRangePolicy m_mdr_policy ;
@@ -661,32 +663,33 @@ public:
template< class FunctorType , class ReducerType , class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::Experimental::MDRangePolicy< Traits ... >
, Kokkos::MDRangePolicy< Traits ... >
, ReducerType
, Kokkos::Serial
>
{
private:
typedef Kokkos::Experimental::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef Kokkos::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef typename MDRangePolicy::impl_range_policy Policy ;
typedef typename MDRangePolicy::work_tag WorkTag ;
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef typename ReducerTypeFwd::value_type ValueType;
typedef FunctorAnalysis< FunctorPatternInterface::REDUCE , Policy , FunctorType > Analysis ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef typename Analysis::pointer_type pointer_type ;
typedef typename Analysis::reference_type reference_type ;
using iterate_type = typename Kokkos::Experimental::Impl::HostIterateTile< MDRangePolicy
using iterate_type = typename Kokkos::Impl::HostIterateTile< MDRangePolicy
, FunctorType
, WorkTag
, ValueType
@@ -735,7 +738,7 @@ public:
this-> exec( update );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::
final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}
@@ -878,8 +881,9 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef typename Analysis::pointer_type pointer_type ;
typedef typename Analysis::reference_type reference_type ;
@@ -940,7 +944,7 @@ public:
this-> template exec< WorkTag >( data , update );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::
final( ReducerConditional::select(m_functor , m_reducer) , ptr );
}

View File

@@ -408,7 +408,7 @@ view_alloc( Args const & ... args )
}
template< class ... Args >
inline
KOKKOS_INLINE_FUNCTION
Impl::ViewCtorProp< typename Impl::ViewCtorProp< void , Args >::type ... >
view_wrap( Args const & ... args )
{
@@ -1216,6 +1216,13 @@ public:
m_track.assign_allocated_record_to_uninitialized( record );
}
KOKKOS_INLINE_FUNCTION
void assign_data( pointer_type arg_data )
{
m_track.clear();
m_map.assign_data( arg_data );
}
// Wrap memory according to properties and array layout
template< class ... P >
explicit KOKKOS_INLINE_FUNCTION
@@ -2235,6 +2242,29 @@ create_mirror_view(const Space& , const Kokkos::View<T,P...> & src
return typename Impl::MirrorViewType<Space,T,P ...>::view_type(src.label(),src.layout());
}
// Create a mirror view and deep_copy in a new space (specialization for same space)
template<class Space, class T, class ... P>
typename Impl::MirrorViewType<Space,T,P ...>::view_type
create_mirror_view_and_copy(const Space& , const Kokkos::View<T,P...> & src
, std::string const& name = ""
, typename std::enable_if<Impl::MirrorViewType<Space,T,P ...>::is_same_memspace>::type* = 0 ) {
(void)name;
return src;
}
// Create a mirror view and deep_copy in a new space (specialization for different space)
template<class Space, class T, class ... P>
typename Impl::MirrorViewType<Space,T,P ...>::view_type
create_mirror_view_and_copy(const Space& , const Kokkos::View<T,P...> & src
, std::string const& name = ""
, typename std::enable_if<!Impl::MirrorViewType<Space,T,P ...>::is_same_memspace>::type* = 0 ) {
using Mirror = typename Impl::MirrorViewType<Space,T,P ...>::view_type;
std::string label = name.empty() ? src.label() : name;
auto mirror = Mirror(ViewAllocateWithoutInitializing(label), src.layout());
deep_copy(mirror, src);
return mirror;
}
} /* namespace Kokkos */
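The new `create_mirror_view_and_copy` dispatches on `is_same_memspace` via `enable_if`: in the same memory space it returns the source view unchanged (no allocation, no copy), while for a different space it allocates a mirror and deep-copies into it. A small stand-alone model of that overload dispatch (toy `View` and space types, not the Kokkos classes; the `copies` counter stands in for `deep_copy`):

```cpp
#include <cassert>
#include <type_traits>

// Toy "view" tagged with a memory space.
template <class Space> struct View { int copies = 0; };

struct HostSpace {};
struct DeviceSpace {};

// Same space: hand the source straight back.
template <class Space>
View<Space> mirror_and_copy(const Space&, const View<Space>& src) {
  return src;
}

// Different space: "allocate" in the destination space and copy.
// SFINAE removes this overload when the spaces match.
template <class DstSpace, class SrcSpace,
          class = std::enable_if_t<!std::is_same<DstSpace, SrcSpace>::value>>
View<DstSpace> mirror_and_copy(const DstSpace&, const View<SrcSpace>& src) {
  View<DstSpace> dst;
  dst.copies = src.copies + 1;  // model the deep_copy
  return dst;
}
```

The payoff of the same-space specialization is that generic code can call the function unconditionally without paying for a redundant allocation and copy on the host.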
//----------------------------------------------------------------------------
@@ -2432,6 +2462,7 @@ struct CommonViewAllocProp< void, ValueType >
using scalar_array_type = ValueType;
template < class ... Views >
KOKKOS_INLINE_FUNCTION
CommonViewAllocProp( const Views & ... ) {}
};
@@ -2499,6 +2530,7 @@ using DeducedCommonPropsType = typename Impl::DeduceCommonViewAllocProp<Views...
// User function
template < class ... Views >
KOKKOS_INLINE_FUNCTION
DeducedCommonPropsType<Views...>
common_view_alloc_prop( Views const & ... views )
{

View File

@@ -46,205 +46,198 @@
namespace Kokkos {
namespace Impl {
namespace Experimental {
template< class functor_type , class execution_space, class ... policy_args >
class WorkGraphExec;
}}} // namespace Kokkos::Impl::Experimental
}} // namespace Kokkos::Impl
namespace Kokkos {
namespace Experimental {
template< class ... Properties >
class WorkGraphPolicy
{
public:
using self_type = WorkGraphPolicy<Properties ... >;
using traits = Kokkos::Impl::PolicyTraits<Properties ... >;
using index_type = typename traits::index_type;
using self_type = WorkGraphPolicy<Properties ... >;
using traits = Kokkos::Impl::PolicyTraits<Properties ... >;
using index_type = typename traits::index_type;
using member_type = index_type;
using work_tag = typename traits::work_tag;
using execution_space = typename traits::execution_space;
using work_tag = typename traits::work_tag;
using memory_space = typename execution_space::memory_space;
using graph_type = Kokkos::Experimental::Crs<index_type, execution_space, void, index_type>;
using member_type = index_type;
using memory_space = typename execution_space::memory_space;
using graph_type = Kokkos::Crs<index_type,execution_space,void,index_type>;
enum : std::int32_t {
END_TOKEN = -1 ,
BEGIN_TOKEN = -2 ,
COMPLETED_TOKEN = -3 };
private:
graph_type m_graph;
using ints_type = Kokkos::View<std::int32_t*, memory_space>;
using range_type = Kokkos::pair<std::int32_t, std::int32_t>;
using ranges_type = Kokkos::View<range_type*, memory_space>;
const std::int32_t m_total_work;
ints_type m_counts;
ints_type m_queue;
ranges_type m_ranges;
// Let N = m_graph.numRows(), the total work
// m_queue[ 0 .. N-1] = the ready queue
// m_queue[ N .. 2*N-1] = the waiting queue counts
// m_queue[2*N .. 2*N+1] = the ready queue hints
graph_type const m_graph;
ints_type m_queue ;
KOKKOS_INLINE_FUNCTION
void push_work( const std::int32_t w ) const noexcept
{
const std::int32_t N = m_graph.numRows();
std::int32_t volatile * const ready_queue = & m_queue[0] ;
std::int32_t volatile * const end_hint = & m_queue[2*N+1] ;
// Push work to end of queue
const std::int32_t j = atomic_fetch_add( end_hint , 1 );
if ( ( N <= j ) ||
( END_TOKEN != atomic_exchange(ready_queue+j,w) ) ) {
// ERROR: past the end of queue or did not replace END_TOKEN
Kokkos::abort("WorkGraphPolicy push_work error");
}
memory_fence();
}
public:
struct TagZeroRanges {};
/**\brief Attempt to pop the work item at the head of the queue.
*
* Find entry 'i' such that
* ( m_queue[i] != BEGIN_TOKEN ) AND
* ( i == 0 OR m_queue[i-1] == BEGIN_TOKEN )
* if found then
* increment begin hint
* return atomic_exchange( m_queue[i] , BEGIN_TOKEN )
* else if i < total work
* return END_TOKEN
* else
* return COMPLETED_TOKEN
*
*/
KOKKOS_INLINE_FUNCTION
void operator()(TagZeroRanges, std::int32_t i) const {
m_ranges[i] = range_type(0, 0);
}
void zero_ranges() {
using policy_type = RangePolicy<std::int32_t, execution_space, TagZeroRanges>;
using closure_type = Kokkos::Impl::ParallelFor<self_type, policy_type>;
const closure_type closure(*this, policy_type(0, 1));
closure.execute();
execution_space::fence();
}
std::int32_t pop_work() const noexcept
{
const std::int32_t N = m_graph.numRows();
struct TagFillQueue {};
KOKKOS_INLINE_FUNCTION
void operator()(TagFillQueue, std::int32_t i) const {
if (*((volatile std::int32_t*)(&m_counts(i))) == 0) push_work(i);
}
void fill_queue() {
using policy_type = RangePolicy<std::int32_t, execution_space, TagFillQueue>;
using closure_type = Kokkos::Impl::ParallelFor<self_type, policy_type>;
const closure_type closure(*this, policy_type(0, m_total_work));
closure.execute();
execution_space::fence();
}
std::int32_t volatile * const ready_queue = & m_queue[0] ;
std::int32_t volatile * const begin_hint = & m_queue[2*N] ;
private:
// begin hint is guaranteed to be less than or equal to
// actual begin location in the queue.
inline
void setup() {
if (m_graph.numRows() > std::numeric_limits<std::int32_t>::max()) {
Kokkos::abort("WorkGraphPolicy work must be indexable using int32_t");
}
get_crs_transpose_counts(m_counts, m_graph);
m_queue = ints_type(ViewAllocateWithoutInitializing("queue"), m_total_work);
deep_copy(m_queue, std::int32_t(-1));
m_ranges = ranges_type("ranges", 1);
fill_queue();
}
for ( std::int32_t i = *begin_hint ; i < N ; ++i ) {
KOKKOS_INLINE_FUNCTION
std::int32_t pop_work() const {
range_type w(-1,-1);
while (true) {
const range_type w_new( w.first + 1 , w.second );
w = atomic_compare_exchange( &m_ranges(0) , w , w_new );
if ( w.first < w.second ) { // there was work in the queue
if ( w_new.first == w.first + 1 && w_new.second == w.second ) {
// we got a work item
std::int32_t i;
// the push_work function may have incremented the end counter
// but not yet written the work index into the queue.
// wait until the entry is valid.
while ( -1 == ( i = *((volatile std::int32_t*)(&m_queue( w.first ))) ) );
return i;
} // we got a work item
} else { // there was no work in the queue
#ifdef KOKKOS_DEBUG
if ( w_new.first == w.first + 1 && w_new.second == w.second ) {
Kokkos::abort("bug in pop_work");
const std::int32_t w = ready_queue[i] ;
if ( w == END_TOKEN ) { return END_TOKEN ; }
if ( ( w != BEGIN_TOKEN ) &&
( w == atomic_compare_exchange(ready_queue+i,w,BEGIN_TOKEN) ) ) {
// Attempt to claim ready work index succeeded,
// update the hint and return work index
atomic_increment( begin_hint );
return w ;
}
#endif
if (w.first == m_total_work) { // all work is done
return -1;
} else { // need to wait for more work to be pushed
// take a guess that one work item will be pushed
// the key thing is we can't leave (w) alone, because
// otherwise the next compare_exchange may succeed in
// popping work from an empty queue
w.second++;
}
} // there was no work in the queue
} // while (true)
}
// arrive here when ready_queue[i] == BEGIN_TOKEN
}
return COMPLETED_TOKEN ;
}
KOKKOS_INLINE_FUNCTION
void push_work(std::int32_t i) const {
range_type w(-1,-1);
while (true) {
const range_type w_new( w.first , w.second + 1 );
// try to increment the end counter
w = atomic_compare_exchange( &m_ranges(0) , w , w_new );
// stop trying if the increment was successful
if ( w.first == w_new.first && w.second + 1 == w_new.second ) break;
void completed_work( std::int32_t w ) const noexcept
{
Kokkos::memory_fence();
// Make sure the completed work function's memory accesses are flushed.
const std::int32_t N = m_graph.numRows();
std::int32_t volatile * const count_queue = & m_queue[N] ;
const std::int32_t B = m_graph.row_map(w);
const std::int32_t E = m_graph.row_map(w+1);
for ( std::int32_t i = B ; i < E ; ++i ) {
const std::int32_t j = m_graph.entries(i);
if ( 1 == atomic_fetch_add(count_queue+j,-1) ) {
push_work(j);
}
}
}
// write the work index into the claimed spot in the queue
*((volatile std::int32_t*)(&m_queue( w.second ))) = i;
// push this write out into the memory system
memory_fence();
}
template< class functor_type , class execution_space, class ... policy_args >
friend class Kokkos::Impl::Experimental::WorkGraphExec;
struct TagInit {};
struct TagCount {};
struct TagReady {};
public:
/**\brief Initialize queue
*
* m_queue[0..N-1] = END_TOKEN, the ready queue
* m_queue[N..2*N-1] = 0, the waiting count queue
* m_queue[2*N..2*N+1] = 0, begin/end hints for ready queue
*/
KOKKOS_INLINE_FUNCTION
void operator()( const TagInit , int i ) const noexcept
{ m_queue[i] = i < m_graph.numRows() ? END_TOKEN : 0 ; }
WorkGraphPolicy(graph_type arg_graph)
KOKKOS_INLINE_FUNCTION
void operator()( const TagCount , int i ) const noexcept
{
std::int32_t volatile * const count_queue =
& m_queue[ m_graph.numRows() ] ;
atomic_increment( count_queue + m_graph.entries[i] );
}
KOKKOS_INLINE_FUNCTION
void operator()( const TagReady , int w ) const noexcept
{
std::int32_t const * const count_queue =
& m_queue[ m_graph.numRows() ] ;
if ( 0 == count_queue[w] ) push_work(w);
}
WorkGraphPolicy( const graph_type & arg_graph )
: m_graph(arg_graph)
, m_total_work( arg_graph.numRows() )
, m_queue( view_alloc( "queue" , WithoutInitializing )
, arg_graph.numRows() * 2 + 2 )
{
setup();
}
{ // Initialize
using policy_type = RangePolicy<std::int32_t, execution_space, TagInit>;
using closure_type = Kokkos::Impl::ParallelFor<self_type, policy_type>;
const closure_type closure(*this, policy_type(0, m_queue.size()));
closure.execute();
execution_space::fence();
}
};
{ // execute-after counts
using policy_type = RangePolicy<std::int32_t, execution_space, TagCount>;
using closure_type = Kokkos::Impl::ParallelFor<self_type, policy_type>;
const closure_type closure(*this,policy_type(0,m_graph.entries.size()));
closure.execute();
execution_space::fence();
}
}} // namespace Kokkos::Experimental
/*--------------------------------------------------------------------------*/
/*--------------------------------------------------------------------------*/
namespace Kokkos {
namespace Impl {
namespace Experimental {
template< class functor_type , class execution_space, class ... policy_args >
class WorkGraphExec
{
public:
using self_type = WorkGraphExec< functor_type, execution_space, policy_args ... >;
using policy_type = Kokkos::Experimental::WorkGraphPolicy< policy_args ... >;
using member_type = typename policy_type::member_type;
using memory_space = typename execution_space::memory_space;
protected:
const functor_type m_functor;
const policy_type m_policy;
protected:
KOKKOS_INLINE_FUNCTION
std::int32_t before_work() const {
return m_policy.pop_work();
}
KOKKOS_INLINE_FUNCTION
void after_work(std::int32_t i) const {
/* fence any writes that were done by the work item itself
(usually writing its result to global memory) */
memory_fence();
const std::int32_t begin = m_policy.m_graph.row_map( i );
const std::int32_t end = m_policy.m_graph.row_map( i + 1 );
for (std::int32_t j = begin; j < end; ++j) {
const std::int32_t next = m_policy.m_graph.entries( j );
const std::int32_t old_count = atomic_fetch_add( &(m_policy.m_counts(next)), -1 );
if ( old_count == 1 ) m_policy.push_work( next );
{ // Scheduling ready tasks
using policy_type = RangePolicy<std::int32_t, execution_space, TagReady>;
using closure_type = Kokkos::Impl::ParallelFor<self_type, policy_type>;
const closure_type closure(*this,policy_type(0,m_graph.numRows()));
closure.execute();
execution_space::fence();
}
}
inline
WorkGraphExec( const functor_type & arg_functor
, const policy_type & arg_policy )
: m_functor( arg_functor )
, m_policy( arg_policy )
{
}
};
}}} // namespace Kokkos::Impl::Experimental
} // namespace Kokkos
#ifdef KOKKOS_ENABLE_SERIAL
#include "impl/Kokkos_Serial_WorkGraphPolicy.hpp"


@ -5,51 +5,44 @@ endif
PREFIX ?= /usr/local/lib/kokkos
default: messages build-lib
echo "End Build"
default: build-lib
ifneq (,$(findstring Cuda,$(KOKKOS_DEVICES)))
CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper
CXX ?= $(KOKKOS_PATH)/bin/nvcc_wrapper
else
CXX = g++
CXX ?= g++
endif
CXXFLAGS = -O3
CXXFLAGS ?= -O3
LINK ?= $(CXX)
LDFLAGS ?=
include $(KOKKOS_PATH)/Makefile.kokkos
PWD = $(shell pwd)
KOKKOS_HEADERS_INCLUDE = $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL = $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/algorithms/src/*.hpp)
include $(KOKKOS_PATH)/core/src/Makefile.generate_header_lists
include $(KOKKOS_PATH)/core/src/Makefile.generate_build_files
CONDITIONAL_COPIES =
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
KOKKOS_HEADERS_CUDA += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
CONDITIONAL_COPIES += copy-cuda
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
KOKKOS_HEADERS_THREADS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
CONDITIONAL_COPIES += copy-threads
endif
ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
KOKKOS_HEADERS_QTHREADS += $(wildcard $(KOKKOS_PATH)/core/src/Qthreads/*.hpp)
CONDITIONAL_COPIES += copy-qthreads
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
KOKKOS_HEADERS_OPENMP += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
CONDITIONAL_COPIES += copy-openmp
endif
ifeq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
CONDITIONAL_COPIES += copy-rocm
endif
ifeq ($(KOKKOS_OS),CYGWIN)
COPY_FLAG = -u
endif
@ -66,104 +59,7 @@ else
KOKKOS_DEBUG_CMAKE = ON
endif
messages:
echo "Start Build"
build-makefile-kokkos:
rm -f Makefile.kokkos
echo "#Global Settings used to generate this library" >> Makefile.kokkos
echo "KOKKOS_PATH = $(PREFIX)" >> Makefile.kokkos
echo "KOKKOS_DEVICES = $(KOKKOS_DEVICES)" >> Makefile.kokkos
echo "KOKKOS_ARCH = $(KOKKOS_ARCH)" >> Makefile.kokkos
echo "KOKKOS_DEBUG = $(KOKKOS_DEBUG)" >> Makefile.kokkos
echo "KOKKOS_USE_TPLS = $(KOKKOS_USE_TPLS)" >> Makefile.kokkos
echo "KOKKOS_CXX_STANDARD = $(KOKKOS_CXX_STANDARD)" >> Makefile.kokkos
echo "KOKKOS_OPTIONS = $(KOKKOS_OPTIONS)" >> Makefile.kokkos
echo "KOKKOS_CUDA_OPTIONS = $(KOKKOS_CUDA_OPTIONS)" >> Makefile.kokkos
echo "CXX ?= $(CXX)" >> Makefile.kokkos
echo "NVCC_WRAPPER ?= $(PREFIX)/bin/nvcc_wrapper" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Source and Header files of Kokkos relative to KOKKOS_PATH" >> Makefile.kokkos
echo "KOKKOS_HEADERS = $(KOKKOS_HEADERS)" >> Makefile.kokkos
echo "KOKKOS_SRC = $(KOKKOS_SRC)" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Variables used in application Makefiles" >> Makefile.kokkos
echo "KOKKOS_OS = $(KOKKOS_OS)" >> Makefile.kokkos
echo "KOKKOS_CPP_DEPENDS = $(KOKKOS_CPP_DEPENDS)" >> Makefile.kokkos
echo "KOKKOS_CXXFLAGS = $(KOKKOS_CXXFLAGS)" >> Makefile.kokkos
echo "KOKKOS_CPPFLAGS = $(KOKKOS_CPPFLAGS)" >> Makefile.kokkos
echo "KOKKOS_LINK_DEPENDS = $(KOKKOS_LINK_DEPENDS)" >> Makefile.kokkos
echo "KOKKOS_LIBS = $(KOKKOS_LIBS)" >> Makefile.kokkos
echo "KOKKOS_LDFLAGS = $(KOKKOS_LDFLAGS)" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Internal settings which need to be propagated for Kokkos examples" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_CUDA = ${KOKKOS_INTERNAL_USE_CUDA}" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_QTHREADS = ${KOKKOS_INTERNAL_USE_QTHREADS}" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_OPENMP = ${KOKKOS_INTERNAL_USE_OPENMP}" >> Makefile.kokkos
echo "KOKKOS_INTERNAL_USE_PTHREADS = ${KOKKOS_INTERNAL_USE_PTHREADS}" >> Makefile.kokkos
echo "" >> Makefile.kokkos
echo "#Fake kokkos-clean target" >> Makefile.kokkos
echo "kokkos-clean:" >> Makefile.kokkos
echo "" >> Makefile.kokkos
sed \
-e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/algorithms/src|$(PREFIX)/include|g' \
-e 's|-L$(PWD)|-L$(PREFIX)/lib|g' \
-e 's|= libkokkos.a|= $(PREFIX)/lib/libkokkos.a|g' \
-e 's|= KokkosCore_config.h|= $(PREFIX)/include/KokkosCore_config.h|g' Makefile.kokkos \
> Makefile.kokkos.tmp
mv -f Makefile.kokkos.tmp Makefile.kokkos
build-cmake-kokkos:
rm -f kokkos.cmake
echo "#Global Settings used to generate this library" >> kokkos.cmake
echo "set(KOKKOS_PATH $(PREFIX) CACHE PATH \"Kokkos installation path\")" >> kokkos.cmake
echo "set(KOKKOS_DEVICES $(KOKKOS_DEVICES) CACHE STRING \"Kokkos devices list\")" >> kokkos.cmake
echo "set(KOKKOS_ARCH $(KOKKOS_ARCH) CACHE STRING \"Kokkos architecture flags\")" >> kokkos.cmake
echo "set(KOKKOS_DEBUG $(KOKKOS_DEBUG_CMAKE) CACHE BOOL \"Kokkos debug enabled?\")" >> kokkos.cmake
echo "set(KOKKOS_USE_TPLS $(KOKKOS_USE_TPLS) CACHE STRING \"Kokkos third-party libraries list\")" >> kokkos.cmake
echo "set(KOKKOS_CXX_STANDARD $(KOKKOS_CXX_STANDARD) CACHE STRING \"Kokkos C++ standard\")" >> kokkos.cmake
echo "set(KOKKOS_OPTIONS $(KOKKOS_OPTIONS) CACHE STRING \"Kokkos options\")" >> kokkos.cmake
echo "set(KOKKOS_CUDA_OPTIONS $(KOKKOS_CUDA_OPTIONS) CACHE STRING \"Kokkos Cuda options\")" >> kokkos.cmake
echo "if(NOT $ENV{CXX})" >> kokkos.cmake
echo ' message(WARNING "You are currently using compiler $${CMAKE_CXX_COMPILER} while Kokkos was built with $(CXX) ; make sure this is the behavior you intended.")' >> kokkos.cmake
echo "endif()" >> kokkos.cmake
echo "if(NOT DEFINED ENV{NVCC_WRAPPER})" >> kokkos.cmake
echo " set(NVCC_WRAPPER \"$(NVCC_WRAPPER)\" CACHE FILEPATH \"Path to command nvcc_wrapper\")" >> kokkos.cmake
echo "else()" >> kokkos.cmake
echo ' set(NVCC_WRAPPER $$ENV{NVCC_WRAPPER} CACHE FILEPATH "Path to command nvcc_wrapper")' >> kokkos.cmake
echo "endif()" >> kokkos.cmake
echo "" >> kokkos.cmake
echo "#Source and Header files of Kokkos relative to KOKKOS_PATH" >> kokkos.cmake
echo "set(KOKKOS_HEADERS \"$(KOKKOS_HEADERS)\" CACHE STRING \"Kokkos headers list\")" >> kokkos.cmake
echo "set(KOKKOS_SRC \"$(KOKKOS_SRC)\" CACHE STRING \"Kokkos source list\")" >> kokkos.cmake
echo "" >> kokkos.cmake
echo "#Variables used in application Makefiles" >> kokkos.cmake
echo "set(KOKKOS_CPP_DEPENDS \"$(KOKKOS_CPP_DEPENDS)\" CACHE STRING \"\")" >> kokkos.cmake
echo "set(KOKKOS_CXXFLAGS \"$(KOKKOS_CXXFLAGS)\" CACHE STRING \"\")" >> kokkos.cmake
echo "set(KOKKOS_CPPFLAGS \"$(KOKKOS_CPPFLAGS)\" CACHE STRING \"\")" >> kokkos.cmake
echo "set(KOKKOS_LINK_DEPENDS \"$(KOKKOS_LINK_DEPENDS)\" CACHE STRING \"\")" >> kokkos.cmake
echo "set(KOKKOS_LIBS \"$(KOKKOS_LIBS)\" CACHE STRING \"\")" >> kokkos.cmake
echo "set(KOKKOS_LDFLAGS \"$(KOKKOS_LDFLAGS)\" CACHE STRING \"\")" >> kokkos.cmake
echo "" >> kokkos.cmake
echo "#Internal settings which need to be propagated for Kokkos examples" >> kokkos.cmake
echo "set(KOKKOS_INTERNAL_USE_CUDA \"${KOKKOS_INTERNAL_USE_CUDA}\" CACHE STRING \"\")" >> kokkos.cmake
echo "set(KOKKOS_INTERNAL_USE_OPENMP \"${KOKKOS_INTERNAL_USE_OPENMP}\" CACHE STRING \"\")" >> kokkos.cmake
echo "set(KOKKOS_INTERNAL_USE_PTHREADS \"${KOKKOS_INTERNAL_USE_PTHREADS}\" CACHE STRING \"\")" >> kokkos.cmake
echo "mark_as_advanced(KOKKOS_HEADERS KOKKOS_SRC KOKKOS_INTERNAL_USE_CUDA KOKKOS_INTERNAL_USE_OPENMP KOKKOS_INTERNAL_USE_PTHREADS)" >> kokkos.cmake
echo "" >> kokkos.cmake
sed \
-e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/algorithms/src|$(PREFIX)/include|g' \
-e 's|-L$(PWD)|-L$(PREFIX)/lib|g' \
-e 's|= libkokkos.a|= $(PREFIX)/lib/libkokkos.a|g' \
-e 's|= KokkosCore_config.h|= $(PREFIX)/include/KokkosCore_config.h|g' kokkos.cmake \
> kokkos.cmake.tmp
mv -f kokkos.cmake.tmp kokkos.cmake
build-lib: build-makefile-kokkos build-cmake-kokkos $(KOKKOS_LINK_DEPENDS)
build-lib: $(KOKKOS_LINK_DEPENDS)
mkdir:
mkdir -p $(PREFIX)
@ -188,14 +84,18 @@ copy-openmp: mkdir
mkdir -p $(PREFIX)/include/OpenMP
cp $(COPY_FLAG) $(KOKKOS_HEADERS_OPENMP) $(PREFIX)/include/OpenMP
install: mkdir $(CONDITIONAL_COPIES) build-lib
copy-rocm: mkdir
mkdir -p $(PREFIX)/include/ROCm
cp $(COPY_FLAG) $(KOKKOS_HEADERS_ROCM) $(PREFIX)/include/ROCm
install: mkdir $(CONDITIONAL_COPIES) build-lib generate_build_settings
cp $(COPY_FLAG) $(NVCC_WRAPPER) $(PREFIX)/bin
cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE) $(PREFIX)/include
cp $(COPY_FLAG) $(KOKKOS_HEADERS_INCLUDE_IMPL) $(PREFIX)/include/impl
cp $(COPY_FLAG) Makefile.kokkos $(PREFIX)
cp $(COPY_FLAG) kokkos.cmake $(PREFIX)
cp $(COPY_FLAG) $(KOKKOS_MAKEFILE) $(PREFIX)
cp $(COPY_FLAG) $(KOKKOS_CMAKEFILE) $(PREFIX)
cp $(COPY_FLAG) libkokkos.a $(PREFIX)/lib
cp $(COPY_FLAG) KokkosCore_config.h $(PREFIX)/include
cp $(COPY_FLAG) $(KOKKOS_CONFIG_HEADER) $(PREFIX)/include
clean: kokkos-clean
rm -f Makefile.kokkos
rm -f $(KOKKOS_MAKEFILE) $(KOKKOS_CMAKEFILE)


@ -0,0 +1,100 @@
# This file is responsible for generating the files used by the build
# systems (make and cmake) in scenarios where the Kokkos library gets
# installed before building the application.
# These files are generated by this makefile:
KOKKOS_MAKEFILE=Makefile.kokkos
KOKKOS_CMAKEFILE=kokkos_generated_settings.cmake
ifeq ($(KOKKOS_DEBUG),"no")
KOKKOS_DEBUG_CMAKE = OFF
else
KOKKOS_DEBUG_CMAKE = ON
endif
# Functions for generating the makefile and cmake file.
# When calling these routines, do not put a space after the comma,
# e.g., $(call kokkos_append_var,KOKKOS_PATH,$(PREFIX))
kokkos_append_makefile = echo $1 >> $(KOKKOS_MAKEFILE)
kokkos_append_cmakefile = echo $1 >> $(KOKKOS_CMAKEFILE)
kokkos_setvar_cmakefile = echo set\($1 $2\) >> $(KOKKOS_CMAKEFILE)
kokkos_setlist_cmakefile = echo set\($1 \"$2\"\) >> $(KOKKOS_CMAKEFILE)
kokkos_appendvar_makefile = echo $1 = $($(1)) >> $(KOKKOS_MAKEFILE)
kokkos_appendvar2_makefile = echo $1 ?= $($(1)) >> $(KOKKOS_MAKEFILE)
kokkos_appendvar_cmakefile = echo set\($1 $($(1)) CACHE $2 FORCE\) >> $(KOKKOS_CMAKEFILE)
kokkos_appendval_makefile = echo $1 = $2 >> $(KOKKOS_MAKEFILE)
kokkos_appendval_cmakefile = echo set\($1 $2 CACHE $3 FORCE\) >> $(KOKKOS_CMAKEFILE)
kokkos_append_string = $(call kokkos_append_makefile,$1); $(call kokkos_append_cmakefile,$1)
kokkos_append_var = $(call kokkos_appendvar_makefile,$1); $(call kokkos_appendvar_cmakefile,$1,$2)
kokkos_append_var2 = $(call kokkos_appendvar2_makefile,$1); $(call kokkos_appendvar_cmakefile,$1,$2)
kokkos_append_varval = $(call kokkos_appendval_makefile,$1,$2); $(call kokkos_appendval_cmakefile,$1,$2,$3)
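The `$(call ...)` append helpers above can be exercised with a toy makefile. This is a hypothetical demo (names and paths are made up; `.RECIPEPREFIX` requires GNU make 3.82+ and is used here only to avoid literal tabs):

```shell
cat > /tmp/demo.mk <<'EOF'
.RECIPEPREFIX = >
OUT = /tmp/generated.mk
# Same pattern as kokkos_appendvar_makefile: $($(1)) expands the
# variable whose name is passed as $1.
append_var = echo $1 = $($(1)) >> $(OUT)
KOKKOS_DEVICES = OpenMP
all:
> @rm -f $(OUT)
> @$(call append_var,KOKKOS_DEVICES)
> @cat $(OUT)
EOF
make -f /tmp/demo.mk
# prints: KOKKOS_DEVICES = OpenMP
```

The indirection `$($(1))` is what lets one helper serialize any variable by name, which is how the real targets below emit each `KOKKOS_*` setting.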
generate_build_settings: $(KOKKOS_CONFIG_HEADER)
@rm -f $(KOKKOS_MAKEFILE)
@rm -f $(KOKKOS_CMAKEFILE)
@$(call kokkos_append_string, "#Global Settings used to generate this library")
@$(call kokkos_append_varval,KOKKOS_PATH,$(KOKKOS_INSTALL_PATH),'FILEPATH "Kokkos installation path"')
@$(call kokkos_append_var,KOKKOS_DEVICES,'STRING "Kokkos devices list"')
@$(call kokkos_append_var,KOKKOS_ARCH,'STRING "Kokkos architecture flags"')
@$(call kokkos_appendvar_makefile,KOKKOS_DEBUG)
@$(call kokkos_appendvar_cmakefile,KOKKOS_DEBUG_CMAKE,'BOOL "Kokkos debug enabled?"')
@$(call kokkos_append_var,KOKKOS_USE_TPLS,'STRING "Kokkos third-party libraries list"')
@$(call kokkos_append_var,KOKKOS_CXX_STANDARD,'STRING "Kokkos C++ standard"')
@$(call kokkos_append_var,KOKKOS_OPTIONS,'STRING "Kokkos options"')
@$(call kokkos_append_var,KOKKOS_CUDA_OPTIONS,'STRING "Kokkos Cuda options"')
@$(call kokkos_append_var2,CXX,'STRING "Kokkos C++ Compiler"')
@$(call kokkos_append_cmakefile,"if(NOT DEFINED ENV{NVCC_WRAPPER})")
@$(call kokkos_append_var2,NVCC_WRAPPER,'FILEPATH "Path to command nvcc_wrapper"')
@$(call kokkos_append_cmakefile,"else()")
@$(call kokkos_append_cmakefile,' set(NVCC_WRAPPER $$ENV{NVCC_WRAPPER} CACHE FILEPATH "Path to command nvcc_wrapper")')
@$(call kokkos_append_cmakefile,"endif()")
@$(call kokkos_append_string,"")
@$(call kokkos_append_string,"#Source and Header files of Kokkos relative to KOKKOS_PATH")
@$(call kokkos_append_var,KOKKOS_HEADERS,'STRING "Kokkos headers list"')
@$(call kokkos_append_var,KOKKOS_HEADERS_IMPL,'STRING "Kokkos headers impl list"')
@$(call kokkos_append_var,KOKKOS_HEADERS_CUDA,'STRING "Kokkos headers Cuda list"')
@$(call kokkos_append_var,KOKKOS_HEADERS_OPENMP,'STRING "Kokkos headers OpenMP list"')
@$(call kokkos_append_var,KOKKOS_HEADERS_ROCM,'STRING "Kokkos headers ROCm list"')
@$(call kokkos_append_var,KOKKOS_HEADERS_THREADS,'STRING "Kokkos headers Threads list"')
@$(call kokkos_append_var,KOKKOS_HEADERS_QTHREADS,'STRING "Kokkos headers QThreads list"')
@$(call kokkos_append_var,KOKKOS_SRC,'STRING "Kokkos source list"')
@$(call kokkos_append_string,"")
@$(call kokkos_append_string,"#Variables used in application Makefiles")
@$(call kokkos_append_var,KOKKOS_OS,'STRING ""') # This was not in original cmake gen
@$(call kokkos_append_var,KOKKOS_CPP_DEPENDS,'STRING ""')
@$(call kokkos_append_var,KOKKOS_LINK_DEPENDS,'STRING ""')
@$(call kokkos_append_var,KOKKOS_CXXFLAGS,'STRING ""')
@$(call kokkos_append_var,KOKKOS_CPPFLAGS,'STRING ""')
@$(call kokkos_append_var,KOKKOS_LDFLAGS,'STRING ""')
@$(call kokkos_append_var,KOKKOS_LIBS,'STRING ""')
@$(call kokkos_append_var,KOKKOS_EXTRA_LIBS,'STRING ""')
@$(call kokkos_append_string,"")
@$(call kokkos_append_string,"#Internal settings which need to be propagated for Kokkos examples")
@$(call kokkos_append_var,KOKKOS_INTERNAL_USE_CUDA,'STRING ""')
@$(call kokkos_append_var,KOKKOS_INTERNAL_USE_OPENMP,'STRING ""')
@$(call kokkos_append_var,KOKKOS_INTERNAL_USE_PTHREADS,'STRING ""')
@$(call kokkos_append_var,KOKKOS_INTERNAL_USE_ROCM,'STRING ""')
@$(call kokkos_append_var,KOKKOS_INTERNAL_USE_QTHREADS,'STRING ""') # Not in original cmake gen
@$(call kokkos_append_cmakefile,"mark_as_advanced(KOKKOS_HEADERS KOKKOS_SRC KOKKOS_INTERNAL_USE_CUDA KOKKOS_INTERNAL_USE_OPENMP KOKKOS_INTERNAL_USE_PTHREADS)")
@$(call kokkos_append_makefile,"")
@$(call kokkos_append_makefile,"#Fake kokkos-clean target")
@$(call kokkos_append_makefile,"kokkos-clean:")
@$(call kokkos_append_makefile,"")
@sed \
-e 's|$(KOKKOS_PATH)/core/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/containers/src|$(PREFIX)/include|g' \
-e 's|$(KOKKOS_PATH)/algorithms/src|$(PREFIX)/include|g' \
-e 's|-L$(PWD)|-L$(PREFIX)/lib|g' \
-e 's|= libkokkos.a|= $(PREFIX)/lib/libkokkos.a|g' \
-e 's|= $(KOKKOS_CONFIG_HEADER)|= $(PREFIX)/include/$(KOKKOS_CONFIG_HEADER)|g' $(KOKKOS_MAKEFILE) \
> $(KOKKOS_MAKEFILE).tmp
@mv -f $(KOKKOS_MAKEFILE).tmp $(KOKKOS_MAKEFILE)
@$(call kokkos_setvar_cmakefile,KOKKOS_CXX_FLAGS,$(KOKKOS_CXXFLAGS))
@$(call kokkos_setvar_cmakefile,KOKKOS_CPP_FLAGS,$(KOKKOS_CPPFLAGS))
@$(call kokkos_setvar_cmakefile,KOKKOS_LD_FLAGS,$(KOKKOS_LDFLAGS))
@$(call kokkos_setlist_cmakefile,KOKKOS_LIBS_LIST,$(KOKKOS_LIBS))
@$(call kokkos_setlist_cmakefile,KOKKOS_EXTRA_LIBS_LIST,$(KOKKOS_EXTRA_LIBS))


@ -0,0 +1,28 @@
# Build a List of Header Files
KOKKOS_HEADERS_INCLUDE = $(wildcard $(KOKKOS_PATH)/core/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL = $(wildcard $(KOKKOS_PATH)/core/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/containers/src/*.hpp)
KOKKOS_HEADERS_INCLUDE_IMPL += $(wildcard $(KOKKOS_PATH)/containers/src/impl/*.hpp)
KOKKOS_HEADERS_INCLUDE += $(wildcard $(KOKKOS_PATH)/algorithms/src/*.hpp)
ifeq ($(KOKKOS_INTERNAL_USE_CUDA), 1)
KOKKOS_HEADERS_CUDA += $(wildcard $(KOKKOS_PATH)/core/src/Cuda/*.hpp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_PTHREADS), 1)
KOKKOS_HEADERS_THREADS += $(wildcard $(KOKKOS_PATH)/core/src/Threads/*.hpp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_QTHREADS), 1)
KOKKOS_HEADERS_QTHREADS += $(wildcard $(KOKKOS_PATH)/core/src/Qthreads/*.hpp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_OPENMP), 1)
KOKKOS_HEADERS_OPENMP += $(wildcard $(KOKKOS_PATH)/core/src/OpenMP/*.hpp)
endif
ifeq ($(KOKKOS_INTERNAL_USE_ROCM), 1)
KOKKOS_HEADERS_ROCM += $(wildcard $(KOKKOS_PATH)/core/src/ROCm/*.hpp)
endif


@ -294,7 +294,7 @@ void OpenMP::initialize( int thread_count )
}
{
if (nullptr == std::getenv("OMP_PROC_BIND") ) {
if ( Kokkos::show_warnings() && nullptr == std::getenv("OMP_PROC_BIND") ) {
printf("Kokkos::OpenMP::initialize WARNING: OMP_PROC_BIND environment variable not set\n");
printf(" In general, for best performance with OpenMP 4.0 or better set OMP_PROC_BIND=spread and OMP_PLACES=threads\n");
printf(" For best performance with OpenMP 3.1 set OMP_PROC_BIND=true\n");
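The settings recommended by this warning can be exported before launching the application (the comments mirror the printed advice; no particular executable is assumed):

```shell
# Pin OpenMP threads as the warning above suggests (OpenMP >= 4.0):
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
# For OpenMP 3.1 runtimes, use instead:
#   export OMP_PROC_BIND=true
echo "OMP_PROC_BIND=$OMP_PROC_BIND OMP_PLACES=$OMP_PLACES"
```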
@ -327,7 +327,7 @@ void OpenMP::initialize( int thread_count )
omp_set_num_threads(Impl::g_openmp_hardware_max_threads);
}
else {
if( thread_count > process_num_threads ) {
if( Kokkos::show_warnings() && thread_count > process_num_threads ) {
printf( "Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores.\n");
printf( " process threads available : %3d, requested thread : %3d\n", process_num_threads, thread_count );
}
@ -364,12 +364,12 @@ void OpenMP::initialize( int thread_count )
// Check for over-subscription
//if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
// std::cout << "Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
// std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
// std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
// std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
//}
if( Kokkos::show_warnings() && (Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node()) ) {
std::cout << "Kokkos::OpenMP::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
}
// Init the array for used for arbitrarily sized atomics
Impl::init_lock_array_host_space();


@ -170,20 +170,20 @@ public:
// MDRangePolicy impl
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::Experimental::MDRangePolicy< Traits ... >
, Kokkos::MDRangePolicy< Traits ... >
, Kokkos::OpenMP
>
{
private:
typedef Kokkos::Experimental::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef Kokkos::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef typename MDRangePolicy::impl_range_policy Policy ;
typedef typename MDRangePolicy::work_tag WorkTag ;
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
typedef typename Kokkos::Experimental::Impl::HostIterateTile< MDRangePolicy, FunctorType, typename MDRangePolicy::work_tag, void > iterate_type;
typedef typename Kokkos::Impl::HostIterateTile< MDRangePolicy, FunctorType, typename MDRangePolicy::work_tag, void > iterate_type;
OpenMPExec * m_instance ;
const FunctorType m_functor ;
@ -292,11 +292,12 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
// Static Assert WorkTag void if ReducerType not InvalidType
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTagFwd > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTagFwd > ValueJoin ;
typedef typename Analysis::pointer_type pointer_type ;
typedef typename Analysis::reference_type reference_type ;
@ -393,7 +394,7 @@ public:
, m_instance->get_thread_data(i)->pool_reduce_local() );
}
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
if ( m_result_ptr ) {
const int n = Analysis::value_count( ReducerConditional::select(m_functor , m_reducer) );
@ -445,14 +446,14 @@ public:
// MDRangePolicy impl
template< class FunctorType , class ReducerType, class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::Experimental::MDRangePolicy< Traits ...>
, Kokkos::MDRangePolicy< Traits ...>
, ReducerType
, Kokkos::OpenMP
>
{
private:
typedef Kokkos::Experimental::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef Kokkos::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef typename MDRangePolicy::impl_range_policy Policy ;
typedef typename MDRangePolicy::work_tag WorkTag ;
@ -463,16 +464,17 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef typename ReducerTypeFwd::value_type ValueType;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTagFwd > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTagFwd > ValueJoin ;
typedef typename Analysis::pointer_type pointer_type ;
typedef typename Analysis::reference_type reference_type ;
using iterate_type = typename Kokkos::Experimental::Impl::HostIterateTile< MDRangePolicy
using iterate_type = typename Kokkos::Impl::HostIterateTile< MDRangePolicy
, FunctorType
, WorkTag
, ValueType
@ -558,7 +560,7 @@ public:
, m_instance->get_thread_data(i)->pool_reduce_local() );
}
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
if ( m_result_ptr ) {
const int n = Analysis::value_count( ReducerConditional::select(m_functor , m_reducer) );
@ -920,9 +922,10 @@ private:
, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd , WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd , WorkTagFwd > ValueJoin ;
typedef typename Analysis::pointer_type pointer_type ;
typedef typename Analysis::reference_type reference_type ;
@ -1067,7 +1070,7 @@ public:
, m_instance->get_thread_data(i)->pool_reduce_local() );
}
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , ptr );
if ( m_result_ptr ) {
const int n = Analysis::value_count( ReducerConditional::select(m_functor , m_reducer) );


@ -49,33 +49,26 @@ namespace Impl {
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType ,
Kokkos::Experimental::WorkGraphPolicy< Traits ... > ,
Kokkos::WorkGraphPolicy< Traits ... > ,
Kokkos::OpenMP
>
: public Kokkos::Impl::Experimental::
WorkGraphExec< FunctorType,
Kokkos::OpenMP,
Traits ...
>
{
private:
typedef Kokkos::Experimental::WorkGraphPolicy< Traits ... > Policy ;
typedef Kokkos::Impl::Experimental::
WorkGraphExec<FunctorType, Kokkos::OpenMP, Traits ... > Base ;
typedef Kokkos::WorkGraphPolicy< Traits ... > Policy ;
Policy m_policy ;
FunctorType m_functor ;
template< class TagType >
typename std::enable_if< std::is_same< TagType , void >::value >::type
exec_one(const typename Policy::member_type& i) const {
Base::m_functor( i );
}
exec_one( const std::int32_t w ) const noexcept
{ m_functor( w ); }
template< class TagType >
typename std::enable_if< ! std::is_same< TagType , void >::value >::type
exec_one(const typename Policy::member_type& i) const {
const TagType t{} ;
Base::m_functor( t , i );
}
exec_one( const std::int32_t w ) const noexcept
{ const TagType t{} ; m_functor( t , w ); }
public:
@ -86,9 +79,15 @@ public:
#pragma omp parallel num_threads(pool_size)
{
for (std::int32_t i; (-1 != (i = Base::before_work())); ) {
exec_one< typename Policy::work_tag >( i );
Base::after_work(i);
// Spin until COMPLETED_TOKEN.
// END_TOKEN indicates no work is currently available.
for ( std::int32_t w = Policy::END_TOKEN ;
Policy::COMPLETED_TOKEN != ( w = m_policy.pop_work() ) ; ) {
if ( Policy::END_TOKEN != w ) {
exec_one< typename Policy::work_tag >( w );
m_policy.completed_work(w);
}
}
}
}
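The spin loop above distinguishes two sentinel results from `pop_work()`: `END_TOKEN` (no work momentarily available, keep spinning) and `COMPLETED_TOKEN` (all work done, exit). The control flow can be sketched with a fake single-threaded queue; the token values and the `FakeQueue` type here are hypothetical stand-ins, not the Kokkos definitions:

```cpp
#include <cassert>
#include <queue>

constexpr int END_TOKEN = -1;        // queue momentarily empty: retry
constexpr int COMPLETED_TOKEN = -2;  // all work completed: exit loop

struct FakeQueue {
  std::queue<int> q;
  int remaining = 0;                 // work items not yet completed
  int pop_work() {
    if (remaining == 0) return COMPLETED_TOKEN;
    if (q.empty()) return END_TOKEN;
    const int w = q.front(); q.pop();
    return w;
  }
  void completed_work(int) { --remaining; }
};

// Mirrors the parallel region's loop: spin until COMPLETED_TOKEN,
// executing and completing each real work index encountered.
int drain(FakeQueue& p) {
  int executed = 0;
  for (int w = END_TOKEN; COMPLETED_TOKEN != (w = p.pop_work()); ) {
    if (END_TOKEN != w) { ++executed; p.completed_work(w); }
  }
  return executed;
}
```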
@ -96,12 +95,13 @@ public:
inline
ParallelFor( const FunctorType & arg_functor
, const Policy & arg_policy )
: Base( arg_functor, arg_policy )
{
}
: m_policy( arg_policy )
, m_functor( arg_functor )
{}
};
} // namespace Impl
} // namespace Kokkos
#endif /* #define KOKKOS_OPENMP_WORKGRAPHPOLICY_HPP */


@ -248,12 +248,13 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
// Static Assert WorkTag void if ReducerType not InvalidType
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd, WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd , WorkTagFwd > ValueJoin ;
enum {HasJoin = ReduceFunctorHasJoin<FunctorType>::value };
enum {UseReducer = is_reducer_type<ReducerType>::value };
@ -620,10 +621,11 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd , WorkTag > ValueJoin ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd , WorkTagFwd > ValueJoin ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;


@ -150,11 +150,12 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, FunctorType, ReducerType > ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, WorkTag, void >::type WorkTagFwd;
// Static Assert WorkTag void if ReducerType not InvalidType
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
@ -213,7 +214,7 @@ public:
const pointer_type data = (pointer_type) QthreadsExec::exec_all_reduce_result();
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer) , data );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer) , data );
if ( m_result_ptr ) {
const unsigned n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );
@ -331,9 +332,10 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, WorkTag, void >::type WorkTagFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
@ -394,7 +396,7 @@ public:
const pointer_type data = (pointer_type) QthreadsExec::exec_all_reduce_result();
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTag >::final( ReducerConditional::select(m_functor , m_reducer), data );
Kokkos::Impl::FunctorFinal< ReducerTypeFwd , WorkTagFwd >::final( ReducerConditional::select(m_functor , m_reducer), data );
if ( m_result_ptr ) {
const unsigned n = ValueTraits::value_count( ReducerConditional::select(m_functor , m_reducer) );


@ -125,7 +125,7 @@ namespace Kokkos {
oldval.t = *dest ;
assume.i = oldval.i ;
newval.t = val ;
atomic_compare_exchange( reinterpret_cast<int*>(dest) , assume.i, newval.i );
atomic_compare_exchange( (int*)(dest) , assume.i, newval.i );
return oldval.t ;
}

View File

@@ -608,6 +608,7 @@ ROCmInternal::scratch_space( const Kokkos::Experimental::ROCm::size_type size )
void ROCmInternal::finalize()
{
Kokkos::Impl::rocm_device_synchronize();
was_finalized = 1;
if ( 0 != m_scratchSpace || 0 != m_scratchFlags ) {

View File

@@ -277,7 +277,7 @@ public:
this->team_barrier();
value = local_value;
}
// Reduce accross a team of threads.
// Reduce across a team of threads.
//
// Each thread has vector_length elements.
// This reduction is for TeamThreadRange operations, where the range
@@ -354,6 +354,80 @@ public:
return buffer[0];
}
// Reduce across a team of threads, with a reducer data type
//
// Each thread has vector_length elements.
// This reduction is for TeamThreadRange operations, where the range
// is spread across threads. Effectively, there are vector_length
// independent reduction operations.
// This is different from a reduction across the elements of a thread,
// which reduces every vector element.
template< class ReducerType >
KOKKOS_INLINE_FUNCTION
typename std::enable_if< is_reducer< ReducerType >::value >::type
team_reduce( const ReducerType & reducer) const
{
typedef typename ReducerType::value_type value_type ;
tile_static value_type buffer[512];
const auto local = lindex();
const auto team = team_rank();
auto vector_rank = local%m_vector_length;
auto thread_base = team*m_vector_length;
const std::size_t size = next_pow_2(m_team_size+1)/2;
#if defined(ROCM15)
buffer[local] = reducer.reference();
#else
// ROCm 1.5 handles address spaces better; earlier versions must go through lds_for
lds_for(buffer[local], [&](value_type& x)
{
x = reducer.reference();
});
#endif
m_idx.barrier.wait();
for(std::size_t s = 1; s < size; s *= 2)
{
const std::size_t index = 2 * s * team;
if (index < size)
{
#if defined(ROCM15)
reducer.join(buffer[vector_rank+index*m_vector_length],
buffer[vector_rank+(index+s)*m_vector_length]);
#else
lds_for(buffer[vector_rank+index*m_vector_length], [&](value_type& x)
{
lds_for(buffer[vector_rank+(index+s)*m_vector_length],
[&](value_type& y)
{
reducer.join(x, y);
});
});
#endif
}
m_idx.barrier.wait();
}
if (local == 0)
{
for(int i=size*m_vector_length; i<m_team_size*m_vector_length; i+=m_vector_length)
#if defined(ROCM15)
reducer.join(buffer[vector_rank], buffer[vector_rank+i]);
#else
lds_for(buffer[vector_rank], [&](value_type& x)
{
lds_for(buffer[vector_rank+i],
[&](value_type& y)
{
reducer.join(x, y);
});
});
#endif
}
m_idx.barrier.wait();
}
/** \brief Intra-team vector reduce
* with intra-team non-deterministic ordering accumulation.
@@ -406,6 +480,33 @@ public:
return buffer[thread_base];
}
template< typename ReducerType >
KOKKOS_INLINE_FUNCTION static
typename std::enable_if< is_reducer< ReducerType >::value >::type
vector_reduce( ReducerType const & reducer )
{
#ifdef __HCC_ACCELERATOR__
if(blockDim_x == 1) return;
// Intra vector lane shuffle reduction:
typename ReducerType::value_type tmp ( reducer.reference() );
for ( int i = blockDim_x ; ( i >>= 1 ) ; ) {
shfl_down( reducer.reference() , i , blockDim_x );
if ( (int)threadIdx_x < i ) { reducer.join( tmp , reducer.reference() ); }
}
// Broadcast from root lane to all other lanes.
// Cannot use "butterfly" algorithm to avoid the broadcast
// because floating point summation is not associative
// and thus different threads could have different results.
shfl( reducer.reference() , 0 , blockDim_x );
#endif
}
/** \brief Intra-team exclusive prefix sum with team_rank() ordering
* with intra-team non-deterministic ordering accumulation.
*
@@ -1075,6 +1176,22 @@ void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ROC
// Impl::JoinAdd<ValueType>());
}
/** \brief Inter-thread thread range parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all threads of the calling thread team and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ReducerType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::TeamThreadRangeBoundariesStruct<iType,Impl::ROCmTeamMember>& loop_boundaries,
const Lambda & lambda, ReducerType const & reducer) {
reducer.init( reducer.reference() );
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,reducer.reference());
}
loop_boundaries.thread.team_reduce(reducer);
}
/** \brief Intra-thread thread range parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
@@ -1161,6 +1278,41 @@ void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::R
result = loop_boundaries.thread.thread_reduce(result,join);
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a summation of
* val is performed and put into result. This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ReducerType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ROCmTeamMember >&
loop_boundaries, const Lambda & lambda, ReducerType const & reducer) {
reducer.init( reducer.reference() );
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,reducer.reference());
}
loop_boundaries.thread.vector_reduce(reducer);
}
/** \brief Intra-thread vector parallel_reduce. Executes lambda(iType i, ValueType & val) for each i=0..N-1.
*
* The range i=0..N-1 is mapped to all vector lanes of the calling thread and a reduction of
* val is performed using JoinType(ValueType& val, const ValueType& update) and put into init_result.
* The input value of init_result is used as initializer for temporary variables of ValueType. Therefore
* the input value should be the neutral element with respect to the join operation (e.g. '0 for +-' or
* '1 for *'). This functionality requires C++11 support.*/
template< typename iType, class Lambda, typename ReducerType, class JoinType >
KOKKOS_INLINE_FUNCTION
void parallel_reduce(const Impl::ThreadVectorRangeBoundariesStruct<iType,Impl::ROCmTeamMember >&
loop_boundaries, const Lambda & lambda, const JoinType& join, ReducerType const & reducer) {
for( iType i = loop_boundaries.start; i < loop_boundaries.end; i+=loop_boundaries.increment) {
lambda(i,reducer.reference());
loop_boundaries.thread.team_barrier();
}
reducer.reference() = loop_boundaries.thread.thread_reduce(reducer.reference(),join);
}
/** \brief Intra-thread vector parallel exclusive prefix sum. Executes lambda(iType i, ValueType & val, bool final)
* for each i=0..N-1.
*

View File

@@ -102,11 +102,12 @@ void reduce_enqueue(
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, F, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType, ReducerType>::value, Tag, void >::type TagFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , Tag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , Tag > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd , Tag > ValueJoin ;
typedef Kokkos::Impl::FunctorFinal< ReducerTypeFwd , Tag > ValueFinal ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , TagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , TagFwd > ValueInit ;
typedef Kokkos::Impl::FunctorValueJoin< ReducerTypeFwd , TagFwd > ValueJoin ;
typedef Kokkos::Impl::FunctorFinal< ReducerTypeFwd , TagFwd > ValueFinal ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;

View File

@@ -266,7 +266,7 @@ void ThreadsExec::execute_sleep( ThreadsExec & exec , const void * )
const int rank_rev = exec.m_pool_size - ( exec.m_pool_rank + 1 );
for ( int i = 0 ; i < n ; ++i ) {
Impl::spinwait_while_equal( exec.m_pool_base[ rank_rev + (1<<i) ]->m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( exec.m_pool_base[ rank_rev + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
exec.m_pool_state = ThreadsExec::Inactive ;
@@ -310,7 +310,7 @@ void ThreadsExec::fence()
{
if ( s_thread_pool_size[0] ) {
// Wait for the root thread to complete:
Impl::spinwait_while_equal( s_threads_exec[0]->m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( s_threads_exec[0]->m_pool_state , ThreadsExec::Active );
}
s_current_function = 0 ;
@@ -716,12 +716,12 @@ void ThreadsExec::initialize( unsigned thread_count ,
}
// Check for over-subscription
//if( Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node() ) {
// std::cout << "Kokkos::Threads::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
// std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
// std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
// std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
//}
if( Kokkos::show_warnings() && (Impl::mpi_ranks_per_node() * long(thread_count) > Impl::processors_per_node()) ) {
std::cout << "Kokkos::Threads::initialize WARNING: You are likely oversubscribing your CPU cores." << std::endl;
std::cout << " Detected: " << Impl::processors_per_node() << " cores per node." << std::endl;
std::cout << " Detected: " << Impl::mpi_ranks_per_node() << " MPI_ranks per node." << std::endl;
std::cout << " Requested: " << thread_count << " threads per process." << std::endl;
}
// Init the array for used for arbitrarily sized atomics
Impl::init_lock_array_host_space();

View File

@@ -107,7 +107,7 @@ private:
// Which thread am I stealing from currently
int m_current_steal_target;
// This thread's owned work_range
Kokkos::pair<long,long> m_work_range KOKKOS_ALIGN(16);
Kokkos::pair<long,long> m_work_range __attribute__((aligned(16))) ;
// Team Offset if one thread determines work_range for others
long m_team_work_index;
@@ -191,13 +191,13 @@ public:
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
// Wait: Active -> Rendezvous
Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
if ( rev_rank ) {
m_pool_state = ThreadsExec::Rendezvous ;
// Wait: Rendezvous -> Active
Impl::spinwait_while_equal( m_pool_state , ThreadsExec::Rendezvous );
Impl::spinwait_while_equal<int>( m_pool_state , ThreadsExec::Rendezvous );
}
else {
// Root thread does the reduction and broadcast
@@ -233,13 +233,13 @@ public:
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
// Wait: Active -> Rendezvous
Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
if ( rev_rank ) {
m_pool_state = ThreadsExec::Rendezvous ;
// Wait: Rendezvous -> Active
Impl::spinwait_while_equal( m_pool_state , ThreadsExec::Rendezvous );
Impl::spinwait_while_equal<int>( m_pool_state , ThreadsExec::Rendezvous );
}
else {
// Root thread does the reduction and broadcast
@@ -268,7 +268,7 @@ public:
ThreadsExec & fan = *m_pool_base[ rev_rank + ( 1 << i ) ] ;
Impl::spinwait_while_equal( fan.m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( fan.m_pool_state , ThreadsExec::Active );
Join::join( f , reduce_memory() , fan.reduce_memory() );
}
@@ -295,7 +295,7 @@ public:
const int rev_rank = m_pool_size - ( m_pool_rank + 1 );
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
Impl::spinwait_while_equal( m_pool_base[rev_rank+(1<<i)]->m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( m_pool_base[rev_rank+(1<<i)]->m_pool_state , ThreadsExec::Active );
}
}
@@ -327,7 +327,7 @@ public:
ThreadsExec & fan = *m_pool_base[ rev_rank + (1<<i) ];
// Wait: Active -> ReductionAvailable (or ScanAvailable)
Impl::spinwait_while_equal( fan.m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( fan.m_pool_state , ThreadsExec::Active );
Join::join( f , work_value , fan.reduce_memory() );
}
@@ -345,8 +345,8 @@ public:
// Wait: Active -> ReductionAvailable
// Wait: ReductionAvailable -> ScanAvailable
Impl::spinwait_while_equal( th.m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal( th.m_pool_state , ThreadsExec::ReductionAvailable );
Impl::spinwait_while_equal<int>( th.m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( th.m_pool_state , ThreadsExec::ReductionAvailable );
Join::join( f , work_value + count , ((scalar_type *)th.reduce_memory()) + count );
}
@@ -357,7 +357,7 @@ public:
// Wait for all threads to complete inclusive scan
// Wait: ScanAvailable -> Rendezvous
Impl::spinwait_while_equal( m_pool_state , ThreadsExec::ScanAvailable );
Impl::spinwait_while_equal<int>( m_pool_state , ThreadsExec::ScanAvailable );
}
//--------------------------------
@@ -365,7 +365,7 @@ public:
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
ThreadsExec & fan = *m_pool_base[ rev_rank + (1<<i) ];
// Wait: ReductionAvailable -> ScanAvailable
Impl::spinwait_while_equal( fan.m_pool_state , ThreadsExec::ReductionAvailable );
Impl::spinwait_while_equal<int>( fan.m_pool_state , ThreadsExec::ReductionAvailable );
// Set: ScanAvailable -> Rendezvous
fan.m_pool_state = ThreadsExec::Rendezvous ;
}
@@ -392,13 +392,13 @@ public:
// Wait for all threads to copy previous thread's inclusive scan value
// Wait for all threads: Rendezvous -> ScanCompleted
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Rendezvous );
Impl::spinwait_while_equal<int>( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Rendezvous );
}
if ( rev_rank ) {
// Set: ScanAvailable -> ScanCompleted
m_pool_state = ThreadsExec::ScanCompleted ;
// Wait: ScanCompleted -> Active
Impl::spinwait_while_equal( m_pool_state , ThreadsExec::ScanCompleted );
Impl::spinwait_while_equal<int>( m_pool_state , ThreadsExec::ScanCompleted );
}
// Set: ScanCompleted -> Active
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
@@ -425,7 +425,7 @@ public:
// Fan-in reduction with highest ranking thread as the root
for ( int i = 0 ; i < m_pool_fan_size ; ++i ) {
// Wait: Active -> Rendezvous
Impl::spinwait_while_equal( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( m_pool_base[ rev_rank + (1<<i) ]->m_pool_state , ThreadsExec::Active );
}
for ( unsigned i = 0 ; i < count ; ++i ) { work_value[i+count] = work_value[i]; }
@@ -433,7 +433,7 @@ public:
if ( rev_rank ) {
m_pool_state = ThreadsExec::Rendezvous ;
// Wait: Rendezvous -> Active
Impl::spinwait_while_equal( m_pool_state , ThreadsExec::Rendezvous );
Impl::spinwait_while_equal<int>( m_pool_state , ThreadsExec::Rendezvous );
}
else {
// Root thread does the thread-scan before releasing threads

View File

@@ -107,13 +107,13 @@ public:
// Wait for fan-in threads
for ( n = 1 ; ( ! ( m_team_rank_rev & n ) ) && ( ( j = m_team_rank_rev + n ) < m_team_size ) ; n <<= 1 ) {
Impl::spinwait_while_equal( m_team_base[j]->state() , ThreadsExec::Active );
Impl::spinwait_while_equal<int>( m_team_base[j]->state() , ThreadsExec::Active );
}
// If not root then wait for release
if ( m_team_rank_rev ) {
m_exec->state() = ThreadsExec::Rendezvous ;
Impl::spinwait_while_equal( m_exec->state() , ThreadsExec::Rendezvous );
Impl::spinwait_while_equal<int>( m_exec->state() , ThreadsExec::Rendezvous );
}
return ! m_team_rank_rev ;

View File

@@ -180,12 +180,12 @@ public:
// MDRangePolicy impl
template< class FunctorType , class ... Traits >
class ParallelFor< FunctorType
, Kokkos::Experimental::MDRangePolicy< Traits ... >
, Kokkos::MDRangePolicy< Traits ... >
, Kokkos::Threads
>
{
private:
typedef Kokkos::Experimental::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef Kokkos::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef typename MDRangePolicy::impl_range_policy Policy ;
typedef typename MDRangePolicy::work_tag WorkTag ;
@@ -193,7 +193,7 @@ private:
typedef typename Policy::WorkRange WorkRange ;
typedef typename Policy::member_type Member ;
typedef typename Kokkos::Experimental::Impl::HostIterateTile< MDRangePolicy, FunctorType, typename MDRangePolicy::work_tag, void > iterate_type;
typedef typename Kokkos::Impl::HostIterateTile< MDRangePolicy, FunctorType, typename MDRangePolicy::work_tag, void > iterate_type;
const FunctorType m_functor ;
const MDRangePolicy m_mdr_policy ;
@@ -396,9 +396,10 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
@@ -458,7 +459,7 @@ private:
( self.m_functor , range.begin() , range.end()
, ValueInit::init( ReducerConditional::select(self.m_functor , self.m_reducer) , exec.reduce_memory() ) );
exec.template fan_in_reduce< ReducerTypeFwd , WorkTag >( ReducerConditional::select(self.m_functor , self.m_reducer) );
exec.template fan_in_reduce< ReducerTypeFwd , WorkTagFwd >( ReducerConditional::select(self.m_functor , self.m_reducer) );
}
template<class Schedule>
@@ -484,7 +485,7 @@ private:
work_index = exec.get_work_index();
}
exec.template fan_in_reduce< ReducerTypeFwd , WorkTag >( ReducerConditional::select(self.m_functor , self.m_reducer) );
exec.template fan_in_reduce< ReducerTypeFwd , WorkTagFwd >( ReducerConditional::select(self.m_functor , self.m_reducer) );
}
public:
@@ -548,14 +549,14 @@ public:
// MDRangePolicy impl
template< class FunctorType , class ReducerType, class ... Traits >
class ParallelReduce< FunctorType
, Kokkos::Experimental::MDRangePolicy< Traits ... >
, Kokkos::MDRangePolicy< Traits ... >
, ReducerType
, Kokkos::Threads
>
{
private:
typedef Kokkos::Experimental::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef Kokkos::MDRangePolicy< Traits ... > MDRangePolicy ;
typedef typename MDRangePolicy::impl_range_policy Policy ;
typedef typename MDRangePolicy::work_tag WorkTag ;
@@ -564,16 +565,17 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef typename ReducerTypeFwd::value_type ValueType;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
using iterate_type = typename Kokkos::Experimental::Impl::HostIterateTile< MDRangePolicy
using iterate_type = typename Kokkos::Impl::HostIterateTile< MDRangePolicy
, FunctorType
, WorkTag
, ValueType
@@ -618,7 +620,7 @@ private:
( self.m_mdr_policy, self.m_functor , range.begin() , range.end()
, ValueInit::init( ReducerConditional::select(self.m_functor , self.m_reducer) , exec.reduce_memory() ) );
exec.template fan_in_reduce< ReducerTypeFwd , WorkTag >( ReducerConditional::select(self.m_functor , self.m_reducer) );
exec.template fan_in_reduce< ReducerTypeFwd , WorkTagFwd >( ReducerConditional::select(self.m_functor , self.m_reducer) );
}
template<class Schedule>
@@ -644,7 +646,7 @@ private:
work_index = exec.get_work_index();
}
exec.template fan_in_reduce< ReducerTypeFwd , WorkTag >( ReducerConditional::select(self.m_functor , self.m_reducer) );
exec.template fan_in_reduce< ReducerTypeFwd , WorkTagFwd >( ReducerConditional::select(self.m_functor , self.m_reducer) );
}
public:
@@ -725,9 +727,10 @@ private:
typedef Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, FunctorType, ReducerType> ReducerConditional;
typedef typename ReducerConditional::type ReducerTypeFwd;
typedef typename Kokkos::Impl::if_c< std::is_same<InvalidType,ReducerType>::value, WorkTag, void>::type WorkTagFwd;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd, WorkTag > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd, WorkTag > ValueInit ;
typedef Kokkos::Impl::FunctorValueTraits< ReducerTypeFwd , WorkTagFwd > ValueTraits ;
typedef Kokkos::Impl::FunctorValueInit< ReducerTypeFwd , WorkTagFwd > ValueInit ;
typedef typename ValueTraits::pointer_type pointer_type ;
typedef typename ValueTraits::reference_type reference_type ;
@@ -767,7 +770,7 @@ private:
( self.m_functor , Member( & exec , self.m_policy , self.m_shared )
, ValueInit::init( ReducerConditional::select(self.m_functor , self.m_reducer) , exec.reduce_memory() ) );
exec.template fan_in_reduce< ReducerTypeFwd , WorkTag >( ReducerConditional::select(self.m_functor , self.m_reducer) );
exec.template fan_in_reduce< ReducerTypeFwd , WorkTagFwd >( ReducerConditional::select(self.m_functor , self.m_reducer) );
}
public:

Some files were not shown because too many files have changed in this diff.