Commit Graph

747 Commits

Author SHA1 Message Date
Jonas Hahnfeld 9228f9718c [libomptarget-nvptx-bc] Pass found CUDA installations
We already know where the CUDA SDK is, so there is no point in
letting Clang search for it again and possibly finding no or
a different installation.

--cuda-path is supported since the beginning of CUDA support in
Clang, so making this required doesn't impose additional restrictions.

Differential Revision: https://reviews.llvm.org/D46930

llvm-svn: 332495
2018-05-16 17:20:27 +00:00
Jonas Hahnfeld 37bbe1a698 [libomptarget-nvptx] Test bitcode compiler flags and enable by default
Move all logic related to selecting the bitcode compiler and linker
into a new file and dynamically test required compiler flags. This
also adds -fcuda-rdc for Clang trunk as previously attempted in D44992
which fixes the build.

As a result this change also enables building the library by default
if all prerequisites are met.

Differential Revision: https://reviews.llvm.org/D46901

llvm-svn: 332494
2018-05-16 17:20:21 +00:00
Gheorghe-Teodor Bercea 787a350021 [OpenMP][libomptarget] Add function for checking SPMD mode
Summary: Add function to the NVPTX libomptarget library that will return true if the current target region is being executed in SPMD mode.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D46840

llvm-svn: 332360
2018-05-15 15:16:43 +00:00
Joachim Protze 9be9cf20bf [OMPT] Fix thread_num for implicit_task_end callbacks in nested parallel regions
implicit_task_end callbacks in nested parallel regions did not always give the
correct thread_num, since the inner parallel region may have already been
finalized.
Now, the thread_num is stored at the beginning of the implicit task and
retrieved at the end, whenever necessary.

A testcase was added as well.

Differential Revision: https://reviews.llvm.org/D46260

llvm-svn: 331632
2018-05-07 12:42:21 +00:00
Joachim Protze 8fc39f6b19 [OMPT] Add api_calls_misc.c testcase and rename api_calls.c testcase
The api_calls_misc.c testcase tests the following api calls:

ompt_get_callback()
ompt_get_state()
ompt_enumerate_states()
ompt_enumerate_mutex_impls()
These have not been tested previously.

The api_calls.c testcase has been renamed to api_calls_places.c because it only tests api calls that are related to places.

Differential Revision: https://reviews.llvm.org/D42523

llvm-svn: 331631
2018-05-07 12:42:15 +00:00
Guansong Zhang e1c7a46d5b [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side
Summary:
Enable the device side debug messages at compile time, use env var to control at runtime.

To achieve this, an environment data block is passed to the device lib when it is loaded.

By default, the message is off, to enable it, a user need to set LIBOMPDEVICE_DEBUG=1.

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D46210

llvm-svn: 331550
2018-05-04 19:29:28 +00:00
Jonathan Peyton d47df260ba [OpenMP][OMPT] Fix api_calls_from_other_thread.cpp
Removed environment setting in RUN: line that was being ignored anyways.
Changed a few specific checks to "any number"

llvm-svn: 331212
2018-04-30 18:46:31 +00:00
Guansong Zhang ad6c26516b [OpenMP] Remove compilation warning when using clang to compile bc files.
Summary: Minor printf format correction. NVCC ignore those. Clang will give warning on these if debug is enabled.

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D45528

llvm-svn: 330944
2018-04-26 14:06:53 +00:00
Guansong Zhang 334c379e32 [OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flag
Summary: The LIBOMPTARGET_NVPTX_DEBUG flag is inconsistent between using nvcc to generate .a file and clang to generate .bc file. Sync the two setting so we can get debug messages from the bc file path as well.

Reviewers: grokos

Subscribers: Hahnfeld, openmp-commits, mgorny

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D45530

llvm-svn: 330477
2018-04-20 20:41:00 +00:00
Heejin Ahn f78a493528 [OpenMP] Compilation error fix on const char*
Summary:
This line
(0ed912c7a7/runtime/src/kmp_gsupport.cpp (L1459))
added in D45327 (rL330282) causes a compilation failure.

Reviewers: jlpeyton

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D45786

llvm-svn: 330299
2018-04-18 22:23:31 +00:00
Jonathan Peyton 1482db9e03 [OpenMP] Fix affinity API for KMP_AFFINITY=none|compact|scatter
Currently, the affinity API reports garbage for the initial place list and any
thread's place lists when using KMP_AFFINITY=none|compact|scatter.
This patch does two things:

for KMP_AFFINITY=none, Creates a one entry table for the places, this way, the
initial place list is just a single place with all the proc ids in it. We also
set the initial place of any thread to 0 instead of KMP_PLACE_ALL so that the
thread reports that single place (place 0) instead of garbage (-1) when using
the affinity API.

When non-OMP_PROC_BIND affinity is used
(including KMP_AFFINITY=compact|scatter), a thread's place list is populated
correctly. We assume that each thread is assigned to a single place. This is
implemented in two of the affinity API functions

Differential Revision: https://reviews.llvm.org/D45527

llvm-svn: 330283
2018-04-18 19:25:48 +00:00
Jonathan Peyton 27a677fc95 Introduce GOMP_taskloop API
This patch introduces GOMP_taskloop to our API. It adds GOMP_4.5 to our
version symbols. Being a wrapper around __kmpc_taskloop, the function
creates a task with the loop bounds properly nested in the shareds so that
the GOMP task thunk will work properly. Also, the firstprivate copy constructors
are properly handled using the __kmp_gomp_task_dup() auxiliary function.

Currently, only linear spawning of tasks is supported
for the GOMP_taskloop interface.

Differential Revision: https://reviews.llvm.org/D45327

llvm-svn: 330282
2018-04-18 19:23:54 +00:00
Joachim Protze 3865c69b84 Set the license header for all OMPT files
llvm-svn: 329928
2018-04-12 17:23:26 +00:00
Guansong Zhang f679431f91 [OpenMP] Remove extra warning when we build
Summary:
This one line change is to remove this warning message

"warning: integer conversion resulted in a change of sign"

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D45415

llvm-svn: 329713
2018-04-10 15:28:31 +00:00
Guansong Zhang f0029a7738 Revert "[OpenMP] enable bc file compilation using the latest clang"
This reverts commit 6849e31c36d712d97433bca9af39b7a09c8c1207.

llvm-svn: 329576
2018-04-09 14:45:41 +00:00
Guansong Zhang e47fbc9da8 [OpenMP] enable bc file compilation using the latest clang
Summary: adding cuda-rdc flag to allow extern global data

Reviewers: grokos

Reviewed By: grokos

Subscribers: gregrodgers, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D44992

llvm-svn: 329072
2018-04-03 15:01:34 +00:00
Jonathan Peyton 1e6bb8d5de Minor cleanup in __kmp_atfork_child()
This change removes the unnecessary lock operation on __kmp_initz_lock inside
the __kmp_atfork_child() function for Linux; the lock variable is initialized
in the same function later.

Patch by Hansang Bae

Differential Revision: https://reviews.llvm.org/D44949

llvm-svn: 328900
2018-03-30 19:55:11 +00:00
Jonathan Peyton ea82c769f4 Move blocktime_str variable right before its first use
llvm-svn: 328575
2018-03-26 19:20:50 +00:00
Jonathan Peyton b6b79ac95b Add summarizeStats.py to tools directory
The summarizeStats.py script processes raw data provided by the
instrumented (stats-gathering) OpenMP* runtime library. It provides:

1) A radar chart which plots counters as frequency (per GigaTick) of use within
   the program. The frequencies are plotted as log10, however values less than
   one are kept as it is and represented in red color. This was done to help
   visualize the differences better.
2) Pie charts separating total time as compute and non-compute. The compute and
   non-compute times have their own pie charts showing the constructs that
   contributed to them. The percentages listed are with respect to the total
   time.
3) '.csv' file with percentage of time spent within the different constructs.

The script can be used as:
$ python $PATH_TO_SCRIPT/summarizeStats.py instrumented1.csv instrumented2.csv

Patch by Taru Doodi

Differential Revision: https://reviews.llvm.org/D41838

llvm-svn: 328568
2018-03-26 18:44:48 +00:00
Andrey Churbanov 2d91a8a3ba Fixed __kmpc_get_target_offload() to call library initialization.
Differential Revision: https://reviews.llvm.org/D44793

llvm-svn: 328228
2018-03-22 18:51:51 +00:00
Gheorghe-Teodor Bercea 4bc36a06e2 [OpenMP][libomptarget] Initialize global memory stack only once.
Summary: The global stack initialization function may be called multiple times. The initialization of the shared memory slots should only happen when the function is called for the first time for a given warp master thread.

Reviewers: grokos, carlo.bertolli, ABataev, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44754

llvm-svn: 328148
2018-03-21 21:02:55 +00:00
Gheorghe-Teodor Bercea b4332ca3da [OpenMP][libomptarget] Fix master warp check
Summary: The check for the master warp must take into consideration the actual number of warps: the master warp is equal to the last active warp not necessarily WARPSIZE - 1.

Reviewers: grokos, carlo.bertolli, ABataev, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44537

llvm-svn: 328146
2018-03-21 20:51:16 +00:00
Gheorghe-Teodor Bercea c8d395a168 [OpenMP][libomptarget] Enable globalization for workers
Summary:
This patch allows worker to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context it then needs to be globalized.
Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44487

llvm-svn: 328144
2018-03-21 20:34:19 +00:00
Jonathan Peyton 78f977fcd1 Read OMP_TARGET_OFFLOAD and provide API to access ICV
Added settings code to read OMP_TARGET_OFFLOAD environment variable. Added
target-offload-var ICV as __kmp_target_offload, set via OMP_TARGET_OFFLOAD,
if available, otherwise defaulting to DEFAULT. Valid values for the ICV are
specified as enum values {0,1,2} for disabled, default, and mandatory. An
internal API access function __kmpc_get_target_offload is provided.

Patch by Terry Wilmarth

Differential Revision: https://reviews.llvm.org/D44577

llvm-svn: 328046
2018-03-20 21:18:17 +00:00
Andrey Churbanov 3336aa0d07 Fix for Fix for https://bugs.llvm.org/show_bug.cgi?id=36705.
Differential Revision: https://reviews.llvm.org/D44637

llvm-svn: 327875
2018-03-19 18:05:15 +00:00
George Rokos 6b9bb5e1c2 Bugfix, extern declarations for libomp functions are `extern "C"` declarations
llvm-svn: 327763
2018-03-17 02:07:42 +00:00
George Rokos 2878c3957b Moved extern declarations to private header file, they are only used from within libomptarget, they don't need to be in omptarget.h.
llvm-svn: 327740
2018-03-16 20:40:09 +00:00
Gheorghe-Teodor Bercea 876c1ed2e5 [OpenMP][libomptarget] Enable usage of shared memory slots
Summary:
Allow the runtime to use the existing shared memory statically allocated slots.

When a variable is globalized, the underlying memory can be either shared or global memory (both have block-wide visibility). In this case, we allow that the storage to use a limited amount of shared memory that has been statically allocated already. Only if shared memory doesn't prove to be enough do we then invoke malloc() to create a new global memory slot.

Reviewers: ABataev, carlo.bertolli, grokos, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44486

llvm-svn: 327639
2018-03-15 16:05:34 +00:00
Gheorghe-Teodor Bercea f3de222b0d [OpenMP][libomptarget] Enable multiple frames per global memory slot
Summary: To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44470

llvm-svn: 327637
2018-03-15 15:56:04 +00:00
George Rokos 59be4b434f [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.
llvm-svn: 327556
2018-03-14 19:11:36 +00:00
Gheorghe-Teodor Bercea 49b62649cf [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing.
Summary:
This patch adds support for the sharing of variables from the master thread of a team to the worker threads of the team.
The runtime uses a stack structure implemented as a doubly-linked list of slots with each slot having the exact same size as the size requested. This implementation leverages existing data structures. The runtime functions are added as separate functions to avoid interfering with the current interface. 

Limitations to be addressed in future patches:
- This current patch only employs global memory. In a future patch we will enable to usage for shared memory as an optimization.
- Allow the allocation of several requested sizes in the same slot.

Reviewers: ABataev, grokos, caomhin, carlo.bertolli

Reviewed By: grokos

Subscribers: Hahnfeld, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44260

llvm-svn: 327440
2018-03-13 19:44:53 +00:00
Sylvestre Ledru 1c861c582f fix a typo on the website
llvm-svn: 327237
2018-03-11 10:53:40 +00:00
Gheorghe-Teodor Bercea d5e5992f9a [OpenMP][libomptarget] Fix union.
Summary: To make the two parts of the union have the same size, the size of vect needs to be increased by 16 bits.

Reviewers: grokos, carlo.bertolli, caomhin, ABataev

Reviewed By: grokos, ABataev

Subscribers: fedor.sergeev, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44254

llvm-svn: 327040
2018-03-08 18:44:02 +00:00
Gheorghe-Teodor Bercea 7a5fa21ae2 [OpenMP] Remove implicit data sharing using device shared memory from libomptarget
Summary:
This patch reverts the changes to libomptarget that were coupled with the changes to Clang code gen for data sharing using shared memory. A similar patch exists for Clang: D43625

Shared memory is meant to be used as an optimization on top of a more general scheme. So far we didn't have a global memory implementation ready so shared memory was a solution which applied to the current level of OpenMP complexity supported by trunk on GPU devices (due to the missing NVPTX backend patch this functionality has never been exercised). Now that we have a global memory solution this patch is "in the way" and needs to be removed (for now). This patch (or an equivalent version of it) will be put out for review once the global memory scheme is in place.


Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: Hahnfeld, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D43626

llvm-svn: 326950
2018-03-07 22:10:10 +00:00
Andrey Churbanov 9e9333aa8a Improve OpenMP threadprivate implementation.
Patch by Terry Wilmarth

Differential Revision: https://reviews.llvm.org/D41914

llvm-svn: 326733
2018-03-05 18:42:01 +00:00
Andrey Churbanov 75bc70fb56 Fixed build of the OpenMP stubs library.
Differential Revision: https://reviews.llvm.org/D44019

llvm-svn: 326728
2018-03-05 18:01:47 +00:00
Jonas Hahnfeld b0f051ae63 [OMPT] Fix interoperability test with GCC
We have to ensure that the runtime is initialized _before_ waiting
for the two started threads to guarantee that the master threads
post their ompt_event_thread_begin before the worker threads. This
is not guaranteed in the parallel region where one worker thread
could start before the other master thread has invoked the callback.

The problem did not happen with Clang becauses the generated code
calls __kmpc_global_thread_num() and cashes its result for functions
that contain OpenMP pragmas.

Differential Revision: https://reviews.llvm.org/D43882

llvm-svn: 326435
2018-03-01 14:03:18 +00:00
Joachim Protze f5aebc27ad [OMPT] Fix task-type test with GCC
This is similar to D43882. The runtime needs to be initialized before calling print_ids(0)

http://lab.llvm.org:8011/builders/openmp-gcc-x86_64-linux-debian/builds/60

Differential Revision: https://reviews.llvm.org/D43897

llvm-svn: 326428
2018-03-01 11:26:15 +00:00
Joachim Protze aa2022e74f [OMPT] Fix ompt_get_task_info() and add tests for it
The thread_num parameter of ompt_get_task_info() was not being used previously,
but need to be set.

The print_task_type() function (form the task-types.c testcase) was merged into
the print_ids() function (in callback.h). Testing of ompt_get_task_info() was
added to the task-types.c testcase. It was not tested extensively previously.

Differential Revision: https://reviews.llvm.org/D42472

llvm-svn: 326338
2018-02-28 17:36:18 +00:00
Joachim Protze 4df80bda40 [OMPT] Fix inconsistent testcases
The main change of this patch is to insert {{.*}} in current_address=[[RETURN_ADDRESS_END]].
This is needed to match any of the alternatively printed addresses.

Additionally, clang-format is applied to the two tests.

Differential Revision: https://reviews.llvm.org/D43115

llvm-svn: 326312
2018-02-28 09:28:51 +00:00
Jonas Hahnfeld 82768d0ba1 [OMPT] Fix parallel_data in implicit barrier-end
This is required to be NULL for implicit barriers at the end of a
parallel region. Noticed in review of D43191.

Differential Revision: https://reviews.llvm.org/D43308

llvm-svn: 325922
2018-02-23 16:46:25 +00:00
Jonas Hahnfeld 5e44069857 [OMPT] Fix test tasks/serialized.c with optimization
The compiler inlines the user code in the task. Check for that case at
runtime by comparing the frame addresses and print the expected exit
address.

Also showcase how I think the OMPT tests could be reformatted to match
LLVM's code style. In my opinion it would be great to that kind of change
to all tests that need to be touched for whatever reason...

Differential Revision: https://reviews.llvm.org/D43191

llvm-svn: 325921
2018-02-23 16:46:11 +00:00
Joachim Protze b0e4f87fb0 [OMPT] Omissionin in OMPT Formatting
Applying clang-format to the /runtime/src/ folder

Differential Revision: https://reviews.llvm.org/D42169

llvm-svn: 325424
2018-02-17 09:54:10 +00:00
Joachim Protze 33db70d2d7 [OMPT] Add interoperability testcase
Test whether OMPT-callbacks for two threads that initiate a parallel region are correct.

Differential Revision: https://reviews.llvm.org/D41942

llvm-svn: 325423
2018-02-17 09:40:08 +00:00
Joachim Protze 76899b84fe [OMPT] Update api_calls testcase
Only use ompt_ functions when testing OMPT in api_calls testcase.
Add size parameter to print_list.
Fix small bug in implementation of ompt_get_partition_place_nums(): return correct length.

Differential Revision: https://reviews.llvm.org/D42162

llvm-svn: 325422
2018-02-17 09:40:02 +00:00
Jonas Hahnfeld 6f9e25d382 [CMake] Add -fno-experimental-isel for testing
GlobalISel doesn't yet implement blockaddress and falls back to
SelectionDAG. This results in additional branch instruction to
the next basic block which breaks the OMPT tests.
Disable GlobalISel for now when compiling the tests because fixing
them is not easily possible. See http://llvm.org/PR36313 for full
discussion history.

Differential Revision: https://reviews.llvm.org/D43195

llvm-svn: 325218
2018-02-15 08:10:22 +00:00
Jonas Hahnfeld cc6d29d72c [OMPT][test] Correct warning about added wrapper functions
This affects all outlined functions, not just tasks! Only show warning
when using Clang 5.0 or later.

Differential Revision: https://reviews.llvm.org/D43190

llvm-svn: 325131
2018-02-14 15:15:24 +00:00
Gheorghe-Teodor Bercea d5ae4e6501 [OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining
Summary:
Different NVIDIA GPUs support different compute capabilities. To enable the inlining of runtime functions and the best performance on different generations of NVIDIA GPUs, a bc library for each compute capability needs to be compiled. The same compiler build will then be usable in conjunction with multiple generations of NVIDIA GPUs.
To differentiate between versions of the same bc lib, the output file name will contain the compute capability ID.
Depends on D14254

Reviewers: Hahnfeld, hfinkel, carlo.bertolli, caomhin, ABataev, grokos

Reviewed By: Hahnfeld, grokos

Subscribers: guansong, mgorny, openmp-commits

Differential Revision: https://reviews.llvm.org/D41724

llvm-svn: 324904
2018-02-12 16:45:20 +00:00
Jonas Hahnfeld 3cfaf3dd0d [libomptarget] Fix detection of CUDA stubs library
CUDA_LIBRARIES contains additional linker arguments since CMake 3.3
which breakes the current way of finding the stubs library.

llvm-svn: 324879
2018-02-12 11:01:56 +00:00
Joachim Protze cfc98c2493 [OMPT] Add tool_available_search testcase
Tests the search for tools as defined in the spec. The OMP_TOOL_LIBRARIES
environment variable contains paths to the following files(in that order)

-to a nonexisting file
-to a shared library that does not have a ompt_start_tool function
-to a shared library that has an ompt_start_tool implementation returning NULL
-to a shared library that has an ompt_start_tool implementation returning a
    pointer to a valid instance of ompt_start_tool_result_t

The expected result is that the last tool gets active and can print in the
thread-begin callback.

Differential Revision: https://reviews.llvm.org/D42166

llvm-svn: 324588
2018-02-08 10:04:33 +00:00