llvm-project

Commit Graph

Author	SHA1	Message	Date
Joachim Protze	cdaefac5bd	[OMPT] Fix OMPT callbacks for the taskloop construct and add testcase Fix the order of callbacks related to the taskloop construct. Add the iteration_count to work callbacks (according to the spec). Use kmpc_omp_task() instead of kmp_omp_task() to include OMPT callbacks. Add a testcase. Patch by Simon Convent Reviewed by: protze.joachim, hbae Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D47709 llvm-svn: 338146	2018-07-27 18:13:24 +00:00
Joachim Protze	86ed6aa668	[OMPT] Adapt OMPT callbacks for tasks to handle untied tasks correctly The ompt/tasks/task_types.c testcase did not test untied tasks properly. Now, frame addresses are tested and two scheduling points are added at which the task can switch to another thread. Due to scheduling effects, the frame address could be NULL. This needed a restructure of the way OMPT callbacks are called. __ompt_task_finish() now has an extra parameter indicating whether a task is completed. Its invocation has been moved into __kmp_task_finish(). Thus, the order of the writes to the frame addresses is not subject to scheduling effects anymore. Patch by Simon Convent Reviewed by: protze.joachim, hbae Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49181 llvm-svn: 338145	2018-07-27 18:13:20 +00:00
Joachim Protze	f203109edb	[OMPT] Print two more addresses in print_fuzzy_address_block() The two more outputs are needed to match the return addresses when using the Intel Compiler, as it generates more instructions between the fuzzy-printing of the address and the runtime call. Patch by Simon Convent Reviewed By: protze.joachim, hbae Differential Revision: https://reviews.llvm.org/D49373 llvm-svn: 338144	2018-07-27 18:13:15 +00:00
Jonas Hahnfeld	3a0e9b37f3	PR30734: Remove __kmp_ft_page_allocate() This function was not enabled by default and not exported when manually tweaking the build flags. Additionally it was hard to use since there is no corresponding __kmp_ft_page_free(). The code itself is questionable because the returned memory address is padded by an extra pointer which stores the unpadded start of the allocated region (this would need to be freed). Differential Revision: https://reviews.llvm.org/D49802 llvm-svn: 338052	2018-07-26 18:15:02 +00:00
Jonas Hahnfeld	6fbbf27d98	[test] Remove XFAIL of omp_for_bigbounds.c for Intel Compiler The initial commit said that the test passes with Intel Compiler, so change XFAIL to only list clang and gcc. Differential Revision: https://reviews.llvm.org/D49801 llvm-svn: 338051	2018-07-26 18:14:57 +00:00
Jonas Hahnfeld	ba5ec9c684	[OMPT] Fix typo in test parallel/nested_thread_num.c This caused test failures with GCC since its initial commit in r336085 (https://reviews.llvm.org/D46533). llvm-svn: 337911	2018-07-25 12:34:31 +00:00
Alexey Bataev	37d4156b11	[OPENMP, NVPTX] Fixed synchronization construct + code cleanup. Summary: 1. Fixed internal problem in `__kmpc_barrier` function: SPMD mode synchronization function should be called only in L1 parallel level. 2. Removed some extra code for synchronization inside of the code, used `__kmpc_barrier` instead. 3. Some code cleanup. Reviewers: gtbercea, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49564 llvm-svn: 337691	2018-07-23 13:52:12 +00:00
Jonathan Peyton	a764af68be	Block library shutdown until unreaped threads finish spin-waiting This change fixes possibly invalid access to the internal data structure during library shutdown. In a heavily oversubscribed situation, the library shutdown sequence can reach the point where resources are deallocated while there still exist threads in their final spinning loop. The added loop in __kmp_internal_end() checks if there are such busy-waiting threads and blocks the shutdown sequence if that is the case. Two versions of kmp_wait_template() are now used to minimize performance impact. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D49452 llvm-svn: 337486	2018-07-19 19:17:00 +00:00
George Rokos	a0da24683b	[OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members This patch removes the translation code since this functionality is now implemented in the compiler. target_data_begin and target_data_end are also patched to handle some special cases that used to be handled by the obsolete translation function, namely ensure proper alignment of struct members when we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning) can result in distortion of the alignment for some of its member fields. Padding restores the original (proper) alignment. Differential revision: https://reviews.llvm.org/D44186 llvm-svn: 337455	2018-07-19 13:41:03 +00:00
Joachim Protze	bb869f42b7	[libomptarget] Also support several images for elf In revision r336569 (D49036) libomptarget support for multiple nvidia images has been fixed in case a target region resides inside one or multiple libraries and in the compiled application. But the issue is still present for elf images. This fix will also support multiple images for elf. Patch by Jannis Klinkenberg Reviewers: protze.joachim, ABataev, grokos Reviewed By: protze.joachim, ABataev, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49418 llvm-svn: 337355	2018-07-18 07:23:46 +00:00
Azharuddin Mohammed	6712b8675b	[cmake] Fix libomptarget/test/CMakeLists.txt Summary: Should be variable name instead of variable reference. If the variable is somehow unset, it messes up the if condition expression and causes a CMake error. Reviewers: jlpeyton, AndreyChurbanov, Hahnfeld Reviewed By: Hahnfeld Subscribers: mgorny, llvm-commits, openmp-commits Differential Revision: https://reviews.llvm.org/D47221 llvm-svn: 337133	2018-07-15 17:29:43 +00:00
Gheorghe-Teodor Bercea	9e94326185	[OpenMP][libomptarget] Fix data sharing and globalization infrastructure to work in SPMD mode Summary: This patch fixes the data sharing infrastructure to work for the SPMD and non-SPMD cases. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: ABataev, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D49204 llvm-svn: 337013	2018-07-13 16:14:22 +00:00
Alexey Bataev	c2c0138a04	[OPENMP, NVPTX] Fix loop boundaries calculation for dynamic loops. Summary: Patch fixes the next problems. 1. Removes unused functions from omptarget_nvptx_ThreadPrivateContext class + simplified data members. 2. Fixed calculation of loop boundaries for dynamic loops with static scheduling. 3. Introduced saving/restoring of the dynamic loop boundaries to support several nested parallel dynamic loops. Reviewers: grokos Subscribers: guansong, kkwli0, openmp-commits Differential Revision: https://reviews.llvm.org/D49241 llvm-svn: 336915	2018-07-12 15:18:28 +00:00
Jonathan Peyton	dc73f512ae	Fix const cast problem introduced in r336563 r336563 eliminated CCAST() macros, which caused build failures. llvm-svn: 336586	2018-07-09 19:09:31 +00:00
Jonathan Peyton	61d44f188a	[OpenMP] Fix a few formatting issues llvm-svn: 336575	2018-07-09 18:09:25 +00:00
Jonathan Peyton	f639936748	[OpenMP] Introduce hierarchical scheduling This patch introduces the logic implementing hierarchical scheduling. First and foremost, hierarchical scheduling is off by default. To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage. This work is based off of the IWOMP paper: "Workstealing and Nested Parallelism in SMP Systems". Hierarchical scheduling is the layering of OpenMP schedules for different layers of the memory hierarchy. One can have multiple layers between the threads and the global iteration space. The threads will go up the hierarchy to grab iterations, using possibly a different schedule & chunk for each layer. [ Global iteration space (0-999) ] (use static) [ L1 | L1 | L1 | L1 ] (use dynamic,1) [ T0 T1 | T2 T3 | T4 T5 | T6 T7 ] In the example shown above, there are 8 threads and 4 L1 caches being targeted. If the topology indicates that there are two threads per core, then two consecutive threads will share the data of one L1 cache unit. This example would have the iteration space (0-999) split statically across the four L1 caches (so the first L1 would get (0-249), the second would get (250-499), etc). Then the threads will use a dynamic,1 schedule to grab iterations from the L1 cache units. There are currently four supported layers: L1, L2, L3, NUMA. OMP_SCHEDULE can now read a hierarchical schedule with this syntax: OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK' And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before. I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h to try to keep it separate from the rest of the code. Differential Revision: https://reviews.llvm.org/D47962 llvm-svn: 336571	2018-07-09 17:51:13 +00:00
Alexey Bataev	2622e9e5b3	[OPENMP, NVPTX] Support several images in the executable. Summary: Currently Cuda plugin supports loading of the single image, though we may have the executable with the several images, if it has target regions inside of the dynamically loaded library. Patch allows to load multiple images. Reviewers: grokos Subscribers: guansong, openmp-commits, kkwli0 Differential Revision: https://reviews.llvm.org/D49036 llvm-svn: 336569	2018-07-09 17:46:55 +00:00
Jonathan Peyton	39ada85446	[OpenMP] Restructure loop code for hierarchical scheduling This patch reorganizes the loop scheduling code in order to allow hierarchical scheduling to use it more effectively. In particular, the goal of this patch is to separate the algorithmic parts of the scheduling from the thread logistics code. Moves declarations & structures to kmp_dispatch.h for easier access in other files. Extracts the algorithmic part of __kmp_dispatch_init() and __kmp_dispatch_next() into __kmp_dispatch_init_algorithm() and __kmp_dispatch_next_algorithm(). The thread bookkeeping logic is still kept in __kmp_dispatch_init() and __kmp_dispatch_next(). This is done because the hierarchical scheduler needs to access the scheduling logic without the bookkeeping logic. To prepare for new pointer in dispatch_private_info_t, a new flags variable is created which stores the ordered and nomerge flags instead of them being in two separate variables. This will keep the dispatch_private_info_t structure the same size. Differential Revision: https://reviews.llvm.org/D47961 llvm-svn: 336568	2018-07-09 17:45:33 +00:00
Jonathan Peyton	37e2ef5434	[OpenMP] Use C++11 Atomics - barrier, tasking, and lock code These are preliminary changes that attempt to use C++11 Atomics in the runtime. We are expecting better portability with this change across architectures/OSes. Here is the summary of the changes. Most variables that need synchronization operations were converted to generic atomic variables (std::atomic<T>). Variables that are updated with combined CAS are packed into a single atomic variable, and partial read/write is done through unpacking/packing. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D47903 llvm-svn: 336563	2018-07-09 17:36:22 +00:00
Kelvin Li	b1711b28f7	Define the __STDC_FORMAT_MACROS to avoid test failure on some platforms. ompt/misc/api_calls_from_other_thread.cpp ompt/misc/interoperability.cpp Differential Revision: https://reviews.llvm.org/D48984 llvm-svn: 336438	2018-07-06 14:15:59 +00:00
Joachim Protze	b41c61eed4	Dropped non-supported "--no-as-needed" flag from OMPT tests for macOS The flag "--no-as-needed" is not recognized by the linker on macOS, making the following tests fail: ompt/loadtool/tool_available/tool_available.c ompt/loadtool/tool_not_available/tool_not_available.c This patch removes this flag for macOS and adds it only for Linux and Windows. I tested it on Ubuntu 16.04 and macOS High Sierra, with Clang/LLVM 6.0.1 and OpenMP trunk. This solution was also discussed in the OpenMP-dev mailing list. Patch provided by Simone Atzeni Differential Revision: https://reviews.llvm.org/D48888 llvm-svn: 336327	2018-07-05 09:14:06 +00:00
Joachim Protze	00505b85a3	[OMPT] Add synchronization to threads_nested.c testcase The testcase potentially fails when a thread is reused. The added synchronization makes sure this does not happen. Patch provided by Simon Convent Differential Revision: https://reviews.llvm.org/D48932 llvm-svn: 336326	2018-07-05 09:14:01 +00:00
Joachim Protze	04a00fc18c	[OMPT] Use alloca() to force availability of frame pointer When compiling with icc, there is a problem with reenter frame addresses in parallel_begin callbacks in the interoperability.c testcase. (The address is not available, thus NULL.) Using alloca() forces availability of the frame pointer. Patch provided by Simon Convent Differential Revision: https://reviews.llvm.org/D48282 llvm-svn: 336088	2018-07-02 09:13:38 +00:00
Joachim Protze	e2eec57a4f	[OMPT] Add tests for runtime entry points from non-OpenMP threads Several runtime entry points have not been tested from non-OpenMP threads. This adds tests to an existing testcase. While at it, the testcase was reformatted Patch provided by Simon Convent Differential Revision: https://reviews.llvm.org/D48124 llvm-svn: 336087	2018-07-02 09:13:34 +00:00
Joachim Protze	28d2d708d4	[OMPT] Add testcases for thread_begin and thread_end callbacks Especially the thread_end callback has not been tested before. This adds a testcase for nested and non-nested threads. Patch provided by Simon Convent Differential Revision: https://reviews.llvm.org/D47824 llvm-svn: 336086	2018-07-02 09:13:30 +00:00
Joachim Protze	4a73ae167e	[OMPT] Provide the right thread_num for ancestor levels The current implementation always provides the thread-num for the current parallel region. This patch fixes the behavior for ancestor levels >0. Differential Revision: https://reviews.llvm.org/D46533 llvm-svn: 336085	2018-07-02 09:13:24 +00:00
Alexey Bataev	3994bafbc7	[OPENMP, NVPTX] Sync threads before start ordered loops. Summary: Threads must be synchronized before starting ordered construct. Reviewers: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D48732 llvm-svn: 335987	2018-06-29 16:16:00 +00:00
Alexey Bataev	0ac29350b5	[OPENMP, NVPTX] Fixes for NVPTX RTL Summary: Patch fixes several problems in the implementation of NVPTX RTL. 1. Detection of the last iteration for loops with static scheduling, no chunks. 2. Fixes reductions for the serialized parallel constructs. 3. Fixes handling of the barriers. Reviewers: grokos Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D48480 llvm-svn: 335469	2018-06-25 13:43:35 +00:00
Andrey Churbanov	a7fa3f009a	minor: fixed typo in debug print llvm-svn: 335138	2018-06-20 15:54:11 +00:00
Jonas Hahnfeld	d03cbf2cfe	Remove liboffload from repository See the mailing list for the proposal and discussion: http://lists.llvm.org/pipermail/openmp-dev/2018-June/002041.html llvm-svn: 335069	2018-06-19 19:08:17 +00:00
Guansong Zhang	f9e56e5982	[OpenMP] [CUDA] Expose teamid to the debug path Summary: Small bug fix for debug build. A previous fix was causing trouble for the debug build. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D48286 llvm-svn: 335046	2018-06-19 14:05:38 +00:00
Jonathan Peyton	e92ae43be8	[OpenMP] Fix formatting issues in kmp_stats.h llvm-svn: 334335	2018-06-08 22:27:53 +00:00
Joachim Protze	406361330b	[OMPT] Rename ompt_wait_id to omp_wait_id Rename ompt_wait_id to omp_wait_id, as defined in the spec. Differential Revision: https://reviews.llvm.org/D46530 llvm-svn: 333368	2018-05-28 08:16:08 +00:00
Joachim Protze	c5836064bb	[OMPT] Rename ompt_frame_t to omp_frame_t Rename ompt_frame_t to omp_frame_t, as defined in the spec. Differential Revision: https://reviews.llvm.org/D43568 llvm-svn: 333367	2018-05-28 08:14:58 +00:00
Jonas Hahnfeld	3c6595d65d	[OMPT] Fix test parallel/not_enough_threads.c Upcoming changes to FileCheck will modify CHECK-DAG to not match overlapping regions of the input. This test was found to be affected because it expects to find four threads to invoke events of type ompt_event_implicit_task_begin. It turns out this is wrong because OMP_THREAD_LIMIT is set to 2, so there are only two threads. The rest of the test got it right so it went unnoticed until now. (Rewrite test and apply clang-format to it as discussed in the past.) Differential Revision: https://reviews.llvm.org/D47119 llvm-svn: 333361	2018-05-27 17:07:38 +00:00
Jonas Hahnfeld	17aabf83e9	[libomptarget-nvptx] loop: Determine if runtime uninitialized The generic entry points for static loop scheduling previously hardcoded that the runtime was initialized. This can be wrong if the compiler analyzes that the runtime is not needed and calls the init functions accordingly. This didn't affect clang-ykt because they have entry points for different combinations of SPMD x Runtime not needed. I didn't do measurements yet but with inlining we might get away with always calling the generic interface and letting compiler and runtime figure out the rest. In any case, a correct runtime is always better than having functions that may only be called if previous calls passed in a specific set of arguments! Differential Revision: https://reviews.llvm.org/D47131 llvm-svn: 333285	2018-05-25 15:56:48 +00:00
Jonas Hahnfeld	65e0b8784c	[CMake] Unify install path for libraries Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands. This also fixes installation of libomptarget-nvptx that previously didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX. Differential Revision: https://reviews.llvm.org/D47130 llvm-svn: 333284	2018-05-25 15:56:41 +00:00
George Rokos	6da6f433a0	[CUDA] Fix dynamic|guided scheduling. The existing implementation of the dynamic scheduling breaks the contract introduced by the original openmp runtime and, thus, is incorrect. Patch fixes it and introduces correct dynamic scheduling model. Thanks to Alexey Bataev for submitting this patch. Differential Revision: https://reviews.llvm.org/D47333 llvm-svn: 333225	2018-05-24 21:12:41 +00:00
Jonas Hahnfeld	9228f9718c	[libomptarget-nvptx-bc] Pass found CUDA installations We already know where the CUDA SDK is, so there is no point in letting Clang search for it again and possibly finding no or a different installation. --cuda-path is supported since the beginning of CUDA support in Clang, so making this required doesn't impose additional restrictions. Differential Revision: https://reviews.llvm.org/D46930 llvm-svn: 332495	2018-05-16 17:20:27 +00:00
Jonas Hahnfeld	37bbe1a698	[libomptarget-nvptx] Test bitcode compiler flags and enable by default Move all logic related to selecting the bitcode compiler and linker into a new file and dynamically test required compiler flags. This also adds -fcuda-rdc for Clang trunk as previously attempted in D44992 which fixes the build. As a result this change also enables building the library by default if all prerequisites are met. Differential Revision: https://reviews.llvm.org/D46901 llvm-svn: 332494	2018-05-16 17:20:21 +00:00
Gheorghe-Teodor Bercea	787a350021	[OpenMP][libomptarget] Add function for checking SPMD mode Summary: Add function to the NVPTX libomptarget library that will return true if the current target region is being executed in SPMD mode. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D46840 llvm-svn: 332360	2018-05-15 15:16:43 +00:00
Joachim Protze	9be9cf20bf	[OMPT] Fix thread_num for implicit_task_end callbacks in nested parallel regions implicit_task_end callbacks in nested parallel regions did not always give the correct thread_num, since the inner parallel region may have already been finalized. Now, the thread_num is stored at the beginning of the implicit task and retrieved at the end, whenever necessary. A testcase was added as well. Differential Revision: https://reviews.llvm.org/D46260 llvm-svn: 331632	2018-05-07 12:42:21 +00:00
Joachim Protze	8fc39f6b19	[OMPT] Add api_calls_misc.c testcase and rename api_calls.c testcase The api_calls_misc.c testcase tests the following api calls: ompt_get_callback() ompt_get_state() ompt_enumerate_states() ompt_enumerate_mutex_impls() These have not been tested previously. The api_calls.c testcase has been renamed to api_calls_places.c because it only tests api calls that are related to places. Differential Revision: https://reviews.llvm.org/D42523 llvm-svn: 331631	2018-05-07 12:42:15 +00:00
Guansong Zhang	e1c7a46d5b	[OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side Summary: Enable the device side debug messages at compile time, use env var to control at runtime. To achieve this, an environment data block is passed to the device lib when it is loaded. By default, the messages are off; to enable them, a user needs to set LIBOMPDEVICE_DEBUG=1. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D46210 llvm-svn: 331550	2018-05-04 19:29:28 +00:00
Jonathan Peyton	d47df260ba	[OpenMP][OMPT] Fix api_calls_from_other_thread.cpp Removed environment setting in RUN: line that was being ignored anyways. Changed a few specific checks to "any number" llvm-svn: 331212	2018-04-30 18:46:31 +00:00
Guansong Zhang	ad6c26516b	[OpenMP] Remove compilation warning when using clang to compile bc files. Summary: Minor printf format correction. NVCC ignores those. Clang will give warnings on these if debug is enabled. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D45528 llvm-svn: 330944	2018-04-26 14:06:53 +00:00
Guansong Zhang	334c379e32	[OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flag Summary: The LIBOMPTARGET_NVPTX_DEBUG flag is inconsistent between using nvcc to generate the .a file and clang to generate the .bc file. Sync the two settings so we can get debug messages from the bc file path as well. Reviewers: grokos Subscribers: Hahnfeld, openmp-commits, mgorny Tags: #openmp Differential Revision: https://reviews.llvm.org/D45530 llvm-svn: 330477	2018-04-20 20:41:00 +00:00
Heejin Ahn	f78a493528	[OpenMP] Compilation error fix on const char* Summary: This line (`0ed912c7a7/runtime/src/kmp_gsupport.cpp (L1459)`) added in D45327 (rL330282) causes a compilation failure. Reviewers: jlpeyton Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D45786 llvm-svn: 330299	2018-04-18 22:23:31 +00:00
Jonathan Peyton	1482db9e03	[OpenMP] Fix affinity API for KMP_AFFINITY=none|compact|scatter Currently, the affinity API reports garbage for the initial place list and any thread's place lists when using KMP_AFFINITY=none|compact|scatter. This patch does two things. For KMP_AFFINITY=none, it creates a one-entry table for the places; this way, the initial place list is just a single place with all the proc ids in it. We also set the initial place of any thread to 0 instead of KMP_PLACE_ALL so that the thread reports that single place (place 0) instead of garbage (-1) when using the affinity API. When non-OMP_PROC_BIND affinity is used (including KMP_AFFINITY=compact|scatter), a thread's place list is populated correctly. We assume that each thread is assigned to a single place. This is implemented in two of the affinity API functions. Differential Revision: https://reviews.llvm.org/D45527 llvm-svn: 330283	2018-04-18 19:25:48 +00:00
Jonathan Peyton	27a677fc95	Introduce GOMP_taskloop API This patch introduces GOMP_taskloop to our API. It adds GOMP_4.5 to our version symbols. Being a wrapper around __kmpc_taskloop, the function creates a task with the loop bounds properly nested in the shareds so that the GOMP task thunk will work properly. Also, the firstprivate copy constructors are properly handled using the __kmp_gomp_task_dup() auxiliary function. Currently, only linear spawning of tasks is supported for the GOMP_taskloop interface. Differential Revision: https://reviews.llvm.org/D45327 llvm-svn: 330282	2018-04-18 19:23:54 +00:00
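
The hierarchical scheduling commit (f639936748) describes a two-layer example: the iteration space (0-999) is split statically across four L1 units, and the threads sharing each unit then grab iterations with a dynamic,1 schedule. The following is a minimal sketch of that idea only. It is not the code in kmp_dispatch_hier.h, and the names Range, split_static, and DynamicUnit are hypothetical, introduced here purely for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical type: an inclusive range of loop iterations.
struct Range {
  int lo;
  int hi;
};

// Upper layer (static): split iterations 0..total-1 into `units`
// contiguous blocks, one per L1 cache unit in the commit's example.
std::vector<Range> split_static(int total, int units) {
  assert(total >= 0 && units > 0);
  std::vector<Range> out;
  int base = total / units;   // minimum block length
  int extra = total % units;  // the first `extra` blocks get one more
  int lo = 0;
  for (int u = 0; u < units; ++u) {
    int len = base + (u < extra ? 1 : 0);
    out.push_back({lo, lo + len - 1});
    lo += len;
  }
  return out;
}

// Lower layer (dynamic,1): threads sharing one unit call grab() to take
// the next single iteration from that unit's block; -1 means exhausted.
// Sketch only: the counter keeps growing after exhaustion.
struct DynamicUnit {
  std::atomic<int> next;
  int hi;
  explicit DynamicUnit(Range r) : next(r.lo), hi(r.hi) {}
  int grab() {
    int it = next.fetch_add(1, std::memory_order_relaxed);
    return it <= hi ? it : -1;
  }
};
```

Under these assumptions, split_static(1000, 4) yields (0-249), (250-499), (500-749), and (750-999), matching the ranges quoted in the commit message; each pair of threads would then share one DynamicUnit built from its block.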


785 Commits