llvm-project

Commit Graph

Author	SHA1	Message	Date
Jonathan Peyton	985f152f25	[OpenMP] Update ittnotify sources This patch updates the ittnotify sources to the latest corresponding with Intel(R) VTune(TM) Amplifier 2018 Differential Revision: https://reviews.llvm.org/D52378 llvm-svn: 343139	2018-09-26 20:30:00 +00:00
Jonathan Peyton	cf27e31bdd	[OpenMP] Fix performance issue from 376.kdtree This change improves the performance of 376.kdtree by giving the compiler an opportunity to do inlining and other optimizations for the call path, __kmpc_omp_task_complete_if0()->__kmp_task_finish(), which is one of the hot paths in the program; some functions in kmp_taskdeps.cpp were moved to the new header file, kmp_taskdeps.h to achieve this. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D51889 llvm-svn: 343138	2018-09-26 20:24:39 +00:00
Jonathan Peyton	60eec6fecb	[OpenMP][OMPT] A few improvements This change includes miscellaneous improvements as follows: 1) Added ompt_get_proc_id() implementation for Windows 2) Added parser and print tool for omp-tool-var, just in case it needs to be printed (OMP_DISPLAY_ENV) 3) omp_control_tool is exported on Windows Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D50538 llvm-svn: 343137	2018-09-26 20:19:44 +00:00
Gheorghe-Teodor Bercea	f7256a593f	[OpenMP][libomptarget] Set the frame pointer then test empty slot condition Summary: NFC - just fixing a bug: the empty slot test was before the re-setting of the Stack pointer. Reviewers: ABataev, caomhin, Hahnfeld Reviewed By: ABataev Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D52122 llvm-svn: 343006	2018-09-25 18:48:14 +00:00
Gheorghe-Teodor Bercea	9bc3bfffb4	[OpenMP][libomptarget] Simplify warp master selection for data sharing Summary: There is currently no supported situation where the warp master is not the first thread in the warp. This also avoids the device execution from hanging on Volta GPUs when ballot_sync is called by a number of threads that is less than the size of a warp. Reviewers: ABataev, caomhin, grokos Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D50188 llvm-svn: 342972	2018-09-25 13:23:32 +00:00
Alexey Bataev	022bf16b41	[OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime. Summary: We need the support for per-team shared variables to support codegen for lastprivates/reductions. Patch adds this support by using shared memory if the total size of the reductions/lastprivates is <= 128 bytes, then pre-allocated buffer in global memory if size is <= 4K bytes,or uses malloc/free, otherwise. Reviewers: gtbercea, kkwli0, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51875 llvm-svn: 342737	2018-09-21 14:11:41 +00:00
Alexey Bataev	06b6e0f406	[OPENMP]Increment iterator when the loop is continued. Summary: Missed operation of the incrementing iterator when required just to continue execution. Reviewers: kkwli0, gtbercea, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51937 llvm-svn: 341964	2018-09-11 17:16:26 +00:00
Joachim Protze	489cdb783a	[OMPT] Update types according to TR7 Some types and callback signatures have changed from TR6 to TR7. Major changes (only adding signatures and stubs): (-remove idle callback) done by D48362 -add reduction and dispatch callback -add get_task_memory and finalize_tool runtime entry points -ompt_invoker_t becomes ompt_parallel_flag_t -more types of sync_regions Patch provided by Simon Convent Reviewers: hbae, protze.joachim Differential Revision: https://reviews.llvm.org/D50774 llvm-svn: 341834	2018-09-10 14:34:54 +00:00
Jonas Hahnfeld	dc79c7187c	[libomptarget-nvptx] Remove last mentions of __kmpc_print_* Their implementation was removed during review, delete their prototype declarations. llvm-svn: 341748	2018-09-08 12:10:19 +00:00
Jonathan Peyton	08f0180ba9	[OpenMP] Update copyright to 2018 Better late than never llvm-svn: 341703	2018-09-07 20:33:35 +00:00
Jonathan Peyton	a2f6eff488	[OpenMP] Change hint parameter type for critical to uint32_t Add atomic hint flags to the enum. The hint parameter type was changed to uint32_t in __kmpc_critical_with_hint() Patch by Olga Malysheva Differential Revision: https://reviews.llvm.org/D51235 llvm-svn: 341694	2018-09-07 18:46:40 +00:00
Jonathan Peyton	2ff302d5d7	[OpenMP] Synchronization hint constants added to headers ident flags reserved for atomic hints. This patch adds omp_sync_hint_t to omp.h and omp_sync_hint_kind to omp_lib.h. For better maintainability the list of macros for ident flags was replaced with an enum. The new KMP_IDENT_ATOMIC_HINT_MASK was added to the enum to support possible future atomic hints. Also fix omp_lib.h.var to be under 72 chars again after 5.0 OpenMP Memory commit Patch by Olga Malysheva Differential Revision: https://reviews.llvm.org/D51233 llvm-svn: 341693	2018-09-07 18:45:13 +00:00
Jonathan Peyton	92ca61884b	[OpenMP] Initial implementation of OMP 5.0 Memory Management routines Implemented omp_alloc, omp_free, omp_{set,get}_default_allocator entries, and OMP_ALLOCATOR environment variable. Added support for HBW memory on Linux if libmemkind.so library is accessible (dynamic library only, no support for static libraries). Only used stable API (hbwmalloc) of the memkind library though we may consider using experimental API in future. The ICV def-allocator-var is implemented per implicit task similar to place-partition-var. In the absence of a requested allocator, the runtime uses the default allocator. Predefined allocators (the only ones currently available) are made similar for C and Fortran, - pointers (long integers) with values 1 to 8. Patch by Andrey Churbanov Differential Revision: https://reviews.llvm.org/D51232 llvm-svn: 341687	2018-09-07 18:25:49 +00:00
Andrey Churbanov	d946778b9f	Fix for https://bugs.llvm.org/show_bug.cgi?id=38839 : Changed style of declarations to be less than 72 char each. Differential Revision: https://reviews.llvm.org/D51694 llvm-svn: 341653	2018-09-07 12:22:04 +00:00
Jonas Hahnfeld	21e3ee0afe	[libomptarget] Remove two unneeded includes, NFCI. Follow-up to r340542 and r340767. llvm-svn: 341563	2018-09-06 17:00:57 +00:00
Jonas Hahnfeld	f27dcf01d2	[libomptarget][test] Announce compiler features This is a follow-up to r341371: The new test for PR38704 doesn't work with Clang 6.0. It uses an UNSUPPORTED: clang-6, but that hasn't worked because the compiler features weren't known to lit. llvm-svn: 341448	2018-09-05 07:26:00 +00:00
Sergey Dmitriev	b4dc69ff80	[libomptarget] Remove `Devices` from `RTLInfoTy` This patch removes unused field `Devices` from `RTLInfoTy`. Differential Revision: https://reviews.llvm.org/D51653 llvm-svn: 341399	2018-09-04 20:23:09 +00:00
Jonas Hahnfeld	bb51d39871	[libomptarget][CUDA] Use cuDeviceGetAttribute, NFCI. cuDeviceGetProperties has apparently been deprecated since CUDA 5.0. Nvidia started using annotations only in CUDA 9.2, so nobody noticed nor cared before. The new function returns the same values, tested with a P100. Differential Revision: https://reviews.llvm.org/D51624 llvm-svn: 341372	2018-09-04 15:13:28 +00:00
Jonas Hahnfeld	f7f86971e6	[libomptarget] PR38704: Fix erase of ShadowPtrMap erase() invalidates the iterator and returns a new one pointing to the following element. The code now follows the example at https://en.cppreference.com/w/cpp/container/map/erase. (The added testcase crashes without this patch.) Reported by David Binderman (https://llvm.org/PR38704)! Differential Revision: https://reviews.llvm.org/D51623 llvm-svn: 341371	2018-09-04 15:13:23 +00:00
Jonas Hahnfeld	82d20201d0	[libomptarget][NVPTX] Drop dead code and data structures, NFCI. * cg and HasCancel in WorkDescr were never read and can be removed. * This eliminates the last use of priv in ThreadPrivateContext. * CounterGroup is unused afterwards. * Remove duplicate external declares in omptarget-nvptx.cu that are already in the header omptarget-nvptx.h. Differential Revision: https://reviews.llvm.org/D51622 llvm-svn: 341370	2018-09-04 15:13:17 +00:00
Jonas Hahnfeld	96c13488ab	[libomptarget][NVPTX] Fix __kmpc_spmd_kernel_deinit If the runtime is uninitialized the master thread must Enqueue the state object, and ALL threads must return immediately. Found post-commit of https://reviews.llvm.org/D51222. llvm-svn: 341328	2018-09-03 17:24:23 +00:00
Alexey Bataev	39a4724095	[OPENMP][NVPTX] Replace assert() by ASSERT0() macro, NFC. Required to fix the buildbots. llvm-svn: 340956	2018-08-29 19:22:06 +00:00
Alexey Bataev	b7a5d38cf5	[OPENMP][NVPTX] Lightweight runtime support for SPMD mode. Summary: Implemented simple and lightweight runtime support for SPMD mode-based constructs. It adds support for L2 sequential parallelism without full runtime support. Also, patch fixes some use cases for uninitialized\|lightweight runtime. Reviewers: grokos, kkwli0, Hahnfeld, gtbercea Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51222 llvm-svn: 340944	2018-08-29 17:35:09 +00:00
Gheorghe-Teodor Bercea	15f5407d92	[OpenMP][Fix] Conditional compilation leaves variables unused Summary: Prevent variables from being left unused by conditional compilation. Reviewers: ABataev, grokos, Hahnfeld, caomhin, protze.joachim Reviewed By: Hahnfeld Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51303 llvm-svn: 340771	2018-08-27 19:54:26 +00:00
Alexandre Eichenberger	e9b7d8dcd6	[OpenMP][libomptarget] rework of fatal error reporting Summary: Removed the function that used a lock and varargs Used the same mechanism as for debug messages Reviewers: ABataev, gtbercea, grokos, Hahnfeld Reviewed By: gtbercea, Hahnfeld Subscribers: mikerice, ABataev, RaviNarayanaswamy, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51226 llvm-svn: 340767	2018-08-27 18:20:15 +00:00
Gheorghe-Teodor Bercea	353adf437d	[OpenMP][Fix] Ensure comparison between unsigned values. Summary: Ensure the values being compared are both unsigned. Reviewers: ABataev, Hahnfeld, caomhin, grokos, AndreyChurbanov Reviewed By: AndreyChurbanov Subscribers: AndreyChurbanov, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D51301 llvm-svn: 340745	2018-08-27 14:52:20 +00:00
Jonathan Peyton	2a966e84ce	[OpenMP] Remove deprecated/obsolete MIC attributes from headers llvm-svn: 340656	2018-08-24 21:34:10 +00:00
Jonathan Peyton	2c3e5d82b4	[OpenMP] Fixed affinity verbose double printing for balanced type. llvm-svn: 340647	2018-08-24 20:35:42 +00:00
Jonathan Peyton	a4a9c48c78	[OpenMP] Fix tasking bug for decreasing hot team nthreads The __kmp_execute_tasks_template() function reads the task_team and current_task from the thread structure. There appears to be a pathological timing where the number of threads in the hot team decreases and so a thread is put in the pool via __kmp_free_thread(). It could be the case that: 1) A thread reads th_task_team into task_team local variables and is then interrupted by the OS 2) Master frees the thread and sets current task and task team to NULL 3) The thread reads current_task as NULL When this happens, current_task is dereferenced and a segfault occurs. This patch just checks for current_task to not be NULL as well. Differential Revision: https://reviews.llvm.org/D50651 llvm-svn: 340632	2018-08-24 18:07:35 +00:00
Jonathan Peyton	ca10a76f08	[OpenMP] Add check for hot_teams array If hot teams are not being used, this code could seg fault without the added check, and does so when composability is used in conjunction with nesting. The fix prevents the segfault. Differential Revision: https://reviews.llvm.org/D50649 llvm-svn: 340629	2018-08-24 18:05:00 +00:00
Jonathan Peyton	b1b221c82c	[OpenMP] Fix incorrect barrier imbalance reporting in ITTNOTIFY Exclude nested explicit tasks from timing, only outer level explicit task counted and its time added to barrier arrive time for the thread. Differential Revision: https://reviews.llvm.org/D50584 llvm-svn: 340628	2018-08-24 18:03:27 +00:00
Alexandre Eichenberger	1b4a666ba5	[OpenMP][libomptarget] Bringing up to spec with respect to OMP_TARGET_OFFLOAD env var Summary: Right now, only the OMP_TARGET_OFFLOAD=DISABLED was implemented. Added support for the other MANDATORY and DEFAULT values. Reviewers: gtbercea, ABataev, grokos, caomhin, Hahnfeld Reviewed By: Hahnfeld Subscribers: protze.joachim, gtbercea, AlexEichenberger, RaviNarayanaswamy, Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D50522 llvm-svn: 340542	2018-08-23 16:22:42 +00:00
Joachim Protze	e1a04b4659	[OMPT] Remove OMPT idle callback The idle callback was removed from the spec as of TR7. This removes it from the implementation. Patch provided by Simon Convent Reviewers: hbae, protze.joachim Differential Revision: https://reviews.llvm.org/D48362 llvm-svn: 339771	2018-08-15 13:54:28 +00:00
Jonathan Peyton	a3f6d4c5b8	[OMPT] Make omp_control_tool() compliant when called from Fortran programs This change fixes an incorrect behavior of the omp_control_tool function when called from Fortran applications. A tool callback function for this event is supposed to get NULL for the third argument according to the specification, but the current implementation just passes a garbage value. A possible fix is to use the OPTIONAL attribute for the third argument. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D50565 llvm-svn: 339585	2018-08-13 17:26:18 +00:00
Jonathan Peyton	baad3f6016	[OpenMP] Cleanup code This patch cleans up unused functions, variables, sign compare issues, and addresses some -Warning flags which are now enabled including -Wcast-qual. Not all the warning flags in LibompHandleFlags.cmake are enabled, but some are with this patch. Some __kmp_gtid_from_* macros in kmp.h are switched to static inline functions which allows us to remove the awkward definition of KMP_DEBUG_ASSERT() and KMP_ASSERT() macros which used the comma operator. This had to be done for the innumerable -Wunused-value warnings related to KMP_DEBUG_ASSERT() Differential Revision: https://reviews.llvm.org/D49105 llvm-svn: 339393	2018-08-09 22:04:30 +00:00
Jonathan Peyton	821649229e	[OpenMP] Fix doacross testing for gcc This patch adds a test using the doacross clauses in OpenMP and removes gcc from testing kmp_doacross_check.c which is only testing the kmp rather than the gomp interface. Differential Revision: https://reviews.llvm.org/D50014 llvm-svn: 338757	2018-08-02 19:13:07 +00:00
Jonas Hahnfeld	ef8f737288	[OMPT] Disable by default on Windows This is broken per PR36561 and PR36574, so disable it for now until somebody interested can take a look. OMPT can still be activated manually by passing -DLIBOMP_OMPT_SUPPORT=ON during configuration. Differential Revision: https://reviews.llvm.org/D50086 llvm-svn: 338721	2018-08-02 14:34:08 +00:00
Jonas Hahnfeld	5b57eb4b09	[tests] Add annotations for taskloop features Only supported since GCC 6 and Intel 17.0. However GCC 6.3.0 is crashing on two of the tests, so disable them as well... Differential Revision: https://reviews.llvm.org/D50085 llvm-svn: 338720	2018-08-02 14:34:03 +00:00
Joachim Protze	935399d254	[OMPT,tests] Fix taskloop testcase scheduling effects The taskloop testcase had scheduling effects. Tasks of the taskloop would sometimes be scheduled before all tasks were created. The testing is now split into two phases. First, the task creation on the master is tested, then the scheduling events of the tasks are tested. Thus, the order of creation and scheduling events is irrelevant. Patch by Simon Convent Reviewed by: protze.joachim, Hahnfeld Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D50140 llvm-svn: 338580	2018-08-01 16:15:18 +00:00
Jonas Hahnfeld	51fc3cc628	[test] Convert test for PR36720 to c89 GCC 4.8.5 defaults to this old C standard. I think we should make the tests pass a newer -std=c99\|c11 but that's too intrusive for now... Differential Revision: https://reviews.llvm.org/D50084 llvm-svn: 338490	2018-08-01 06:26:55 +00:00
Jonathan Peyton	28226e7d64	[OpenMP] Fix tasking + parallel bug From the bug report, the runtime needs to initialize the nproc variables (inside middle init) for each root when the task is encountered, otherwise, a segfault can occur. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36720 Differential Revision: https://reviews.llvm.org/D49996 llvm-svn: 338313	2018-07-30 21:47:56 +00:00
Gheorghe-Teodor Bercea	f729df821a	[OpenMP] Fix new task creation Summary: When OMPT is not supported the __kmp_omp_task() function is passed the parameters in the wrong order. This is a fix related to patch D47709. Reviewers: Hahnfeld, sconvent, caomhin, jlpeyton Reviewed By: Hahnfeld Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D50001 llvm-svn: 338295	2018-07-30 19:51:51 +00:00
Jonas Hahnfeld	f985f98128	[CMake] Disable -Wstringop-overflow GCC 8 produces false-positives with this: In file included from <openmp>/src/runtime/src/kmp_os.h:950, from <openmp>/src/runtime/src/kmp.h:78, from <openmp>/src/runtime/src/kmp_environment.cpp:54: <openmp>/src/runtime/src/kmp_environment.cpp: In function ‘char* __kmp_env_get(const char)’: <openmp>/src/runtime/src/kmp_safe_c_api.h:52:50: warning: ‘char strncpy(char, const char, size_t)’ specified bound depends on the length of the source argument [-Wstringop-overflow=] #define KMP_STRNCPY_S(dst, bsz, src, cnt) strncpy(dst, src, cnt) ~~~~~~~^~~~~~~~~~~~~~~ <openmp>/src/runtime/src/kmp_environment.cpp:97:5: note: in expansion of macro ‘KMP_STRNCPY_S’ KMP_STRNCPY_S(result, len, value, len); ^~~~~~~~~~~~~ <openmp>/src/runtime/src/kmp_environment.cpp:92:28: note: length computed here size_t len = KMP_STRLEN(value) + 1; This is stupid because result is allocated with KMP_INTERNAL_MALLOC(len), so the arguments are correct. Differential Revision: https://reviews.llvm.org/D49904 llvm-svn: 338283	2018-07-30 18:16:22 +00:00
Jonathan Peyton	284fab195a	[OpenMP] Add GOMP version symbols for OMP_4.5 API This patch adds the appropriate version symbols to the relevant API functions Differential Revision: https://reviews.llvm.org/D49859 llvm-svn: 338281	2018-07-30 17:50:35 +00:00
Jonathan Peyton	369d72db11	[OpenMP] Implement GOMP doacross compatibility This change introduces GOMP doacross compatibility. There are 12 new interface functions 6 for long type and 6 for unsigned long long type: GOMP_doacross_post, GOMP_doacross_wait, GOMP_loop_doacross_[schedule]_start where schedule can be static, dynamic, guided, or runtime. These functions just translate the parameters if necessary and send them to the corresponding kmp function. E.g., GOMP_doacross_post() -> __kmpc_doacross_post() For the GOMP_doacross_post function, there is template specialization to account for when long is a four byte vs an eight byte type. If it is a four byte type, then a temporary array has to be created to convert the four byte integers into eight byte integers and then sending that into __kmpc_doacross_post(). Because GOMP_doacross_wait uses varargs, it always needs a temporary array and does not need template specialization. Differential Revision: https://reviews.llvm.org/D49857 llvm-svn: 338280	2018-07-30 17:48:33 +00:00
Jonathan Peyton	8692e142b3	[OpenMP] Fix build errors when building with KMP_DEBUG_ADAPTIVE_LOCKS=1 This change fixes build errors when building a runtime with adaptive lock stats enabled. Most of the errors were due to the recent changes in the runtime, but it seems that we have not tried to build this debug runtime on Windows for a long time. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D49823 llvm-svn: 338277	2018-07-30 17:45:23 +00:00
Jonathan Peyton	f0682ac498	[OpenMP][Stats] Cleanup stats gathering code 1) Remove unnecessary data from list node structure 2) Remove timerPair in favor of pushing/popping explicitTimers. This way, nested timers will work properly. 3) Fix #pragma omp critical timers 4) Add histogram capability 5) Add KMP_STATS_FILE formatting capability 6) Have time partitioned into serial & parallel by introducing partitionedTimers::exchange(). This also counts the number of serial regions in the executable. 7) Fix up the timers around OMP loops so that scheduling overhead and work are both counted correctly. 8) Fix up the iterations statistics so they count the number of iterations the thread receives at each loop scheduling event 9) Change timers so there is only one RDTSC read per event change 10) Fix up the outdated comments for the timers Differential Revision: https://reviews.llvm.org/D49699 llvm-svn: 338276	2018-07-30 17:41:08 +00:00
Joachim Protze	cdaefac5bd	[OMPT] Fix OMPT callbacks for the taskloop construct and add testcase Fix the order of callbacks related to the taskloop construct. Add the iteration_count to work callbacks (according to the spec). Use kmpc_omp_task() instead of kmp_omp_task() to include OMPT callbacks. Add a testcase. Patch by Simon Convent Reviewed by: protze.joachim, hbae Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D47709 llvm-svn: 338146	2018-07-27 18:13:24 +00:00
Joachim Protze	86ed6aa668	[OMPT] Adapt OMPT callbacks for tasks to handle untied tasks correctly The ompt/tasks/task_types.c testcase did not test untied tasks properly. Now, frame addresses are tested and two scheduling points are added at which the task can switch to another thread. Due to scheduling effects, the frame address could be NULL. This needed a restructure of the way OMPT callbacks are called. __ompt_task_finish() now has an extra parameter, whether a task is completed. Its invocation has been moved into __kmp_task_finish(). Thus, the order of the writes to the frame addresses is not subject to scheduling effects anymore. Patch by Simon Convent Reviewed by: protze.joachim, hbae Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49181 llvm-svn: 338145	2018-07-27 18:13:20 +00:00
Joachim Protze	f203109edb	[OMPT] Print two more addresses in print_fuzzy_address_block() The two more outputs are needed to match the return addresses when using the Intel Compiler, as it generates more instructions between the fuzzy-printing of the address and the runtime call. Patch by Simon Convent Reviewed By: protze.joachim, hbae Differential Revision: https://reviews.llvm.org/D49373 llvm-svn: 338144	2018-07-27 18:13:15 +00:00
Jonas Hahnfeld	3a0e9b37f3	PR30734: Remove __kmp_ft_page_allocate() This function was not enabled by default and not exported when manually tweaking the build flags. Additionally it was hard to use since there is no corresponding __kmp_ft_page_free(). The code itself is questionable because the returned memory address is padded by an extra pointer which stores the unpadded start of the allocated region (this would need to be freed). Differential Revision: https://reviews.llvm.org/D49802 llvm-svn: 338052	2018-07-26 18:15:02 +00:00
Jonas Hahnfeld	6fbbf27d98	[test] Remove XFAIL of omp_for_bigbounds.c for Intel Compiler The initial commit said that the test passes with Intel Compiler, so change XFAIL to only list clang and gcc. Differential Revision: https://reviews.llvm.org/D49801 llvm-svn: 338051	2018-07-26 18:14:57 +00:00
Jonas Hahnfeld	ba5ec9c684	[OMPT] Fix typo in test parallel/nested_thread_num.c This caused test failures with GCC since its initial commit in r336085 (https://reviews.llvm.org/D46533). llvm-svn: 337911	2018-07-25 12:34:31 +00:00
Alexey Bataev	37d4156b11	[OPENMP, NVPTX] Fixed synchronization construct + code cleanup. Summary: 1. Fixed internal problem in `__kmpc_barrier` function: SPMD mode synchronization function should be called only in L1 parallel level. 2. Removed some extra code for synchronization inside of the code, used `__kmpc_barrier` instead. 3. Some code cleanup. Reviewers: gtbercea, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49564 llvm-svn: 337691	2018-07-23 13:52:12 +00:00
Jonathan Peyton	a764af68be	Block library shutdown until unreaped threads finish spin-waiting This change fixes possibly invalid access to the internal data structure during library shutdown. In a heavily oversubscribed situation, the library shutdown sequence can reach the point where resources are deallocated while there still exist threads in their final spinning loop. The added loop in __kmp_internal_end() checks if there are such busy-waiting threads and blocks the shutdown sequence if that is the case. Two versions of kmp_wait_template() are now used to minimize performance impact. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D49452 llvm-svn: 337486	2018-07-19 19:17:00 +00:00
George Rokos	a0da24683b	[OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members This patch removes the translation code since this functionality is now implemented in the compiler. target_data_begin and target_data_end are also patched to handle some special cases that used to be handled by the obsolete translation function, namely ensure proper alignment of struct members when we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning) can result in distortion of the alignment for some of its member fields. Padding restores the original (proper) alignment. Differential revision: https://reviews.llvm.org/D44186 llvm-svn: 337455	2018-07-19 13:41:03 +00:00
Joachim Protze	bb869f42b7	[libomptarget] Also support several images for elf In revision r336569 (D49036) libomptarget support for multiple nvidia images has been fixed in case a target region resides inside one or multiple libraries and in the compiled application. But the issue is still present for elf images. This fix will also support multiple images for elf. Patch by Jannis Klinkenberg Reviewers: protze.joachim, ABataev, grokos Reviewed By: protze.joachim, ABataev, grokos Subscribers: openmp-commits Differential Revision: https://reviews.llvm.org/D49418 llvm-svn: 337355	2018-07-18 07:23:46 +00:00
Azharuddin Mohammed	6712b8675b	[cmake] Fix libomptarget/test/CMakeLists.txt Summary: Should be variable name instead of variable reference. If the variable is somehow unset, it messes up the if condition expression and causes a CMake error. Reviewers: jlpeyton, AndreyChurbanov, Hahnfeld Reviewed By: Hahnfeld Subscribers: mgorny, llvm-commits, openmp-commits Differential Revision: https://reviews.llvm.org/D47221 llvm-svn: 337133	2018-07-15 17:29:43 +00:00
Gheorghe-Teodor Bercea	9e94326185	[OpenMP][libomptarget] Fix data sharing and globalization infrastructure to work in SPMD mode Summary: This patch fixes the data sharing infrastructure to work for the SPMD and non-SPMD cases. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: ABataev, grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D49204 llvm-svn: 337013	2018-07-13 16:14:22 +00:00
Alexey Bataev	c2c0138a04	[OPENMP, NVPTX] Fix loop boundaries calculation for dynamic loops. Summary: Patch fixes the next problems. 1. Removes unused functions from omptarget_nvptx_ThreadPrivateContext class + simplified data members. 2. Fixed calculation of loop boundaries for dynamic loops with static scheduling. 3. Introduced saving/restoring of the dynamic loop boundaries to support several nested parallel dynamic loops. Reviewers: grokos Subscribers: guansong, kkwli0, openmp-commits Differential Revision: https://reviews.llvm.org/D49241 llvm-svn: 336915	2018-07-12 15:18:28 +00:00
Jonathan Peyton	dc73f512ae	Fix const cast problem introduced in r336563 r336563 eliminated CCAST() macros, which caused build failures llvm-svn: 336586	2018-07-09 19:09:31 +00:00
Jonathan Peyton	61d44f188a	[OpenMP] Fix a few formatting issues llvm-svn: 336575	2018-07-09 18:09:25 +00:00
Jonathan Peyton	f639936748	[OpenMP] Introduce hierarchical scheduling This patch introduces the logic implementing hierarchical scheduling. First and foremost, hierarchical scheduling is off by default. To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage. This work is based on the IWOMP paper: "Workstealing and Nested Parallelism in SMP Systems" Hierarchical scheduling is the layering of OpenMP schedules for different layers of the memory hierarchy. One can have multiple layers between the threads and the global iterations space. The threads will go up the hierarchy to grab iterations, using possibly a different schedule & chunk for each layer. [ Global iteration space (0-999) ] (use static) [ L1 \| L1 \| L1 \| L1 ] (use dynamic,1) [ T0 T1 \| T2 T3 \| T4 T5 \| T6 T7 ] In the example shown above, there are 8 threads and 4 L1 caches being targeted. If the topology indicates that there are two threads per core, then two consecutive threads will share the data of one L1 cache unit. This example would have the iteration space (0-999) split statically across the four L1 caches (so the first L1 would get (0-249), the second would get (250-499), etc). Then the threads will use a dynamic,1 schedule to grab iterations from the L1 cache units. There are currently four supported layers: L1, L2, L3, NUMA OMP_SCHEDULE can now read a hierarchical schedule with this syntax: OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h to try to keep it separate from the rest of the code. Differential Revision: https://reviews.llvm.org/D47962 llvm-svn: 336571	2018-07-09 17:51:13 +00:00
Alexey Bataev	2622e9e5b3	[OPENMP, NVPTX] Support several images in the executable. Summary: Currently Cuda plugin supports loading of the single image, though we may have the executable with the several images, if it has target regions inside of the dynamically loaded library. Patch allows to load multiple images. Reviewers: grokos Subscribers: guansong, openmp-commits, kkwli0 Differential Revision: https://reviews.llvm.org/D49036 llvm-svn: 336569	2018-07-09 17:46:55 +00:00
Jonathan Peyton	39ada85446	[OpenMP] Restructure loop code for hierarchical scheduling This patch reorganizes the loop scheduling code in order to allow hierarchical scheduling to use it more effectively. In particular, the goal of this patch is to separate the algorithmic parts of the scheduling from the thread logistics code. Moves declarations & structures to kmp_dispatch.h for easier access in other files. Extracts the algorithmic part of __kmp_dispatch_init() and __kmp_dispatch_next() into __kmp_dispatch_init_algorithm() and __kmp_dispatch_next_algorithm(). The thread bookkeeping logic is still kept in __kmp_dispatch_init() and __kmp_dispatch_next(). This is done because the hierarchical scheduler needs to access the scheduling logic without the bookkeeping logic. To prepare for new pointer in dispatch_private_info_t, a new flags variable is created which stores the ordered and nomerge flags instead of them being in two separate variables. This will keep the dispatch_private_info_t structure the same size. Differential Revision: https://reviews.llvm.org/D47961 llvm-svn: 336568	2018-07-09 17:45:33 +00:00
Jonathan Peyton	37e2ef5434	[OpenMP] Use C++11 Atomics - barrier, tasking, and lock code These are preliminary changes that attempt to use C++11 Atomics in the runtime. We are expecting better portability with this change across architectures/OSes. Here is the summary of the changes. Most variables that need synchronization operation were converted to generic atomic variables (std::atomic<T>). Variables that are updated with combined CAS are packed into a single atomic variable, and partial read/write is done through unpacking/packing Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D47903 llvm-svn: 336563	2018-07-09 17:36:22 +00:00
Kelvin Li	b1711b28f7	Define the __STDC_FORMAT_MACROS to avoid test failure on some platforms. ompt/misc/api_calls_from_other_thread.cpp ompt/misc/interoperability.cpp Differential Revision: https://reviews.llvm.org/D48984 llvm-svn: 336438	2018-07-06 14:15:59 +00:00
Joachim Protze	b41c61eed4	Dropped non-supported "--no-as-needed" flag from OMPT tests for macOS The flag "--no-as-needed" is not recognized by the linker on macOS making the following tests fail: ompt/loadtool/tool_available/tool_available.c ompt/loadtool/tool_not_available/tool_not_available.c This patch removes this flag for macOS and adds it only for Linux and Windows. I tested it on Ubuntu 16.04 and macOS HighSierra, with Clang/LLVM 6.0.1 and OpenMP trunk. This solution was also discussed in the OpenMP-dev mailing list. Patch provided by Simone Atzeni Differential Revision: https://reviews.llvm.org/D48888 llvm-svn: 336327	2018-07-05 09:14:06 +00:00
Joachim Protze	00505b85a3	[OMPT] Add synchronization to threads_nested.c testcase The testcase potentially fails when a thread is reused. The added synchronization makes sure this does not happen. Patch provided by Simon Convent Differential Revision: https://reviews.llvm.org/D48932 llvm-svn: 336326	2018-07-05 09:14:01 +00:00
Joachim Protze	04a00fc18c	[OMPT] Use alloca() to force availability of frame pointer When compiling with icc, there is a problem with reenter frame addresses in parallel_begin callbacks in the interoperability.c testcase. (The address is not available, thus NULL.) Using alloca() forces availability of the frame pointer. Patch provided by Simon Convent Differential Revision: https://reviews.llvm.org/D48282 llvm-svn: 336088	2018-07-02 09:13:38 +00:00
Joachim Protze	e2eec57a4f	[OMPT] Add tests for runtime entry points from non-OpenMP threads Several runtime entry points have not been tested from non-OpenMP threads. This adds tests to an existing testcase. While at it, the testcase was reformatted Patch provided by Simon Convent Differential Revision: https://reviews.llvm.org/D48124 llvm-svn: 336087	2018-07-02 09:13:34 +00:00
Joachim Protze	28d2d708d4	[OMPT] Add testcases for thread_begin and thread_end callbacks Especially the thread_end callback has not been tested before. This adds a testcase for nested and non-nested threads. Patch provided by Simon Convent Differential Revision: https://reviews.llvm.org/D47824 llvm-svn: 336086	2018-07-02 09:13:30 +00:00
Joachim Protze	4a73ae167e	[OMPT] Provide the right thread_num for ancestor levels The current implementation always provides the thread-num for the current parallel region. This patch fixes the behavior for ancestor levels >0. Differential Revision: https://reviews.llvm.org/D46533 llvm-svn: 336085	2018-07-02 09:13:24 +00:00
Alexey Bataev	3994bafbc7	[OPENMP, NVPTX] Sync threads before start ordered loops. Summary: Threads must be synchronized before starting ordered construct. Reviewers: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D48732 llvm-svn: 335987	2018-06-29 16:16:00 +00:00
Alexey Bataev	0ac29350b5	[OPENMP, NVPTX] Fixes for NVPTX RTL Summary: Patch fixes several problems in the implementation of NVPTX RTL. 1. Detection of the last iteration for loops with static scheduling, no chunks. 2. Fixes reductions for the serialized parallel constructs. 3. Fixes handling of the barriers. Reviewers: grokos Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D48480 llvm-svn: 335469	2018-06-25 13:43:35 +00:00
Andrey Churbanov	a7fa3f009a	minor: fixed typo in debug print llvm-svn: 335138	2018-06-20 15:54:11 +00:00
Jonas Hahnfeld	d03cbf2cfe	Remove liboffload from repository See the mailing list for the proposal and discussion: http://lists.llvm.org/pipermail/openmp-dev/2018-June/002041.html llvm-svn: 335069	2018-06-19 19:08:17 +00:00
Guansong Zhang	f9e56e5982	[OpenMP] [CUDA] Expose teamid to the debug path Summary: Small bug fix for debug build. A previous fix causing trouble for debug build. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D48286 llvm-svn: 335046	2018-06-19 14:05:38 +00:00
Jonathan Peyton	e92ae43be8	[OpenMP] Fix formatting issues in kmp_stats.h llvm-svn: 334335	2018-06-08 22:27:53 +00:00
Joachim Protze	406361330b	[OMPT] Rename ompt_wait_id to omp_wait_id Rename ompt_wait_id to omp_wait_id, as defined in the spec. Differential Revision: https://reviews.llvm.org/D46530 llvm-svn: 333368	2018-05-28 08:16:08 +00:00
Joachim Protze	c5836064bb	[OMPT] Rename ompt_frame_t to omp_frame_t Rename ompt_frame_t to omp_frame_t, as defined in the spec. Differential Revision: https://reviews.llvm.org/D43568 llvm-svn: 333367	2018-05-28 08:14:58 +00:00
Jonas Hahnfeld	3c6595d65d	[OMPT] Fix test parallel/not_enough_threads.c Upcoming changes to FileCheck will modify CHECK-DAG to not match overlapping regions of the input. This test was found to be affected because it expects to find four threads to invoke events of type ompt_event_implicit_task_begin. It turns out this is wrong because OMP_THREAD_LIMIT is set to 2, so there are only two threads. The rest of the test got it right so it went unnoticed until now. (Rewrite test and apply clang-format to it as discussed in the past.) Differential Revision: https://reviews.llvm.org/D47119 llvm-svn: 333361	2018-05-27 17:07:38 +00:00
Jonas Hahnfeld	17aabf83e9	[libomptarget-nvptx] loop: Determine if runtime uninitialized The generic entry points for static loop scheduling previously hardcoded that the runtime was initialized. This can be wrong if the compiler analyzes that the runtime is not needed and calls the init functions accordingly. This didn't affect clang-ykt because they have entry points for different combinations of SPMD x Runtime not needed. I didn't do measurements yet but with inlining we might get away with always calling the generic interface and letting compiler and runtime figure out the rest. In any case, a correct runtime is always better than having functions that may only be called if previous calls passed in a specific set of arguments! Differential Revision: https://reviews.llvm.org/D47131 llvm-svn: 333285	2018-05-25 15:56:48 +00:00
Jonas Hahnfeld	65e0b8784c	[CMake] Unify install path for libraries Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands. This also fixes installation of libomptarget-nvptx that previously didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX. Differential Revision: https://reviews.llvm.org/D47130 llvm-svn: 333284	2018-05-25 15:56:41 +00:00
George Rokos	6da6f433a0	[CUDA] Fix dynamic|guided scheduling. The existing implementation of the dynamic scheduling breaks the contract introduced by the original openmp runtime and, thus, is incorrect. Patch fixes it and introduces correct dynamic scheduling model. Thanks to Alexey Bataev for submitting this patch. Differential Revision: https://reviews.llvm.org/D47333 llvm-svn: 333225	2018-05-24 21:12:41 +00:00
Jonas Hahnfeld	9228f9718c	[libomptarget-nvptx-bc] Pass found CUDA installations We already know where the CUDA SDK is, so there is no point in letting Clang search for it again and possibly finding no or a different installation. --cuda-path is supported since the beginning of CUDA support in Clang, so making this required doesn't impose additional restrictions. Differential Revision: https://reviews.llvm.org/D46930 llvm-svn: 332495	2018-05-16 17:20:27 +00:00
Jonas Hahnfeld	37bbe1a698	[libomptarget-nvptx] Test bitcode compiler flags and enable by default Move all logic related to selecting the bitcode compiler and linker into a new file and dynamically test required compiler flags. This also adds -fcuda-rdc for Clang trunk as previously attempted in D44992 which fixes the build. As a result this change also enables building the library by default if all prerequisites are met. Differential Revision: https://reviews.llvm.org/D46901 llvm-svn: 332494	2018-05-16 17:20:21 +00:00
Gheorghe-Teodor Bercea	787a350021	[OpenMP][libomptarget] Add function for checking SPMD mode Summary: Add function to the NVPTX libomptarget library that will return true if the current target region is being executed in SPMD mode. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D46840 llvm-svn: 332360	2018-05-15 15:16:43 +00:00
Joachim Protze	9be9cf20bf	[OMPT] Fix thread_num for implicit_task_end callbacks in nested parallel regions implicit_task_end callbacks in nested parallel regions did not always give the correct thread_num, since the inner parallel region may have already been finalized. Now, the thread_num is stored at the beginning of the implicit task and retrieved at the end, whenever necessary. A testcase was added as well. Differential Revision: https://reviews.llvm.org/D46260 llvm-svn: 331632	2018-05-07 12:42:21 +00:00
Joachim Protze	8fc39f6b19	[OMPT] Add api_calls_misc.c testcase and rename api_calls.c testcase The api_calls_misc.c testcase tests the following api calls: ompt_get_callback() ompt_get_state() ompt_enumerate_states() ompt_enumerate_mutex_impls() These have not been tested previously. The api_calls.c testcase has been renamed to api_calls_places.c because it only tests api calls that are related to places. Differential Revision: https://reviews.llvm.org/D42523 llvm-svn: 331631	2018-05-07 12:42:15 +00:00
Guansong Zhang	e1c7a46d5b	[OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side Summary: Enable the device side debug messages at compile time, use env var to control at runtime. To achieve this, an environment data block is passed to the device lib when it is loaded. By default, the message is off; to enable it, a user needs to set LIBOMPDEVICE_DEBUG=1. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D46210 llvm-svn: 331550	2018-05-04 19:29:28 +00:00
Jonathan Peyton	d47df260ba	[OpenMP][OMPT] Fix api_calls_from_other_thread.cpp Removed environment setting in RUN: line that was being ignored anyways. Changed a few specific checks to "any number" llvm-svn: 331212	2018-04-30 18:46:31 +00:00
Guansong Zhang	ad6c26516b	[OpenMP] Remove compilation warning when using clang to compile bc files. Summary: Minor printf format correction. NVCC ignores those. Clang will give warnings on these if debug is enabled. Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D45528 llvm-svn: 330944	2018-04-26 14:06:53 +00:00
Guansong Zhang	334c379e32	[OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flag Summary: The LIBOMPTARGET_NVPTX_DEBUG flag is inconsistent between using nvcc to generate .a file and clang to generate .bc file. Sync the two setting so we can get debug messages from the bc file path as well. Reviewers: grokos Subscribers: Hahnfeld, openmp-commits, mgorny Tags: #openmp Differential Revision: https://reviews.llvm.org/D45530 llvm-svn: 330477	2018-04-20 20:41:00 +00:00
Heejin Ahn	f78a493528	[OpenMP] Compilation error fix on const char* Summary: This line (`0ed912c7a7/runtime/src/kmp_gsupport.cpp (L1459)`) added in D45327 (rL330282) causes a compilation failure. Reviewers: jlpeyton Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D45786 llvm-svn: 330299	2018-04-18 22:23:31 +00:00
Jonathan Peyton	1482db9e03	[OpenMP] Fix affinity API for KMP_AFFINITY=none|compact|scatter Currently, the affinity API reports garbage for the initial place list and any thread's place lists when using KMP_AFFINITY=none|compact|scatter. This patch does two things: for KMP_AFFINITY=none, Creates a one entry table for the places, this way, the initial place list is just a single place with all the proc ids in it. We also set the initial place of any thread to 0 instead of KMP_PLACE_ALL so that the thread reports that single place (place 0) instead of garbage (-1) when using the affinity API. When non-OMP_PROC_BIND affinity is used (including KMP_AFFINITY=compact|scatter), a thread's place list is populated correctly. We assume that each thread is assigned to a single place. This is implemented in two of the affinity API functions Differential Revision: https://reviews.llvm.org/D45527 llvm-svn: 330283	2018-04-18 19:25:48 +00:00
Jonathan Peyton	27a677fc95	Introduce GOMP_taskloop API This patch introduces GOMP_taskloop to our API. It adds GOMP_4.5 to our version symbols. Being a wrapper around __kmpc_taskloop, the function creates a task with the loop bounds properly nested in the shareds so that the GOMP task thunk will work properly. Also, the firstprivate copy constructors are properly handled using the __kmp_gomp_task_dup() auxiliary function. Currently, only linear spawning of tasks is supported for the GOMP_taskloop interface. Differential Revision: https://reviews.llvm.org/D45327 llvm-svn: 330282	2018-04-18 19:23:54 +00:00
Joachim Protze	3865c69b84	Set the license header for all OMPT files llvm-svn: 329928	2018-04-12 17:23:26 +00:00
Guansong Zhang	f679431f91	[OpenMP] Remove extra warning when we build Summary: This one line change is to remove this warning message "warning: integer conversion resulted in a change of sign" Reviewers: grokos Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D45415 llvm-svn: 329713	2018-04-10 15:28:31 +00:00
Guansong Zhang	f0029a7738	Revert "[OpenMP] enable bc file compilation using the latest clang" This reverts commit 6849e31c36d712d97433bca9af39b7a09c8c1207. llvm-svn: 329576	2018-04-09 14:45:41 +00:00
Guansong Zhang	e47fbc9da8	[OpenMP] enable bc file compilation using the latest clang Summary: adding cuda-rdc flag to allow extern global data Reviewers: grokos Reviewed By: grokos Subscribers: gregrodgers, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D44992 llvm-svn: 329072	2018-04-03 15:01:34 +00:00
Jonathan Peyton	1e6bb8d5de	Minor cleanup in __kmp_atfork_child() This change removes the unnecessary lock operation on __kmp_initz_lock inside the __kmp_atfork_child() function for Linux; the lock variable is initialized in the same function later. Patch by Hansang Bae Differential Revision: https://reviews.llvm.org/D44949 llvm-svn: 328900	2018-03-30 19:55:11 +00:00
Jonathan Peyton	ea82c769f4	Move blocktime_str variable right before its first use llvm-svn: 328575	2018-03-26 19:20:50 +00:00
Jonathan Peyton	b6b79ac95b	Add summarizeStats.py to tools directory The summarizeStats.py script processes raw data provided by the instrumented (stats-gathering) OpenMP* runtime library. It provides: 1) A radar chart which plots counters as frequency (per GigaTick) of use within the program. The frequencies are plotted as log10, however values less than one are kept as it is and represented in red color. This was done to help visualize the differences better. 2) Pie charts separating total time as compute and non-compute. The compute and non-compute times have their own pie charts showing the constructs that contributed to them. The percentages listed are with respect to the total time. 3) '.csv' file with percentage of time spent within the different constructs. The script can be used as: $ python $PATH_TO_SCRIPT/summarizeStats.py instrumented1.csv instrumented2.csv Patch by Taru Doodi Differential Revision: https://reviews.llvm.org/D41838 llvm-svn: 328568	2018-03-26 18:44:48 +00:00
Andrey Churbanov	2d91a8a3ba	Fixed __kmpc_get_target_offload() to call library initialization. Differential Revision: https://reviews.llvm.org/D44793 llvm-svn: 328228	2018-03-22 18:51:51 +00:00
Gheorghe-Teodor Bercea	4bc36a06e2	[OpenMP][libomptarget] Initialize global memory stack only once. Summary: The global stack initialization function may be called multiple times. The initialization of the shared memory slots should only happen when the function is called for the first time for a given warp master thread. Reviewers: grokos, carlo.bertolli, ABataev, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44754 llvm-svn: 328148	2018-03-21 21:02:55 +00:00
Gheorghe-Teodor Bercea	b4332ca3da	[OpenMP][libomptarget] Fix master warp check Summary: The check for the master warp must take into consideration the actual number of warps: the master warp is equal to the last active warp not necessarily WARPSIZE - 1. Reviewers: grokos, carlo.bertolli, ABataev, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44537 llvm-svn: 328146	2018-03-21 20:51:16 +00:00
Gheorghe-Teodor Bercea	c8d395a168	[OpenMP][libomptarget] Enable globalization for workers Summary: This patch allows worker to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context it then needs to be globalized. Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44487 llvm-svn: 328144	2018-03-21 20:34:19 +00:00
Jonathan Peyton	78f977fcd1	Read OMP_TARGET_OFFLOAD and provide API to access ICV Added settings code to read OMP_TARGET_OFFLOAD environment variable. Added target-offload-var ICV as __kmp_target_offload, set via OMP_TARGET_OFFLOAD, if available, otherwise defaulting to DEFAULT. Valid values for the ICV are specified as enum values {0,1,2} for disabled, default, and mandatory. An internal API access function __kmpc_get_target_offload is provided. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D44577 llvm-svn: 328046	2018-03-20 21:18:17 +00:00
Andrey Churbanov	3336aa0d07	Fix for https://bugs.llvm.org/show_bug.cgi?id=36705 . Differential Revision: https://reviews.llvm.org/D44637 llvm-svn: 327875	2018-03-19 18:05:15 +00:00
George Rokos	6b9bb5e1c2	Bugfix, extern declarations for libomp functions are `extern "C"` declarations llvm-svn: 327763	2018-03-17 02:07:42 +00:00
George Rokos	2878c3957b	Moved extern declarations to private header file, they are only used from within libomptarget, they don't need to be in omptarget.h. llvm-svn: 327740	2018-03-16 20:40:09 +00:00
Gheorghe-Teodor Bercea	876c1ed2e5	[OpenMP][libomptarget] Enable usage of shared memory slots Summary: Allow the runtime to use the existing shared memory statically allocated slots. When a variable is globalized, the underlying memory can be either shared or global memory (both have block-wide visibility). In this case, we allow the storage to use a limited amount of shared memory that has been statically allocated already. Only if shared memory doesn't prove to be enough do we then invoke malloc() to create a new global memory slot. Reviewers: ABataev, carlo.bertolli, grokos, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44486 llvm-svn: 327639	2018-03-15 16:05:34 +00:00
Gheorghe-Teodor Bercea	f3de222b0d	[OpenMP][libomptarget] Enable multiple frames per global memory slot Summary: To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44470 llvm-svn: 327637	2018-03-15 15:56:04 +00:00
George Rokos	59be4b434f	[libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread. llvm-svn: 327556	2018-03-14 19:11:36 +00:00
Gheorghe-Teodor Bercea	49b62649cf	[OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing. Summary: This patch adds support for the sharing of variables from the master thread of a team to the worker threads of the team. The runtime uses a stack structure implemented as a doubly-linked list of slots with each slot having the exact same size as the size requested. This implementation leverages existing data structures. The runtime functions are added as separate functions to avoid interfering with the current interface. Limitations to be addressed in future patches: - This current patch only employs global memory. In a future patch we will enable the usage of shared memory as an optimization. - Allow the allocation of several requested sizes in the same slot. Reviewers: ABataev, grokos, caomhin, carlo.bertolli Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44260 llvm-svn: 327440	2018-03-13 19:44:53 +00:00
Sylvestre Ledru	1c861c582f	fix a typo on the website llvm-svn: 327237	2018-03-11 10:53:40 +00:00
Gheorghe-Teodor Bercea	d5e5992f9a	[OpenMP][libomptarget] Fix union. Summary: To make the two parts of the union have the same size, the size of vect needs to be increased by 16 bits. Reviewers: grokos, carlo.bertolli, caomhin, ABataev Reviewed By: grokos, ABataev Subscribers: fedor.sergeev, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D44254 llvm-svn: 327040	2018-03-08 18:44:02 +00:00
Gheorghe-Teodor Bercea	7a5fa21ae2	[OpenMP] Remove implicit data sharing using device shared memory from libomptarget Summary: This patch reverts the changes to libomptarget that were coupled with the changes to Clang code gen for data sharing using shared memory. A similar patch exists for Clang: D43625 Shared memory is meant to be used as an optimization on top of a more general scheme. So far we didn't have a global memory implementation ready so shared memory was a solution which applied to the current level of OpenMP complexity supported by trunk on GPU devices (due to the missing NVPTX backend patch this functionality has never been exercised). Now that we have a global memory solution this patch is "in the way" and needs to be removed (for now). This patch (or an equivalent version of it) will be put out for review once the global memory scheme is in place. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: grokos Subscribers: Hahnfeld, guansong, openmp-commits Differential Revision: https://reviews.llvm.org/D43626 llvm-svn: 326950	2018-03-07 22:10:10 +00:00
Andrey Churbanov	9e9333aa8a	Improve OpenMP threadprivate implementation. Patch by Terry Wilmarth Differential Revision: https://reviews.llvm.org/D41914 llvm-svn: 326733	2018-03-05 18:42:01 +00:00
Andrey Churbanov	75bc70fb56	Fixed build of the OpenMP stubs library. Differential Revision: https://reviews.llvm.org/D44019 llvm-svn: 326728	2018-03-05 18:01:47 +00:00
Jonas Hahnfeld	b0f051ae63	[OMPT] Fix interoperability test with GCC We have to ensure that the runtime is initialized _before_ waiting for the two started threads to guarantee that the master threads post their ompt_event_thread_begin before the worker threads. This is not guaranteed in the parallel region where one worker thread could start before the other master thread has invoked the callback. The problem did not happen with Clang because the generated code calls __kmpc_global_thread_num() and caches its result for functions that contain OpenMP pragmas. Differential Revision: https://reviews.llvm.org/D43882 llvm-svn: 326435	2018-03-01 14:03:18 +00:00
Joachim Protze	f5aebc27ad	[OMPT] Fix task-type test with GCC This is similar to D43882. The runtime needs to be initialized before calling print_ids(0) http://lab.llvm.org:8011/builders/openmp-gcc-x86_64-linux-debian/builds/60 Differential Revision: https://reviews.llvm.org/D43897 llvm-svn: 326428	2018-03-01 11:26:15 +00:00
Joachim Protze	aa2022e74f	[OMPT] Fix ompt_get_task_info() and add tests for it The thread_num parameter of ompt_get_task_info() was not being used previously, but needs to be set. The print_task_type() function (from the task-types.c testcase) was merged into the print_ids() function (in callback.h). Testing of ompt_get_task_info() was added to the task-types.c testcase. It was not tested extensively previously. Differential Revision: https://reviews.llvm.org/D42472 llvm-svn: 326338	2018-02-28 17:36:18 +00:00
Joachim Protze	4df80bda40	[OMPT] Fix inconsistent testcases The main change of this patch is to insert {{.*}} in current_address=[[RETURN_ADDRESS_END]]. This is needed to match any of the alternatively printed addresses. Additionally, clang-format is applied to the two tests. Differential Revision: https://reviews.llvm.org/D43115 llvm-svn: 326312	2018-02-28 09:28:51 +00:00
Jonas Hahnfeld	82768d0ba1	[OMPT] Fix parallel_data in implicit barrier-end This is required to be NULL for implicit barriers at the end of a parallel region. Noticed in review of D43191. Differential Revision: https://reviews.llvm.org/D43308 llvm-svn: 325922	2018-02-23 16:46:25 +00:00
Jonas Hahnfeld	5e44069857	[OMPT] Fix test tasks/serialized.c with optimization The compiler inlines the user code in the task. Check for that case at runtime by comparing the frame addresses and print the expected exit address. Also showcase how I think the OMPT tests could be reformatted to match LLVM's code style. In my opinion it would be great to apply that kind of change to all tests that need to be touched for whatever reason... Differential Revision: https://reviews.llvm.org/D43191 llvm-svn: 325921	2018-02-23 16:46:11 +00:00
Joachim Protze	b0e4f87fb0	[OMPT] Omission in OMPT Formatting Applying clang-format to the /runtime/src/ folder Differential Revision: https://reviews.llvm.org/D42169 llvm-svn: 325424	2018-02-17 09:54:10 +00:00
Joachim Protze	33db70d2d7	[OMPT] Add interoperability testcase Test whether OMPT-callbacks for two threads that initiate a parallel region are correct. Differential Revision: https://reviews.llvm.org/D41942 llvm-svn: 325423	2018-02-17 09:40:08 +00:00
Joachim Protze	76899b84fe	[OMPT] Update api_calls testcase Only use ompt_ functions when testing OMPT in api_calls testcase. Add size parameter to print_list. Fix small bug in implementation of ompt_get_partition_place_nums(): return correct length. Differential Revision: https://reviews.llvm.org/D42162 llvm-svn: 325422	2018-02-17 09:40:02 +00:00
Jonas Hahnfeld	6f9e25d382	[CMake] Add -fno-experimental-isel for testing GlobalISel doesn't yet implement blockaddress and falls back to SelectionDAG. This results in additional branch instruction to the next basic block which breaks the OMPT tests. Disable GlobalISel for now when compiling the tests because fixing them is not easily possible. See http://llvm.org/PR36313 for full discussion history. Differential Revision: https://reviews.llvm.org/D43195 llvm-svn: 325218	2018-02-15 08:10:22 +00:00
Jonas Hahnfeld	cc6d29d72c	[OMPT][test] Correct warning about added wrapper functions This affects all outlined functions, not just tasks! Only show warning when using Clang 5.0 or later. Differential Revision: https://reviews.llvm.org/D43190 llvm-svn: 325131	2018-02-14 15:15:24 +00:00
Gheorghe-Teodor Bercea	d5ae4e6501	[OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining Summary: Different NVIDIA GPUs support different compute capabilities. To enable the inlining of runtime functions and the best performance on different generations of NVIDIA GPUs, a bc library for each compute capability needs to be compiled. The same compiler build will then be usable in conjunction with multiple generations of NVIDIA GPUs. To differentiate between versions of the same bc lib, the output file name will contain the compute capability ID. Depends on D14254 Reviewers: Hahnfeld, hfinkel, carlo.bertolli, caomhin, ABataev, grokos Reviewed By: Hahnfeld, grokos Subscribers: guansong, mgorny, openmp-commits Differential Revision: https://reviews.llvm.org/D41724 llvm-svn: 324904	2018-02-12 16:45:20 +00:00
Jonas Hahnfeld	3cfaf3dd0d	[libomptarget] Fix detection of CUDA stubs library CUDA_LIBRARIES contains additional linker arguments since CMake 3.3 which breaks the current way of finding the stubs library. llvm-svn: 324879	2018-02-12 11:01:56 +00:00
Joachim Protze	cfc98c2493	[OMPT] Add tool_available_search testcase Tests the search for tools as defined in the spec. The OMP_TOOL_LIBRARIES environment variable contains paths to the following files (in that order) -to a nonexisting file -to a shared library that does not have an ompt_start_tool function -to a shared library that has an ompt_start_tool implementation returning NULL -to a shared library that has an ompt_start_tool implementation returning a pointer to a valid instance of ompt_start_tool_result_t The expected result is that the last tool gets active and can print in the thread-begin callback. Differential Revision: https://reviews.llvm.org/D42166 llvm-svn: 324588	2018-02-08 10:04:33 +00:00
Joachim Protze	9440c0ee3c	[OMPT] Add tool_not_available testcase Add a testcase that checks whether the runtime can handle an ompt_start_tool method that returns NULL indicating that no tool shall be loaded. All tool_available testcases need a separate folder to avoid file conflicts for the generated tools. Differential Revision: https://reviews.llvm.org/D41904 llvm-svn: 324587	2018-02-08 10:04:28 +00:00
Gheorghe-Teodor Bercea	aaeab8d4ef	[OpenMP][libomptarget] Add data sharing support in libomptarget Summary: This patch extends the libomptarget functionality in patch D14254 with support for the data sharing scheme for supporting implicitly shared variables. The runtime therefore maintains a list of references to shared variables. Reviewers: carlo.bertolli, ABataev, Hahnfeld, grokos, caomhin, hfinkel Reviewed By: Hahnfeld, grokos Subscribers: guansong, llvm-commits, openmp-commits Differential Revision: https://reviews.llvm.org/D41485 llvm-svn: 324495	2018-02-07 18:21:55 +00:00
Joachim Protze	2a20299f91	[OMPT] Fix tool initialization returning 0 If tool initialization returns 0, OMPT should not be active. The current implementation provided some callback invocations in this case. Differential Revision: https://reviews.llvm.org/D42709 llvm-svn: 324320	2018-02-06 08:41:27 +00:00
Carlo Bertolli	57e9f44a8c	[OpenMP-RT] Fix debug string for NVPTX runtime library https://reviews.llvm.org/D42757 The method ThreadsInTeam is used to determine the number of threads to be used in a parallel region under SPMD mode (see line 127 of supporti.h in libomptarget/deviceRTLs/nvptx/src/). This patch fixes the corresponding debug print upon initialization of the kernel in SPMD mode. llvm-svn: 323978	2018-02-01 16:12:16 +00:00
Jonas Hahnfeld	a349d4820c	[libomptarget] Check for library with CUDA Driver API That's what we really need to link the CUDA plugin against, not the CUDA runtime API in CUDA_LIBRARIES! While the latter comes with the CUDA SDK, the Driver API is installed with the kernel driver and there is at most one per system. As fallback we can use the stubs library distributed with the CUDA SDK for linking. Differential Revision: https://reviews.llvm.org/D42643 llvm-svn: 323787	2018-01-30 16:49:13 +00:00
Jonas Hahnfeld	c189523529	[libomptarget] Only use CUDA Driver API Use equivalents for the last calls to the Runtime API. Remove stray assert in case of an error found during review, we should only return OFFLOAD_FAIL. Differential Revision: https://reviews.llvm.org/D42686 llvm-svn: 323786	2018-01-30 16:49:06 +00:00
George Rokos	0dd6ed74fd	[OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs. This patch implements the device runtime library whose interface is used in the code generation for OpenMP offloading devices. Currently there is a single device RTL written in CUDA meant for CUDA-enabled GPUs. The interface is a variation of the kmpc interface that includes some extra calls to do thread and storage management that only make sense for a GPU target. Differential revision: https://reviews.llvm.org/D14254 llvm-svn: 323649	2018-01-29 13:59:35 +00:00
Jonas Hahnfeld	723560d123	[OMPT] Use fuzzy return addresses in lock testcases Use fuzzy return addresses in lock testcases so that these testcases can also be run using the Intel Compiler. Patch by Simon Convent! Differential Revision: https://reviews.llvm.org/D41896 llvm-svn: 323529	2018-01-26 14:19:02 +00:00
Jonas Hahnfeld	e57620308e	Fix name of 'macOS' and add asterisks to brands, NFC. llvm-svn: 323180	2018-01-23 07:54:10 +00:00
Dimitry Andric	9f49676a8a	Sprinkle a few <cstdlib> includes, for libomptarget sources using malloc, free, alloca and getenv. NFCI. llvm-svn: 322869	2018-01-18 18:24:22 +00:00
Jonas Hahnfeld	e5499111b9	Add missing headers for Debug builds llvm-svn: 322830	2018-01-18 10:58:43 +00:00
Joachim Protze	e6269e3509	Partial revert of [OMPT] Rename ompt_mutex_impl_t to kmp_mutex_impl The previous commit did not revert all replaced ompt_mutex_impl_unknown. llvm-svn: 322631	2018-01-17 11:13:11 +00:00
Joachim Protze	0c9516b36c	[OMPT] Add Workaround for Intel Compiler Bug Add Workaround for Intel Compiler Bug with Case#: 03138964 A critical region within a nested task causes a segfault in icc 14-18: int main() { #pragma omp parallel num_threads(2) #pragma omp master #pragma omp task #pragma omp task #pragma omp critical printf("test\n"); } When the critical region is in a separate function, the segfault does not occur. So we add noinline to make sure that the function call stays there. Differential Revision: https://reviews.llvm.org/D41182 llvm-svn: 322622	2018-01-17 10:06:06 +00:00
Joachim Protze	1b2bd2680b	[OMPT] Rename ompt_mutex_impl_t to kmp_mutex_impl The definition is not part of the spec and thus should not have the prefix "ompt_" but rather a prefix that indicates that this is implementation specific. Differential Revision: https://reviews.llvm.org/D41166 llvm-svn: 322621	2018-01-17 10:06:01 +00:00
Joachim Protze	1dc2afdcaf	[OMPT] Return appropriate values for ompt runtime entry points for non-OpenMP threads When the current thread is not an (initialized) OpenMP thread, the runtime entry points return values that correspond to "not available" or similar Differential Revision: https://reviews.llvm.org/D41167 llvm-svn: 322620	2018-01-17 10:05:55 +00:00

932 Commits