llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexey Bataev	bfc8f9e9b0	[clang] Fix computation of number of dependencies using OpenMP iterator, by Raul Penacoba. The size of kmp_depend_info and the number of dependencies are computed by multiplying the iterator sizes, which is not right. Now size is computed as: itersize1*numclausedeps1 + itersize2*numclausedeps2 + ... + itersizeN*numclausedepsN where itersizeX is the size of the iterator and numclausedepsX the number of dependencies in that depend clause. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D111045	2021-10-04 07:06:51 -07:00
Joseph Huber	d12502a3ab	[OpenMP] Apply OpenMP assumptions to applicable call sites This patch adds OpenMP assumption attributes to call sites in applicable regions. Currently this applies the caller's assumption attributes to any calls contained within it. So, if a call occurs inside an OpenMP assumes region to a function outside that region, we will assume that call respects the assumptions. This is primarily useful for inline assembly calls used heavily in the OpenMP GPU device runtime, which allows us to then make judgements about what the ASM will do. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110655	2021-09-29 16:08:21 -04:00
Joseph Huber	b4a5543624	[OpenMP] Introduce a new worksharing RTL function for distribute This patch adds a new RTL function for worksharing. Currently we use `__kmpc_for_static_init` for both the `distribute` and `parallel` portion of the loop clause. This patch replaces the `distribute` portion with a new runtime call `__kmpc_distribute_static_init`. Currently this will be used exactly the same way, but will make it easier in the future to fine-tune the distribute and parallel portion of the loop. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110429	2021-09-27 11:36:37 -04:00
Shilei Tian	423d34f74a	[OpenMP][Offloading] Change `bool IsSPMD` to `int8_t Mode` in `__kmpc_target_init` and `__kmpc_target_deinit` This is a follow-up of D110029, which uses bitset to indicate execution mode. This patch makes the changes in the function call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110279	2021-09-22 17:16:41 -04:00
Shilei Tian	ca999f7191	[OpenMP][Offloading] Use bitset to indicate execution mode instead of value The execution mode of a kernel is stored in a global variable, whose value means: - 0 - SPMD mode - 1 - indicates generic mode - 2 - SPMD mode execution with generic mode semantics We are going to add support for SIMD execution mode. It will come with another execution mode, such as SIMD-generic mode. As a result, this value-based indicator is not flexible. This patch changes to a bitset-based solution to encode execution mode. Each position is: [0] - generic mode [1] - SPMD mode [2] - SIMD mode (will be added later) In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode execution with generic mode semantics. In the future after we add the support for SIMD mode, `0b1xx` will be in SIMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D110029	2021-09-22 11:40:52 -04:00
Giorgis Georgakoudis	ac90dfc43a	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `1d66649adf`. Revert to fix AMD GPU issue.	2021-09-21 13:20:39 -07:00
David Blaikie	131e878664	Print nullptr_t namespace qualified within std:: This improves diagnostic (& important to me, DWARF) accuracy - otherwise there could be ambiguities between "std::nullptr_t" and some user-defined type that's /actually/ "nullptr_t" defined in the global namespace. Differential Revision: https://reviews.llvm.org/D110044	2021-09-21 11:21:40 -07:00
Giorgis Georgakoudis	1d66649adf	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert, jhuber6 Differential Revision: https://reviews.llvm.org/D102107	2021-09-21 10:50:04 -07:00
cchen	3679d2001c	[NFC][OpenMP] Fix metadirective test on SystemZ	2021-09-20 12:22:54 -05:00
alokmishra.besu	000875c127	OpenMP 5.0 metadirective This patch supports OpenMP 5.0 metadirective features. It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind. A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h Currently this function returns the index of the when clause with the highest score from the ones applicable in the Context. But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D91944	2021-09-18 13:40:44 -05:00
Nico Weber	31cca21565	Revert "OpenMP 5.0 metadirective" This reverts commit `c7d7b98e52`. Breaks tests on macOS, see comment on https://reviews.llvm.org/D91944	2021-09-18 09:10:37 -04:00
Joseph Huber	c30d7730eb	[OpenMP] Change debugging symbol to weak_odr linkage The new device runtime uses an internal variable to set debugging. This variable was originally privately linked because every module will have a copy of it. This caused problems with merging the device bitcode library because it would get renamed and there was not a way to refer to an external, private symbol. This changes the symbol to weak_odr so it can be defined multiply, but will not be renamed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109997	2021-09-17 21:25:24 -04:00
cchen	9ff848c5cd	Revert "[OpenMP] Use irbuilder as default for masked and master construct" This reverts commit `2908fc0d3f`.	2021-09-17 16:44:09 -05:00
alokmishra.besu	347f3c186d	OpenMP 5.0 metadirective This patch supports OpenMP 5.0 metadirective features. It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind. A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h Currently this function returns the index of the when clause with the highest score from the ones applicable in the Context. But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D91944	2021-09-17 16:30:06 -05:00
cchen	7efb825382	Revert "OpenMP 5.0 metadirective" This reverts commit `c7d7b98e52`.	2021-09-17 16:14:16 -05:00
cchen	c7d7b98e52	OpenMP 5.0 metadirective This patch supports OpenMP 5.0 metadirective features. It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind. A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h Currently this function returns the index of the when clause with the highest score from the ones applicable in the Context. But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D91944	2021-09-17 16:03:13 -05:00
cchen	2908fc0d3f	[OpenMP] Use irbuilder as default for masked and master construct Use irbuilder as default and remove redundant Clang codegen for masked construct and master construct. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100874	2021-09-17 15:54:11 -05:00
Qiu Chaofan	0195f8621f	[Clang] Fix long double availability check `fae0dfa` changed code to check 128-bit float availability, since it introduced a new 128-bit double type on PowerPC. However, there're other long float types besides IEEE float128 and PPC double-double requiring this feature. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D109943	2021-09-17 15:24:06 +08:00
cchen	976d474bec	[OpenMP] Support construct trait set for Clang This patch supports the construct trait set selector by using the existing declare variant infrastructure inside `OMPContext`. The simd selector is currently not supported. The goal of this patch is to pass the declare variant test inside the sollve test suite. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109635	2021-09-16 11:34:31 -05:00
Andrew Savonichev	6377426b4a	Revert "[clang] Check unsupported types in expressions" This reverts commit `ec6c847179`. Fails on check-openmp: /b/1/openmp-clang-x86_64-linux-debian/llvm.build/projects/openmp/runtime/test/lock/Output/omp_init_lock.c.tmp -- Exit Code: -11	2021-09-13 15:34:21 +03:00
Andrew Savonichev	ec6c847179	[clang] Check unsupported types in expressions The patch adds missing diagnostics for cases like: float F3 = ((__float128)F1 * (__float128)F2) / 2.0f; Sema::checkDeviceDecl (renamed to checkTypeSupport) is changed to work with a type without the corresponding ValueDecl. It is also refactored so that host diagnostics for unsupported types can be added here as well. Differential Revision: https://reviews.llvm.org/D109315	2021-09-13 14:59:37 +03:00
Joseph Huber	29b44ca896	[OpenMP] Add flag for setting debug in the offloading device This patch introduces the flags `-fopenmp-target-debug` and `-fopenmp-target-debug=` to set the value of a global in the device. This will be used to enable or disable debugging features statically in the device runtime library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109544	2021-09-10 18:19:19 -04:00
Johannes Doerfert	45e8e08492	[OpenMP] Encode `omp [...] assume[...]` assumptions with `omp[x]` prefix Since these assumptions are coming from OpenMP it makes sense to mark them as such in the generic IR encoding. Standardized assumptions will be named omp_ASSUMPTION_NAME and extensions will be named ompx_ASSUMPTION_NAME which is the OpenMP 5.2 syntax for "extensions" of any kind. This also matches what the OpenMP-Opt pass expects. Summarized, #pragma omp [...] assume[s] no_parallelism now generates the same IR assumption annotation as __attribute__((assume("omp_no_parallelism"))) Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D105937	2021-09-10 12:08:52 -05:00
Michael Kruse	650bbc5620	[OpenMP][OpenMPIRBuilder] Implement loop unrolling. Recommit of `707ce34b06`. Don't introduce a dependency to the LLVMPasses component, instead register the required passes individually. Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are: * `unrollLoopFull` * `unrollLoopPartial` * `unrollLoopHeuristic` `unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility. With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism. Reviewed By: jdoerfert, kiranchandramohan Differential Revision: https://reviews.llvm.org/D107764	2021-09-04 19:18:58 -05:00
Jinsong Ji	d364eccdd5	[NFC][OpenMP] Use clang_cc1 to driver tests The test driver-fopenmp-extensions.c is failing on platforms that do not use integrated-as. It can be reproduced using -fno-integrated-as on Linux too. bin/clang -c -Xclang -verify=omp -fopenmp -fopenmp-extensions -fno-openmp-extensions ../llvm-project/clang/test/OpenMP/driver-fopenmp-extensions.c -fno-integrated-as Assembler messages: Error: can't open /tmp/driver-fopenmp-extensions-8fafe8.s for reading: No such file or directory clang-14: error: assembler command failed with exit code 1 (use -v to see invocation) The goal of this test is to verify syntax diags only, so we should use clang_cc1 to test. Reviewed By: jdenny, ABataev Differential Revision: https://reviews.llvm.org/D109255	2021-09-03 20:33:48 +00:00
PeixinQiao	a42380ce83	[OMPIRBuilder] Add ordered directive to OMPBuilder Add support for ordered directive in the OpenMPIRBuilder. This patch also modifies clang to use the ordered directive when the option -fopenmp-enable-irbuilder is enabled. Also fix one ICE when parsing one canonical for loop with the relational operator LE or GE in openmp region by replacing unary increment operation of the expression of the variable "Expr A" minus the variable "Expr B" (++(Expr A - Expr B)) with binary addition operation of the expression of the variable "Expr A" minus the variable "Expr B" and the expression with constant value "1" (Expr A - Expr B + "1"). Reviewed By: Meinersbur, kiranchandramohan Differential Revision: https://reviews.llvm.org/D107430	2021-09-03 09:37:58 +08:00
Roman Lebedev	50634deaa5	Revert "[OpenMP][OpenMPIRBuilder] Implement loop unrolling." Breaks build with -DBUILD_SHARED_LIBS=ON ``` CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle): "LLVMFrontendOpenMP" of type SHARED_LIBRARY depends on "LLVMPasses" (weak) "LLVMipo" of type SHARED_LIBRARY depends on "LLVMFrontendOpenMP" (weak) "LLVMCoroutines" of type SHARED_LIBRARY depends on "LLVMipo" (weak) "LLVMPasses" of type SHARED_LIBRARY depends on "LLVMCoroutines" (weak) depends on "LLVMipo" (weak) At least one of these targets is not a STATIC_LIBRARY. Cyclic dependencies are allowed only among static libraries. CMake Generate step failed. Build files cannot be regenerated correctly. ``` This reverts commit `707ce34b06`.	2021-09-02 12:42:23 +03:00
Michael Kruse	707ce34b06	[OpenMP][OpenMPIRBuilder] Implement loop unrolling. Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are: * `unrollLoopFull` * `unrollLoopPartial` * `unrollLoopHeuristic` `unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heuristics as used by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility. With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism. Reviewed By: jdoerfert, kiranchandramohan Differential Revision: https://reviews.llvm.org/D107764	2021-09-02 02:37:25 -05:00
Joel E. Denny	83ddfa0d22	[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2) This patch implements Clang support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The next patch in this series, D106510, implements OpenMP runtime support. Consider the following example: ``` #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x { foo(); // might have map(delete: x) #pragma omp target map(present, alloc: x) // x is guaranteed to be present printf("%d\n", x); } ``` The `ompx_hold` map type modifier above specifies that the `target data` directive holds onto the mapping for `x` throughout the associated region regardless of any `target exit data` directives executed during the call to `foo`. Thus, the presence assertion for `x` at the enclosed `target` construct cannot fail. (As usual, the standard OpenMP reference count for `x` must also reach zero before the data is unmapped.) Justification for inclusion in Clang and LLVM's OpenMP runtime: * The `ompx_hold` modifier supports OpenACC functionality (structured reference count) that cannot be achieved in standard OpenMP, as of 5.1. * The runtime implementation for `ompx_hold` (next patch) will thus be used by Flang's OpenACC support. * The Clang implementation for `ompx_hold` (this patch) as well as the runtime implementation are required for the Clang OpenACC support being developed as part of the ECP Clacc project, which translates OpenACC to OpenMP at the directive AST level. These patches are the first step in upstreaming OpenACC functionality from Clacc. * The Clang implementation for `ompx_hold` is also used by the tests in the runtime implementation. That syntactic support makes the tests more readable than low-level runtime calls can. Moreover, upstream Flang and Clang do not yet support OpenACC syntax sufficiently for writing the tests. 
* More generally, the Clang implementation enables a clean separation of concerns between OpenACC and OpenMP development in LLVM. That is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's extended OpenMP implementation and test suite without directly considering OpenACC's language and execution model, which can be handled by LLVM's OpenACC developers. * OpenMP users might find the `ompx_hold` modifier useful, as in the above example. See new documentation introduced by this patch in `openmp/docs` for more detail on the functionality of this extension and its relationship with OpenACC. For example, it explains how the runtime must support two reference counts, as specified by OpenACC. Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new command-line option introduced by this patch, is specified. Reviewed By: ABataev, jdoerfert, protze.joachim, grokos Differential Revision: https://reviews.llvm.org/D106509	2021-08-31 16:13:49 -04:00
Vyacheslav Zakharin	2e192ab1f4	[CodeExtractor] Preserve topological order for the return blocks. Differential Revision: https://reviews.llvm.org/D108673	2021-08-25 08:09:01 -07:00
Richard Smith	cd4d6d718b	PR48030: Fix COMDAT-related linking problem with C++ thread_local static data members. Previously when emitting a C++ guarded initializer, we tried to work out what the enclosing function would be used for and added it to the COMDAT containing the variable if we thought that doing so would be correct. But this was done from a context in which we didn't -- and realistically couldn't -- correctly infer how the enclosing function would be used. Instead, add the initialization function to a COMDAT from the code that creates it, in the case where it makes sense to do so: when we know that the one and only reference to the initialization function is in @llvm.global.ctors and that reference is in the same COMDAT. Reviewed By: rjmccall Differential Revision: https://reviews.llvm.org/D108680	2021-08-24 19:53:44 -07:00
Joseph Huber	ec66ed79f4	[OpenMP] Correctly add member expressions to OpenMP info Mapping expressions that have `this` as their base expression aren't considered a valid base variable and the rest of the runtime expects this. However, if we have an expression with no value declaration we can try to extract it manually to provide more helpful debugging information. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108483	2021-08-20 20:45:14 -04:00
Jennifer Yu	c274b19866	Add implicit map for a list item that appears in a reduction clause. A new rule is added in 5.0: If a list item appears in a reduction, lastprivate or linear clause on a combined target construct then it is treated as if it also appears in a map clause with a map-type of tofrom. Currently map clauses for all capture variables are added implicitly, but they are missing for list items that are expressions for array elements or array sections. The change is to add an implicit map clause for array elements used in a reduction clause. Skip adding the map clause if the expression is not mappable. Note: For linear and lastprivate, since only a variable name is accepted, the map has been added through capture variables. To do so: During the mappable checking, if there is an error, ignore the diagnostic and skip adding the implicit map clause. The changes: 1> Add code to generate implicit map in ActOnOpenMPExecutableDirective, for omp 5.0 and up. 2> Add extra default parameter NoDiagnose in ActOnOpenMPMapClause: Use that to skip the error as well as skip adding the implicit map during the mappable checking. Note: there are only two places that need to be checked for NoDiagnose. For the rest, either the check is for < omp 5.0 or the error is already generated for the reduction clause. Differential Revision: https://reviews.llvm.org/D108132	2021-08-19 12:53:47 -07:00
Jon Chesterfield	21d91a8ef3	[libomptarget][devicertl] Replace lanemask with uint64 at interface Use uint64_t for lanemask on all GPU architectures at the interface with clang. Updates tests. The deviceRTL is always linked as IR so the zext and trunc introduced for wave32 architectures will fold after inlining. Simplification partly motivated by amdgpu gfx10 which will be wave32 and is awkward to express in the current arch-dependent typedef interface. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108317	2021-08-18 20:47:33 +01:00
Roger Ferrer Ibanez	bfb77364d0	[OpenMP] Fix accidental reuse of VLA size We were using an OpaqueValueExpr allocated on the stack to store the size of a VLA. Because the VLASizeMap in CodegenFunction uses the address of the expression to avoid recomputing VLAs, we were accidentally reusing an earlier llvm::Value. This led to invalid LLVM IR. This is a temporary solution until VLASizeMap can be pushed and popped based on the context. Differential Revision: https://reviews.llvm.org/D107666	2021-08-07 05:55:27 +00:00
Joseph Huber	41a6b50c25	[OpenMP]Fix PR51349: Remove AlwaysInline for if regions. After D94315 we add the `NoInline` attribute to the outlined function to handle data environments in the OpenMP if clause. This conflicted with the `AlwaysInline` attribute added to the outlined function for better performance in D106799. The data environments should ideally not require NoInline, but for now this fixes PR51349. Reviewed By: mikerice Differential Revision: https://reviews.llvm.org/D107649	2021-08-06 17:53:04 -04:00
Jennifer Yu	6b0f35931a	Fix signal during the call to checkOpenMPLoop. The root problem is that a null pointer is accessed during the call to checkOpenMPLoop, because the loop upper bound expr is an error expression due to an error diagnostic that was emitted earlier. To fix this, in setLCDeclAndLB, setUB and setStep, instead of returning false, return true when LB, UB or Step contains an error, so that the checking is stopped in checkOpenMPLoop. Differential Revision: https://reviews.llvm.org/D107385	2021-08-05 08:59:35 -07:00
Aaron Ballman	530ea28fef	Correct a lot of diagnostic wordings for the driver Clang diagnostics should not start with a capital letter or use trailing punctuation (https://clang.llvm.org/docs/InternalsManual.html#the-format-string), but quite a few driver diagnostics were not following this advice. This corrects the grammar and punctuation to improve consistency, but does not change the circumstances under which the diagnostics are produced.	2021-08-05 07:04:55 -04:00
Jennifer Yu	656d022331	Stop emitting an incomplete type error for a variable in a map clause where it should not be emitted. Currently we are using QTy->isIncompleteType(&ND) to check for an incomplete type. But before doing that, we need to instantiate for a class template specialization or a class member of a class template specialization, or an array with known size of such..., so that we know it is really an incomplete type. To fix this, use RequireCompleteType instead. The new test is added into "test/OpenMP/target_update_messages.cpp" The difference when using RequireCompleteType is that when an incomplete type error is emitted, an additional note is also emitted to point to where the incomplete type is declared. Because of this change, many tests need to be fixed by adding the additional note. This is to fix https://bugs.llvm.org/show_bug.cgi?id=50508 Differential Revision: https://reviews.llvm.org/D107200	2021-08-03 10:51:32 -07:00
Chirag Khandelwal	77ebfba68b	[Flang][Openmp] Upgrade TASKGROUP construct to 5.0. In OMP 5.0 specification clause-list with * task_reduction * allocate were allowed on taskgroup construct. Fix XFAIL - omp-taskloop01.f90. Reviewed By: kiranchandramohan Differential Revision: https://reviews.llvm.org/D93373	2021-08-03 10:27:47 +05:30
Eli Friedman	2a2847823f	[ConstantFold] Get rid of special cases for sizeof etc. Target-dependent constant folding will fold these down to simple constants (or at least, expressions that don't involve a GEP). We don't need heroics to try to optimize the form of the expression before that happens. Fixes https://bugs.llvm.org/show_bug.cgi?id=51232 . Differential Revision: https://reviews.llvm.org/D107116	2021-07-31 13:20:47 -07:00
Jose M Monsalve Diaz	0276db1416	[OpenMP] Creating the `omp_target_num_teams` and `omp_target_thread_limit` attributes to outlined functions The device runtime contains several calls to __kmpc_get_hardware_num_threads_in_block and __kmpc_get_hardware_num_blocks. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In commit D106033 we have the optimization phase. This commit adds the attributes to the outlined function for the grid size. The two attributes are `omp_target_num_teams` and `omp_target_thread_limit`. These values are added as long as they are constant. Two functions are created `getNumThreadsExprForTargetDirective` and `getNumTeamsExprForTargetDirective`. The original functions `emitNumTeamsForTargetDirective` and `emitNumThreadsForTargetDirective` identify the expression and emit the code. However, for the Device version of the outlined function, we cannot emit anything. Therefore, this is a first attempt to separate emission of code from deduction of the values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106298	2021-07-27 17:21:04 -04:00
Joseph Huber	af000197c4	[OpenMP] Always inline the OpenMP outlined function This patch adds the always inline attribute to the outlined functions generated by OpenMP regions. Because there is only a single instance of this function and it always has internal linkage it is safe to inline in every instance it is created. This could potentially lead to performance degradation due to inflated register counts in the parallel region. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106799	2021-07-26 17:27:59 -04:00
Shilei Tian	3274cdc83e	[Clang][OpenMP] Remove the mandatory flush for capture for OpenMP 5.1 In OpenMP 5.1: > If the `write` or `update` clause is specified, the atomic operation is not an atomic conditional update for which the comparison fails, and the effective memory ordering is `release`, `acq_rel`, or `seq_cst`, the strong flush on entry to the atomic operation is also a release flush. If the `read` or `update` clause is specified and the effective memory ordering is `acquire`, `acq_rel`, or `seq_cst` then the strong flush on exit from the atomic operation is also an acquire flush. In OpenMP 5.0: > If the `write`, `update`, or `capture` clause is specified and the `release`, `acq_rel`, or `seq_cst` clause is specified then the strong flush on entry to the atomic operation is also a release flush. If the `read` or `capture` clause is specified and the `acquire`, `acq_rel`, or `seq_cst` clause is specified then the strong flush on exit from the atomic operation is also an acquire flush. From my understanding, in OpenMP 5.1, `capture` is removed from the requirement for flush, therefore we don't have to enforce it. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D100768	2021-07-26 11:00:44 -04:00
Alexey Bataev	b88a68c45e	[OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments. Added missed arguments in __tgt_target_teams_nowait_mapper/__tgt_target_nowait_mapper runtime functions calls. Differential Revision: https://reviews.llvm.org/D106542	2021-07-22 08:44:37 -07:00
Alexey Bataev	f828f0a90f	Revert "[OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments." This reverts commit `b455f7f225` to fix buildbots.	2021-07-22 08:06:29 -07:00
Alexey Bataev	b455f7f225	[OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments. Added missed arguments in __tgt_target_teams_nowait_mapper/__tgt_target_nowait_mapper runtime functions calls. Differential Revision: https://reviews.llvm.org/D106542	2021-07-22 07:53:37 -07:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis	fb0cf01795	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `e9c7291cb2`. Fix failing tests	2021-07-19 07:54:26 -07:00
Giorgis Georgakoudis	e9c7291cb2	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102107	2021-07-16 23:27:44 -07:00
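
The bitset encoding of the kernel execution mode described in commit ca999f7191 can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the enumerator and helper names are hypothetical, and only the bit positions and the values `0x1`, `0x2` and `0x3` are taken from the commit message.

```cpp
#include <cstdint>

// Hypothetical names; the bit positions follow the commit message:
// [0] generic mode, [1] SPMD mode, [2] SIMD mode (to be added later).
enum ExecModeBits : std::int8_t {
  GenericBit = 1 << 0,
  SPMDBit = 1 << 1,
  SIMDBit = 1 << 2,
};

// True if the kernel has generic mode semantics (bit 0 set).
constexpr bool hasGenericSemantics(std::int8_t Mode) {
  return (Mode & GenericBit) != 0;
}

// True if the kernel executes in SPMD mode (bit 1 set).
constexpr bool runsInSPMD(std::int8_t Mode) {
  return (Mode & SPMDBit) != 0;
}

// True for "SPMD mode execution with generic mode semantics" (0x3).
constexpr bool isGenericSPMD(std::int8_t Mode) {
  return (Mode & (GenericBit | SPMDBit)) == (GenericBit | SPMDBit);
}
```

With flags instead of enumerated values, a later SIMD bit can presumably be combined with the existing ones without renumbering the modes already in use.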

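The size formula quoted in commit bfc8f9e9b0 (a sum of per-clause products rather than one product over all iterators) can be illustrated with a small sketch. The struct and function names below are hypothetical and are not taken from Clang; only the formula itself comes from the commit message.

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical record for one depend clause that uses an iterator modifier.
struct DependClause {
  std::size_t IterSize;      // itersizeX: size of the iterator
  std::size_t NumClauseDeps; // numclausedepsX: dependencies in the clause
};

// Total number of dependencies:
// itersize1*numclausedeps1 + ... + itersizeN*numclausedepsN.
constexpr std::size_t
totalDependencies(std::initializer_list<DependClause> Clauses) {
  std::size_t Total = 0;
  for (const DependClause &C : Clauses)
    Total += C.IterSize * C.NumClauseDeps;
  return Total;
}
```

For example, two clauses with iterator sizes 4 and 3 and with 2 and 1 dependencies respectively yield 4*2 + 3*1 = 11 entries.
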

1742 Commits