Turning on the `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming the `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
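For illustration, here is a minimal sketch of the kind of check-line change this implies for a trivial function; the exact attributes and their placement in real emitted IR depend on the target and the pipeline:

```c
// With noundef analysis enabled by default, Clang marks arguments and return
// values that may not be undef/poison with `noundef`, which is what allows
// later passes to drop freeze instructions. Rough shape of the update:
//
//   before:  define {{.*}}i32 @add(i32 %a, i32 %b)
//   after:   define {{.*}}noundef i32 @add(i32 noundef %a, i32 noundef %b)
int add(int a, int b) { return a + b; }
```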
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated tests are updated with `utils/update_cc_test_checks.py`, and non-autogenerated tests are updated as follows:
(1) I wrote a Python script that (partially) updates the tests using regexes: {F18594904}. The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Resolve lit failures in clang after 8ca4b3e's land
Fix lit test failures in clang-ppc* and clang-x64-windows-msvc
Fix missed failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc
Fix internal_clone(aarch64) inline assembly
Turning on the `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming the `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
This patch updates test files after D105169.
Autogenerated tests are updated with `utils/update_cc_test_checks.py`, and non-autogenerated tests are updated as follows:
(1) I wrote a Python script that (partially) updates the tests using regexes: {F18594904}. The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Follow-up patch to https://reviews.llvm.org/D111293.
Remove the call to CodeGenFunction::InitTempAlloca() from the OpenMP-related
codegen.
Also remove the metadata `!llvm.access.group` from the updated lit
tests.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D111316
This is a follow-up to D110029, which uses a bitset to indicate the execution mode. This patch makes the corresponding changes in the function call.
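For context, a hedged sketch of the bitset idea; the identifiers and bit values below are illustrative assumptions, not the runtime's actual constants:

```c
#include <stdint.h>

/* Hypothetical execution-mode bits; a bitset allows modes to be combined
 * instead of encoding SPMD vs. non-SPMD as a single boolean flag.        */
enum {
  EXEC_MODE_GENERIC = 1u << 0,
  EXEC_MODE_SPMD    = 1u << 1,
};

/* The function call touched by this patch would then receive and test the
 * mode as a bit mask rather than a plain boolean.                        */
static inline int is_spmd_mode(uint8_t mode) {
  return (mode & EXEC_MODE_SPMD) != 0;
}
```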
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110279
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime:
(1) the fork_call is variadic, since there is a variable number of arguments to forward to the outlined function;
(2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments;
(3) forwarded arguments must be cast to pointer types, which complicates debugging.
This patch avoids those issues by aggregating the captured arguments in a struct that is passed to the fork_call.
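A hedged sketch of the aggregation scheme; the struct layout and the `fork_call_aggregate` entry point below are illustrative names, not the runtime's actual interface:

```c
#include <stdint.h>

typedef void parallel_region_fn(void *captured);

/* Old scheme (conceptually): each capture is forwarded as a separate,
 * pointer-cast argument through a variadic fork call, e.g.
 *   fork_call(..., outlined_fn, 3, &a, &b, &c);                          */

/* New scheme: all captures are aggregated into one struct, and the fork
 * call forwards a single pointer, so the interface is no longer variadic. */
struct captured_args {
  int *a;
  double *b;
  int64_t c;
};

/* Hypothetical non-variadic fork entry point. */
void fork_call_aggregate(parallel_region_fn *fn, void *captured);

static void outlined_region(void *captured) {
  struct captured_args *args = captured;   /* unpack the aggregate */
  *args->a += (int)args->c;
  *args->b *= 2.0;
}

void spawn_parallel(int *a, double *b, int64_t c) {
  struct captured_args args = {a, b, c};
  fork_call_aggregate(outlined_region, &args);
}
```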
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D102107
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.
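A sketch of the shape of this change; treat the exact parameter types and the paired `__kmpc_alloc_shared` declaration as assumptions rather than the verbatim device runtime header:

```c
#include <stdint.h>

/* Allocation entry point for globalized variables (assumed shape). */
void *__kmpc_alloc_shared(uint64_t bytes);

/* Before this patch:  void __kmpc_free_shared(void *ptr);
 * After: the caller hands the size back, so the device-side allocator does
 * not have to track per-allocation sizes itself.                          */
void __kmpc_free_shared(void *ptr, uint64_t bytes);

void use(int *p);

/* Roughly what codegen would emit for an escaping local `int x`. */
void outlined_region(void) {
  int *x = __kmpc_alloc_shared(sizeof(int));
  use(x);
  __kmpc_free_shared(x, sizeof(int));
}
```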
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106496
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime:
(1) the fork_call is variadic, since there is a variable number of arguments to forward to the outlined function;
(2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) on the number of arguments;
(3) forwarded arguments must be cast to pointer types, which complicates debugging.
This patch avoids those issues by aggregating the captured arguments in a struct that is passed to the fork_call.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102107
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow-up patch.
The "surplus" threads of the "master warp" will not exit early anymore,
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but will also require these non-aligned barriers.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
This was in parts extracted from D59319.
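A rough sketch of the uniform kernel shape this aims for; the entry-point names and return-value convention below are hypothetical placeholders, not the interface the OMPIRBuilder actually emits:

```c
/* Hypothetical device runtime entry points (illustration only). */
int  kernel_environment_init(int is_spmd, int use_generic_state_machine);
void kernel_environment_deinit(int is_spmd);

void kernel_body(void);

/* Every kernel gets the same prologue/epilogue regardless of SPMD vs.
 * generic mode; mode-specific behavior (worker state machine, surplus
 * threads of the "master warp") is hidden behind the init call.        */
void kernel_entry(int is_spmd) {
  if (kernel_environment_init(is_spmd, /*use_generic_state_machine=*/1))
    kernel_body();
  kernel_environment_deinit(is_spmd);
}
```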
Reviewed By: ABataev, JonChesterfield
Differential Revision: https://reviews.llvm.org/D101976
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads, so it must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.
This patch replaces this functionality with a single allocation command, effectively mimicking an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.
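For example, a variable like `x` below escapes the main thread because the workers of the parallel region access it, so it must be globalized rather than kept in a plain alloca (a minimal illustration; the exact runtime calls the frontend emits are described above):

```c
int foo(void) {
  int result = 0;
#pragma omp target map(tofrom: result)
  {
    int x = 0;            /* declared by the main thread of the team      */
#pragma omp parallel
    {
#pragma omp atomic
      x += 1;             /* shared with the workers, so `x` must live in
                             globalized (global/shared) memory            */
    }
    result = x;
  }
  return result;
}
```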
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D97680
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101849
This patch renames the replace-function-regex to replace-value-regex to indicate that the existing regex replacement functionality can replace any IR value, not just functions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101934
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101849
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and makes the corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target and host-side parallel regions), and data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D95976
[libomptarget][nvptx] Undef, weak shared variables
Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.
Common linkage implies zero initialization, which seems incompatible with shared memory. Thus
change the variables to weak linkage, following the direction of
https://reviews.llvm.org/rG7b3eabdcd215
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90248
Summary:
Added support for dynamic memory allocation for globalized variables in
case execution of target regions in parallel is required.
Reviewers: jdoerfert
Subscribers: jholewinski, yaxunl, guansong, sstefan1, cfe-commits, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82324
performance.
Internally generated functions must be marked as always_inline in most
cases. The patch marks some extra reduction functions and outlined parallel
functions as always_inline for better performance, but only if the
optimization is requested.
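For illustration, this is the attribute in question on a hand-written stand-in for such an internally generated helper (the real functions are emitted by codegen, not written by hand):

```c
/* A reduction-combiner-like helper marked always_inline, as the internally
 * generated functions are when the optimization is requested.            */
static inline __attribute__((always_inline)) void
reduction_combiner(int *lhs, const int *rhs) {
  *lhs += *rhs;
}

int reduce_pair(int a, int b) {
  int acc = a;
  reduction_combiner(&acc, &b);  /* call site gets inlined unconditionally */
  return acc;
}
```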
llvm-svn: 361269
nvvm_barrier0.
Use runtime functions instead of direct calls to the nvvm intrinsics.
This prevents some dangerous LLVM optimizations that break the
code for the NVPTX target.
llvm-svn: 350328
buffer.
It seems to me that nvlink has a bug in its support of weakly
linked symbols: it does not allow defining several shared memory buffers
with different sizes, even with weak linkage. Instead, we always
use a 128-byte buffer to prevent nvlink from emitting an error message.
llvm-svn: 349540
reductions.
Fixed previously committed code for reduction support in
teams/parallel constructs, taking into account the new design of the NVPTX
support in the compiler. Teams reductions are not fully functional yet;
this is going to be fixed in the following patches.
llvm-svn: 347081
target|teams|distribute variables.
If the total size of the variables declared in target|teams|distribute
regions is less than the maximal size of shared memory available, the
buffer is allocated in shared memory.
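Conceptually, the placement decision looks like the sketch below; the budget constant is a hypothetical stand-in for the shared memory size the codegen actually queries:

```c
#include <stddef.h>

/* Hypothetical per-team shared memory budget (illustration only). */
#define SHARED_MEM_BUDGET_BYTES (48u * 1024u)

/* Place the globalization buffer in shared memory only if everything the
 * target|teams|distribute region globalizes fits; otherwise fall back to
 * the global memory path.                                                */
int place_in_shared_memory(size_t total_globalized_bytes) {
  return total_globalized_bytes <= SHARED_MEM_BUDGET_BYTES;
}
```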
llvm-svn: 346507
target/teams/distribute regions.
Target/teams/distribute regions exist for the whole time the kernel is
executed. Thus, if a variable is declared in their context and then
escapes it, we can allocate the global memory statically instead of
allocating it dynamically.
The patch captures all the globalized variables in target/teams/distribute
contexts and merges them into records, one per target region.
Those records are then joined into a union, one per compilation unit
(to save global memory). Those unions are organized into
two-dimensional arrays, where the first dimension is
the number of blocks per SM and the second one is the number of SMs.
Runtime functions manage this global memory space between the executing
teams.
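A hedged sketch of the resulting layout; all names and sizing constants below are hypothetical, chosen only to make the record/union/array nesting concrete:

```c
#define MAX_BLOCKS_PER_SM 8   /* hypothetical sizing constants */
#define NUM_SMS          56

/* One record per target region, holding that region's escaped variables. */
struct target_region_0_record { int a; double b; };
struct target_region_1_record { float c[4]; };

/* Records of one compilation unit are merged into a union so that regions
 * which never occupy the same slot at the same time share the storage.    */
union cu_globalized_record {
  struct target_region_0_record r0;
  struct target_region_1_record r1;
};

/* One slot per (block-on-SM, SM) pair; runtime functions hand these slots
 * out to the executing teams.                                             */
union cu_globalized_record
    global_record_space[MAX_BLOCKS_PER_SM][NUM_SMS];
```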
llvm-svn: 345978
Summary: This patch enables choosing the default schedule for parallel for loops even in non-SPMD cases.
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: jholewinski, guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D53443
llvm-svn: 345507
Summary:
This is a simple test of the parallel for code generation. It will be used to showcase the change introduced by patch D53443.
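For reference, a minimal kernel of the kind such a test covers (illustrative only; the committed test has its own RUN lines and generated checks):

```c
/* A simple target parallel-for whose device code generation, including the
 * loop schedule, is what the test inspects.                               */
void vec_add(const int *a, const int *b, int *c, int n) {
#pragma omp target map(to: a[:n], b[:n]) map(from: c[:n])
#pragma omp parallel for
  for (int i = 0; i < n; ++i)
    c[i] = a[i] + b[i];
}
```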
Reviewers: ABataev, caomhin
Reviewed By: ABataev
Subscribers: guansong, cfe-commits
Differential Revision: https://reviews.llvm.org/D53772
llvm-svn: 345417