llvm-project

Commit Graph

Author	SHA1	Message	Date
Arpith Chacko Jacob	101e8fb1f3	[OpenMP] Parallel reduction on the NVPTX device. This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that use these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295333	2017-02-16 16:20:16 +00:00
George Rimar	505ac8dc41	[ELF] - Do not crash when discarding sections that are referenced by others. SHF_LINK_ORDER sections add special ordering requirements. Such sections reference other sections. Previously we would crash if a section referenced by others was discarded by the script. The patch fixes that by discarding all dependent sections in that case. It supports chained dependencies; a testcase is provided. Differential revision: https://reviews.llvm.org/D30033 llvm-svn: 295332	2017-02-16 16:06:13 +00:00
Sjoerd Meijer	cb2d950214	[AArch64] AArch64AsmParser clean up of isImmediate functions. NFC Regression test neon-diagnostics.s needed changing because it now produces a more specific diagnostic about the immediate ranges. One change in the expected error message is not obvious, but there are multiple candidates and it happens to pick the immediate diagnostic. Differential Revision: https://reviews.llvm.org/D29939 llvm-svn: 295331	2017-02-16 15:52:22 +00:00
Saleem Abdulrasool	3d99648f00	math: correct the MSVCRT condition Fixes a number of tests in the testsuite on Windows. llvm-svn: 295330	2017-02-16 15:47:50 +00:00
Saleem Abdulrasool	305b4f2ba9	threading_support: make __thread_sleep_for be alertable On Windows, we were using `Sleep` which is not alertable. This means that if the thread was used for a user APC or WinProc handling and thread::sleep was used, we could potentially deadlock. Use `SleepEx` with an alertable sleep, resuming until the time has expired if we are awoken early. llvm-svn: 295329	2017-02-16 15:47:45 +00:00
Pavel Labath	7dc6e51ef5	Fix build due to clang r295311 BuiltinType::Kind::OCLNDRange was removed. llvm-svn: 295328	2017-02-16 15:32:19 +00:00
Dan Gohman	4a5496902c	[WebAssembly] Add a cast to void to fix an unused private member warning, for now. llvm-svn: 295327	2017-02-16 15:21:37 +00:00
Simon Pilgrim	2fe568c95e	[X86] Remove local areOnlyUsersOf helper and use SDNode::areOnlyUsersOf instead. llvm-svn: 295326	2017-02-16 15:11:49 +00:00
Marshall Clow	e9110d71dd	Remove uses of deprecated std::random_shuffle in the LLVM code base. Reviewed as https://reviews.llvm.org/D29780 . llvm-svn: 295325	2017-02-16 14:37:03 +00:00
Rafael Espindola	908a3d3420	Ignore relocation sections in linker scripts. Unfortunately, the common way of writing linker scripts seems to be to get the output of ld.bfd --verbose and edit it a bit. Also unfortunately, the bfd default script contains things like .rela.dyn : { *(... .rela.data ...) } but bfd actually ignores that for -emit-relocs, so we have to do the same. llvm-svn: 295324	2017-02-16 14:36:09 +00:00
Arpith Chacko Jacob	bd6344c0be	Revert r295319 while investigating buildbot failure. llvm-svn: 295323	2017-02-16 14:25:35 +00:00
Rafael Espindola	82f00ec4a2	Fix crash with -emit-relocs -shared. The code to handle the input SHT_REL/SHT_RELA sections was getting confused with the linker generated relocation sections. llvm-svn: 295322	2017-02-16 14:23:43 +00:00
Diana Picus	1540b06ef8	[ARM] GlobalISel: Select floating point loads llvm-svn: 295321	2017-02-16 14:10:50 +00:00
Benjamin Kramer	aad1bdc863	Silence sign compare warning. NFC. ExprConstant.cpp:6344:20: warning: comparison of integers of different signs: 'const size_t' (aka 'const unsigned long') and 'typename iterator_traits<Expr *const *>::difference_type' (aka 'long') [-Wsign-compare] llvm-svn: 295320	2017-02-16 14:08:41 +00:00
Arpith Chacko Jacob	8e170fc857	[OpenMP] Parallel reduction on the NVPTX device. This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that use these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295319	2017-02-16 14:03:36 +00:00
Kuba Mracek	3e81c2675e	[tsan] Provide external tags (object types) via debugging API In D28836, we added a way to tag heap objects and thus provide object types into report. This patch exposes this information into the debugging API. Differential Revision: https://reviews.llvm.org/D30023 llvm-svn: 295318	2017-02-16 14:02:32 +00:00
Krasimir Georgiev	8fcdd5ab96	Fix clang-move test after clang-format update r295312 llvm-svn: 295317	2017-02-16 13:17:38 +00:00
Artur Pilipenko	a1b384c4ce	Revert r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine" This change causes some AMDGPU and PowerPC tests to fail. llvm-svn: 295316	2017-02-16 13:04:46 +00:00
Artur Pilipenko	daaa0c0f7d	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295314	2017-02-16 12:53:26 +00:00
Anastasia Stulova	b376bee642	[OpenCL][Doc] Added OpenCL vendor extension description to user manual doc Added a description of a new feature that allows specifying vendor extensions in a flexible way using a compiler pragma instead of modifying source code directly (committed in clang@r289979). Review: D29829 llvm-svn: 295313	2017-02-16 12:49:29 +00:00
Krasimir Georgiev	bb99a36dc0	[clang-format] Align block comment decorations Summary: This patch implements block comment decoration alignment. source: ``` /* line 1 * line 2 */ ``` result before: ``` /* line 1 * line 2 */ ``` result after: ``` /* line 1 * line 2 */ ``` Reviewers: djasper, bkramer, klimek Reviewed By: klimek Subscribers: mprobst, cfe-commits, klimek Differential Revision: https://reviews.llvm.org/D29943 llvm-svn: 295312	2017-02-16 12:39:31 +00:00
Anastasia Stulova	58984e7087	[OpenCL] Correct ndrange_t implementation Removed ndrange_t as Clang builtin type and added as a struct type in the OpenCL header. Use type name to do the Sema checking in enqueue_kernel and modify IR generation accordingly. Review: D28058 Patch by Dmitry Borisenkov! llvm-svn: 295311	2017-02-16 12:27:47 +00:00
Diana Picus	b1701e0b05	[ARM] GlobalISel: Select G_SEQUENCE and G_EXTRACT Since they're only used for passing around double precision floating point values into the general purpose registers, we'll lower them to VMOVDRR and VMOVRRD. llvm-svn: 295310	2017-02-16 12:19:57 +00:00
Diana Picus	6beef3c087	[ARM] GlobalISel: Select double G_FADD and copies Just use VADDD if available, bail out if not. llvm-svn: 295309	2017-02-16 12:19:52 +00:00
Diana Picus	9b32faa821	[ARM] GlobalISel: Assert that we don't use the FPR bank if we don't have VFP llvm-svn: 295308	2017-02-16 11:25:09 +00:00
Anastasia Stulova	9d98a316c5	[OpenCL] Disallow blocks capturing other blocks (v2.0, s6.12.5) llvm-svn: 295307	2017-02-16 11:13:30 +00:00
Diana Picus	a93803b9fe	[ARM] GlobalISel: Add reg bank mappings for G_SEQUENCE and G_EXTRACT Support G_SEQUENCE and G_EXTRACT as needed for passing double precision floating point values in the soft-fp float mode. llvm-svn: 295306	2017-02-16 11:00:31 +00:00
Krasimir Georgiev	f7de84ab9f	[clangd] Fix Output.log error llvm-svn: 295305	2017-02-16 10:53:27 +00:00
Krasimir Georgiev	1b8bfd4b76	[clangd] Implement format on type Summary: This patch adds onTypeFormatting to clangd. The trigger character is '}' and it works by scanning for the matching '{' and formatting the range in-between. There are problems with ';' as a trigger character, the cursor position is before the `|`: ``` int main() { int i;| } ``` becomes: ``` int main() { int i;| } ``` which is not likely what the user intended. Also formatting at semicolon in a non-properly closed scope puts the following tokens in the same unwrapped line, which doesn't reformat nicely. Reviewers: bkramer Reviewed By: bkramer Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D29990 llvm-svn: 295304	2017-02-16 10:49:46 +00:00
Alexander Kornienko	762adef1a9	[clang-tidy] Ignore spaces between globs in the Checks option. llvm-svn: 295303	2017-02-16 10:23:18 +00:00
Diana Picus	7f82c87022	[ARM] GlobalISel: Make the FPR bank 64-bit wide Also add mappings for single and double precision FP, and use them for G_FADD and G_LOAD. llvm-svn: 295302	2017-02-16 10:12:49 +00:00
Erik Verbruggen	2c7c38d9bb	Cache FileID when translating diagnostics in PCH files Modules/preambles/PCH files can contain diagnostics, which, when used, are added to the current ASTUnit. For that to work, they are translated to use the current FileManager's FileIDs. When the entry is not the main file, all local source locations will be checked by a linear search. This becomes a problem when there are lots of diagnostics (say, 25000) and lots of local source locations (say, 440000): translation can end up taking seconds when using such a preamble. The fix is to cache the last FileID, because many subsequent diagnostics refer to the same file. This reduces the time spent in ASTUnit::TranslateStoredDiagnostics from seconds to a few milliseconds for files with many slocs/diagnostics. This fixes PR31353. Differential Revision: https://reviews.llvm.org/D29755 llvm-svn: 295301	2017-02-16 09:49:30 +00:00
Diana Picus	21c3d8e0fc	[ARM] GlobalISel: Legalize 64-bit G_FADD and G_LOAD For now we just mark them as legal all the time and let the other passes bail out if they can't handle it. In the future, we'll want to move more of the brains into the legalizer. llvm-svn: 295300	2017-02-16 09:09:49 +00:00
Vitaly Buka	f813697f05	[sanitizers] Fix formatting of the shell script. llvm-svn: 295299	2017-02-16 08:47:27 +00:00
George Rimar	09015fee3c	[ELF] - Allow a section to have multiple dependent sections. This fixes the case where a section has more than one metadata section. Previously GC would collect one of those sections because the implementation stored only the last one as dependent. Differential revision: https://reviews.llvm.org/D29981 llvm-svn: 295298	2017-02-16 08:41:19 +00:00
NAKAMURA Takumi	14246c937d	RWMutex.h: Use llvm-config.h instead of config.h in installed headers. llvm-svn: 295297	2017-02-16 08:22:08 +00:00
Vitaly Buka	69068dd50e	[sanitizers] Redirect pthread calls to interceptors. It's needed if libcxx is built without disabling threads. llvm-svn: 295296	2017-02-16 08:06:17 +00:00
Diana Picus	ca6a890d7f	[ARM] GlobalISel: Lower double precision FP args For the hard float calling convention, we just use the D registers. For the soft-fp calling convention, we use the R registers and move values to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we make sure to honor the endianness of the target, since the CCAssignFn doesn't do that for us. For pure soft float targets, we still bail out because we don't support the libcalls yet. llvm-svn: 295295	2017-02-16 07:53:07 +00:00
Craig Topper	3731f4d173	[AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit. llvm-svn: 295294	2017-02-16 07:35:23 +00:00
Richard Trieu	e55fb7f6f1	Revert r295284: Add better ODR checking for modules. Fix modules build bot. llvm-svn: 295293	2017-02-16 07:09:18 +00:00
Roman Gareev	4eb07e481e	[FIX] Fix the typo in ScheduleOptimizer.cpp. llvm-svn: 295292	2017-02-16 07:04:41 +00:00
Craig Topper	f0d1147fae	[AVX-512] Replace 512-bit masked packss/packus builtins with new unmasked builtins. These new unmasked builtins will enable us to easily support optimizing these builtins in InstCombine in the backend. llvm-svn: 295291	2017-02-16 06:32:07 +00:00
Craig Topper	715873ead3	[AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked intrinsics with select instructions. For 512-bit add new unmasked intrinsics. The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO. llvm-svn: 295290	2017-02-16 06:31:54 +00:00
Rui Ueyama	cd19b039ce	Use isRelExprOneOf. llvm-svn: 295289	2017-02-16 06:24:16 +00:00
Rui Ueyama	f829e8c97d	Removes a trivial accessor. llvm-svn: 295288	2017-02-16 06:12:41 +00:00
Rui Ueyama	924b361d01	Do not overload a one-bit variable, NeedsCopyOrPltAddr. This patch removes NeedsCopyOrPltAddr and instead adds two variables, NeedsCopy and NeedsPltAddr. This uses one more bit in the Symbol class, but the actual size doesn't increase because we had unused bits. This should improve code readability. llvm-svn: 295287	2017-02-16 06:12:22 +00:00
Richard Trieu	2700dc1302	Loosen a Type check in ODR checking to try to fix the build bot. llvm-svn: 295286	2017-02-16 05:48:25 +00:00
Petr Hosek	63524f56f1	[libunwind][CMake] Use libc++ headers when available libunwind depends on C++ library headers. When building libunwind as part of LLVM and libc++ is available, use its headers. Differential Revision: https://reviews.llvm.org/D29997 llvm-svn: 295285	2017-02-16 05:18:08 +00:00
Richard Trieu	f351ac8987	Add better ODR checking for modules. Recommit r293585 that was reverted in r293611 with new fixes. The previous issue was determined to be an overly aggressive AST visitor from forward declared objects. The visitor will now only deeply visit certain Decl's and only do a shallow information extraction from all other Decl's. When objects are imported for modules, there is a chance that a name collision will cause an ODR violation. Previously, only a small number of such violations were detected. This patch provides a stronger check based on AST nodes. The information needed to uniquely identify an object is taken from the AST and put into a one-dimensional byte stream. This stream is then hashed to give a value to represent the object, which is stored with the other object data in the module. When modules are loaded, and Decl's are merged, the hash values of the two Decl's are compared. Only Decl's with matched hash values will be merged. Mismatched hashes will generate a module error, and if possible, point to the first difference between the two objects. The transform from AST to byte stream is a modified depth first algorithm. Due to references between some AST nodes, a pure depth first algorithm could generate loops. For Stmt nodes, a straight depth first processing occurs. For Type and Decl nodes, they are replaced with an index number and only on first visit will these nodes be processed. As an optimization, boolean values are saved and stored together in reverse order at the end of the byte stream to lower the amount of data that needs to be hashed. Compile time impact was measured at 1.5-2.0% during module building, and negligible during builds without module building. Differential Revision: https://reviews.llvm.org/D21675 llvm-svn: 295284	2017-02-16 04:53:40 +00:00
Rui Ueyama	26ad057099	Add comments. llvm-svn: 295283	2017-02-16 04:51:46 +00:00

254991 Commits