llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	502b3bfc6a	[AMDGPU] require s-memtime-inst for __builtin_amdgcn_s_memtime Differential Revision: https://reviews.llvm.org/D97420	2021-02-25 08:31:59 -08:00
Dávid Bolvanský	cd54c57919	Reland "[Libcalls, Attrs] Annotate libcalls with noundef" Fixed Clang tests.	2021-02-20 06:18:48 +01:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
serge-sans-paille	3c8bf29f14	Reduce the number of attributes attached to each function This takes advantage of the implicit default behavior to reduce the number of attributes, which in turns reduces compilation time. I've observed -3% in instruction count when compiling sqlite3 amalgamation with -O0 Differential Revision: https://reviews.llvm.org/D96400	2021-02-16 16:19:54 +01:00
James Y Knight	8043d5a964	NFC: update clang tests to check ordering and alignment for atomicrmw/cmpxchg. The ability to specify alignment was recently added, and it's an important property which we should ensure is set as expected by Clang. (Especially before making further changes to Clang's code in this area.) But, because it's on the end of the lines, the existing tests all ignore it. Therefore, update all the tests to also verify the expected alignment for atomicrmw and cmpxchg. While I was in there, I also updated uses of 'load atomic' and 'store atomic', and added the memory ordering, where that was missing.	2021-02-11 17:35:09 -05:00
Anastasia Stulova	79b222c39f	[OpenCL] Fix types with signed prefix in arginfo metadata. Signed prefix is removed and the single word spelling is printed for the scalar types. Tags: #clang Differential Revision: https://reviews.llvm.org/D96161	2021-02-09 15:13:19 +00:00
Anastasia Stulova	ecc8ac3f08	[OpenCL] Fix pipe type printing in arg info metadata Pipe element type spelling for arg info metadata should follow the same behavior as normal type spelling. We should only use the canonical type spelling in the base type field. This patch also removed duplication in type handling. Tags: #clang Differential Revision: https://reviews.llvm.org/D96151	2021-02-08 16:05:13 +00:00
Stanislav Mekhanoshin	8e661d3d9c	[AMDGPU] Set s-memtime-inst feature from clang Differential Revision: https://reviews.llvm.org/D95733	2021-02-01 14:20:43 -08:00
Sven van Haastregt	526c42e76c	[OpenCL] Hide sampler-less read_image builtins before CL1.2 Ensure sampler-less image read functions are not available with `-fdeclare-opencl-builtins` before OpenCL 1.2.	2021-01-28 11:14:19 +00:00
Sven van Haastregt	79c727328b	[clang] Fix signedness in vector bitcast evaluation The included test case triggered a sign assertion on the result in `Success()`. This was caused by the APSInt created for a bitcast having its signedness bit inverted. The second APSInt constructor argument is `isUnsigned`, so invert the result of `isSignedIntegerType`. Relanding this patch after reverting. The test case had to be updated to be insensitive to 32/64-bit extractelement indices. Differential Revision: https://reviews.llvm.org/D95135	2021-01-27 09:30:26 +00:00
Sven van Haastregt	b16fb1ffc3	Revert "[clang] Fix signedness in vector bitcast evaluation" This reverts commit `14947cd047` because it broke clang-cmake-armv7-quick.	2021-01-25 12:43:30 +00:00
Sven van Haastregt	14947cd047	[clang] Fix signedness in vector bitcast evaluation The included test case triggered a sign assertion on the result in `Success()`. This was caused by the APSInt created for a bitcast having its signedness bit inverted. The second APSInt constructor argument is `isUnsigned`, so invert the result of `isSignedIntegerType`. Differential Revision: https://reviews.llvm.org/D95135	2021-01-25 12:01:42 +00:00
Nikita Popov	65fd034b95	[FunctionAttrs] Infer willreturn for functions without loops If a function doesn't contain loops and does not call non-willreturn functions, then it is willreturn. Loops are detected by checking for backedges in the function. We don't attempt to handle finite loops at this point. Differential Revision: https://reviews.llvm.org/D94633	2021-01-21 20:29:33 +01:00
Sven van Haastregt	29d375f5ff	[OpenCL][NFC] Improve OpenCL test file naming Change "negative" into "invalid" and put "invalid" at the beginning of the file name, following the bulk of the invalid tests in the SemaOpenCL directory. Use the "invalid-" prefix only for tests that contain only invalid constructs. Drop the "valid" suffix for CodeGen tests, as inputs in this directory are supposed to be valid anyway.	2021-01-06 14:16:44 +00:00
Fangrui Song	fd739804e0	[test] Add {{.}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences For a default visibility external linkage definition, dso_local is set for ELF -fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences. To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition, which sets dso_local for default visibility external linkage definitions. To make this flip smooth and enable future (dso_local as definition default), this patch replaces (function) `define ` with `define{{.}} `, (variable/constant/alias) `= ` with `={{.}} `, or inserts appropriate `{{.}} `.	2020-12-31 00:27:11 -08:00
Fangrui Song	6b3351792c	[test] Add {{.}} to make tests immune to dso_local/dso_preemptable/(none) differences For a definition (of most linkage types), dso_local is set for ELF -fno-pic/-fpie and COFF, but not for Mach-O. This nuance causes unneeded binary format differences. This patch replaces (function) `define ` with `define{{.}} `, (variable/constant/alias) `= ` with `={{.}} `, or inserts appropriate `{{.}} ` if there is an explicit linkage. * Clang will set dso_local for Mach-O, which is currently implied by TargetMachine.cpp. This will make COFF/Mach-O and executable ELF similar. * Eventually I hope we can make dso_local the textual LLVM IR default (write explicit "dso_preemptable" when applicable) and -fpic ELF will be similar to everything else. This patch helps move toward that goal.	2020-12-30 20:52:01 -08:00
Juneyoung Lee	9b29610228	Use unary CreateShuffleVector if possible As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`. Let's update them. Actually, it would have been more natural if the patches were made in this order: (1) let them use unary CreateShuffleVector first (2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793) The order is swapped, but in terms of correctness it is still fine. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D93923	2020-12-30 22:36:08 +09:00
Juneyoung Lee	278aa65cc4	[IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93793	2020-12-30 04:21:04 +09:00
Tony	92ab6ed667	[AMDGPU] Add missing targets to amdgpu-features.cl Differential Revision: https://reviews.llvm.org/D93017	2020-12-12 18:19:02 +00:00
Melanie Blower	320af6b138	Create SPIRABIInfo to enable SPIR_FUNC calling convention. Background: Call to library arithmetic functions for div is emitted by the compiler and it set wrong “C” calling convention for calls to these functions, whereas library functions are declared with `spir_function` calling convention. InstCombine optimization replaces such calls with “unreachable” instruction. It looks like clang lacks SPIRABIInfo class which should specify default calling conventions for “system” function calls. SPIR supports only SPIR_FUNC and SPIR_KERNEL calling convention. Reviewers: Erich Keane, Anastasia Differential Revision: https://reviews.llvm.org/D92721	2020-12-12 05:48:20 -08:00
Yaxun (Sam) Liu	efc063b621	Fix lit test failure due to 0b81d9 These lit tests now requires amdgpu-registered-target since they use clang driver and clang driver passes an LLVM option which is available only if amdgpu target is registered. Change-Id: I2df31967409f1627fc6d342d1ab5cc8aa17c9c0c	2020-12-07 19:50:21 -05:00
Alex Richardson	51e09e1d5a	[AMDGPU] Set the default globals address space to 1 This will ensure that passes that add new global variables will create them in address space 1 once the passes have been updated to no longer default to the implicit address space zero. This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't already to present to ensure bitcode backwards compatibility. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D84345	2020-11-20 15:46:53 +00:00
Simon Pilgrim	c1e3d38301	[CodeGenOpenCL] Fix check prefix typo on convergent.cl test Noticed while fixing unused prefix warnings - there isn't actually any diff in the loop unrolled ir between old/new pass managers any more, so the broken checks were superfluous	2020-11-11 15:44:59 +00:00
Tim Renouf	89d41f3a2b	[AMDGPU] Add gfx1033 target Differential Revision: https://reviews.llvm.org/D90447 Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761	2020-11-03 16:27:48 +00:00
Tim Renouf	ee3e642627	[AMDGPU] Add gfx90c target This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were previously included in gfx909. Differential Revision: https://reviews.llvm.org/D90419 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-11-03 16:27:43 +00:00
Jon Chesterfield	dee7704829	[AMDGPU] Add __builtin_amdgcn_grid_size [AMDGPU] Add __builtin_amdgcn_grid_size Similar to D76772, loads the data from the dispatch pointer. Marked invariant. Patch also updates the openmp devicertl to use this builtin. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D90251	2020-10-29 16:25:13 +00:00
Tony	5984097823	[AMDGPU] Add missing support for targets - Add missing tests. Differential Revision: https://reviews.llvm.org/D90212	2020-10-27 15:36:31 +00:00
Richard Smith	6781fee085	Don't permit array bound constant folding in OpenCL. Permitting non-standards-driven "do the best you can" constant-folding of array bounds is permitted solely as a GNU compatibility feature. We should not be doing it in any language mode that is attempting to be conforming. From https://reviews.llvm.org/D20090 it appears the intent here was to permit `__constant int` globals to be used in array bounds, but the change in that patch only added half of the functionality necessary to support that in the constant evaluator. This patch adds the other half of the functionality and turns off constant folding for array bounds in OpenCL. I couldn't find any spec justification for accepting the kinds of cases that D20090 accepts, so a reference to where in the OpenCL specification this is permitted would be useful. Note that this change also affects the code generation in one test: because after 'const int n = 0' we now treat 'n' as a constant expression with value 0, it's now a null pointer, so '(local int *)n' forms a null pointer rather than a zero pointer. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D89520	2020-10-20 16:52:28 -07:00
Matt Arsenault	0a7cd99a70	Reapply "OpaquePtr: Add type to sret attribute" This reverts commit `eb9f7c28e5`. Previously this was incorrectly handling linking of the contained type, so this merges the fixes from D88973.	2020-10-16 11:05:02 -04:00
Stanislav Mekhanoshin	d1beb95d12	[AMDGPU] gfx1032 target Differential Revision: https://reviews.llvm.org/D89487	2020-10-15 12:41:18 -07:00
Tim Renouf	666ef0db20	[AMDGPU] Add gfx602, gfx705, gfx805 targets At AMD, in an internal audit of our code, we found some corner cases where we were not quite differentiating targets enough for some old hardware. This commit is part of fixing that by adding three new targets: * The "Oland" and "Hainan" variants of gfx601 are now split out into gfx602. LLPC (in the GPUOpen driver) and other front-ends could use that to avoid using the shaderZExport workaround on gfx602. * One variant of gfx703 is now split out into gfx705. LLPC and other front-ends could use that to avoid using the shaderSpiCsRegAllocFragmentation workaround on gfx705. * The "TongaPro" variant of gfx802 is now split out into gfx805. TongaPro has a faster 64-bit shift than its former friends in gfx802, and a subtarget feature could be set up for that to take advantage of it. This commit does not make that change; it just adds the target. V2: Add clang changes. Put TargetParser list in order. V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order, so fix the GPUKind order. Differential Revision: https://reviews.llvm.org/D88916 Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d	2020-10-10 17:22:22 +01:00
Michael Liao	8c36eaf037	[clang][opencl][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr. - `-cl-fp32-correctly-rounded-divide-sqrt` is already handled in a per-instruction manner by annotating the accuracy required. There's no need to add that fn-attr. So far, there's no in-tree backend handling that attr and that OpenCL specific option. - In case that out-of-tree backends are broken, this change could be reverted if those backends could not be fixed. Differential Revision: https://reviews.llvm.org/D88424	2020-10-01 11:07:39 -04:00
Tres Popp	eb9f7c28e5	Revert "OpaquePtr: Add type to sret attribute" This reverts commit `55c4ff91bd`. Issues were introduced as discussed in https://reviews.llvm.org/D88241 where this change made previous bugs in the linker and BitCodeWriter visible.	2020-09-29 10:31:04 +02:00
Matt Arsenault	55c4ff91bd	OpaquePtr: Add type to sret attribute Make the corresponding change that was made for byval in `b7141207a4`. Like byval, this requires a bulk update of the test IR tests to include the type before this can be mandatory.	2020-09-25 14:07:30 -04:00
Stanislav Mekhanoshin	59691dc874	[AMDGPU] Make ds fp atomics overloadable Differential Revision: https://reviews.llvm.org/D87947	2020-09-23 11:39:50 -07:00
Matt Arsenault	30eeb742f1	clang: Use byref for aggregate kernel arguments Add address space to indirect abi info and use it for kernels. Previously, indirect arguments assumed assumed a stack passed object in the alloca address space using byval. A stack pointer is unsuitable for kernel arguments, which are passed in a separate, constant buffer with a different address space. Start using the new byref for aggregate kernel arguments. Previously these were emitted as raw struct arguments, and turned into loads in the backend. These will lower identically, although with byref you now have the option of applying an explicit alignment. In the future, a reasonable implementation would use byref for all kernel arguments (this would be a practical problem at the moment due to losing things like noalias on pointer arguments). This is mostly to avoid fighting the optimizer's treatment of aggregate load/store. SROA and instcombine both turn aggregate loads and stores into a long sequence of element loads and stores, rather than the optimizable memcpy I would expect in this situation. Now an explicit memcpy will be introduced up-front which is better understood and helps eliminate the alloca in more situations. This skips using byref in the case where HIP kernel pointer arguments in structs are promoted to global pointers. At minimum an additional patch is needed to allow coercion with indirect arguments. This also skips using it for OpenCL due to the current workaround used to support kernels calling kernels. Distinct function bodies would need to be generated up front instead of emitting an illegal call.	2020-08-06 15:52:26 -04:00
Stanislav Mekhanoshin	ea7d0e2996	[AMDGPU] gfx1031 target Differential Revision: https://reviews.llvm.org/D85337	2020-08-05 12:36:26 -07:00
Alexey Bader	8d27be8dba	[OpenCL] Add global_device and global_host address spaces This patch introduces 2 new address spaces in OpenCL: global_device and global_host which are a subset of a global address space, so the address space scheme will be looking like: ``` generic->global->host ->device ->private ->local constant ``` Justification: USM allocations may be associated with both host and device memory. We want to give users a way to tell the compiler the allocation type of a USM pointer for optimization purposes. (Link to the Unified Shared Memory extension: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc) Before this patch USM pointer could be only in opencl_global address space, hence a device backend can't tell if a particular pointer points to host or device memory. On FPGAs at least we can generate more efficient hardware code if the user tells us where the pointer can point - being able to distinguish between these types of pointers at compile time allows us to instantiate simpler load-store units to perform memory transactions. Patch by Dmitry Sidorov. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D82174	2020-07-29 17:24:53 +03:00
Max Kazantsev	360ab70712	[SimplifyCFG] Do not create unneeded PR Phi in block with convergent calls We do not thread blocks with convergent calls, but this check was missing when we decide to insert PR Phis into it (which we only do for threading). Differential Revision: https://reviews.llvm.org/D83936 Reviewed By: nikic	2020-07-22 13:53:50 +07:00
Max Kazantsev	90798e09e2	Re-enable "[InstCombine] Simplify boolean Phis with const inputs using CFG" This reverts commit `b893822e32`. + Clang test fixes + Insertion point fix for landing pads	2020-07-16 16:09:08 +07:00
Fangrui Song	b0b5162fc2	[Driver] Pass -gno-column-info instead of -dwarf-column-info Making -g[no-]column-info opt out reduces the length of a typical CC1 command line. Additionally, in a non-debug compile, we won't see -dwarf-column-info.	2020-07-05 11:50:38 -07:00
Roman Lebedev	7fed3cfadb	[clang] Fix two tests that are affected by llvm opt change	2020-07-04 18:26:22 +03:00
Dmitry Preobrazhensky	53422e8b4f	[AMDGPU] Added support of new inline assembler constraints Added support for constraints 'I', 'J', 'L', 'B', 'C', 'Kf', 'DA', 'DB'. See https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D81657	2020-07-03 18:01:12 +03:00
Sven van Haastregt	bd46a56474	[OpenCL] Reject block arguments OpenCL 2.0 does not allow block arguments, primarily because it is difficult to support function pointers on the various architectures that OpenCL targets. Clang was still accepting them. Rename and reuse the `err_opencl_half_param` diagnostic. Fixes PR46324. Differential Revision: https://reviews.llvm.org/D82313	2020-06-29 14:13:12 +01:00
Melanie Blower	f4aaed3bf1	Reland D81869 "Modify FPFeatures to use delta not absolute settings" This reverts commit `defd43a5b3`. with correction to solve msan report To solve https://bugs.llvm.org/show_bug.cgi?id=46166 where the floating point settings in PCH files aren't compatible, rewrite FPFeatures to use a delta in the settings rather than absolute settings. With this patch, these floating point options can be benign. Reviewers: rjmccall Differential Revision: https://reviews.llvm.org/D81869	2020-06-27 01:34:57 -07:00
Matt Arsenault	9e03bdebc1	AMDGPU: Add llvm.amdgcn.sqrt intrinsic I spread the GlobalISel test into the regular one, which I've been avoiding so far.	2020-06-26 15:07:07 -04:00
Melanie Blower	defd43a5b3	Revert "Revert "Revert "Modify FPFeatures to use delta not absolute settings""" This reverts commit `9518763d71`. Memory sanitizer fails in CGFPOptionsRAII::CGFPOptionsRAII dtor	2020-06-26 08:47:04 -07:00
Melanie Blower	9518763d71	Revert "Revert "Modify FPFeatures to use delta not absolute settings"" This reverts commit `b55d723ed6`. Reapply Modify FPFeatures to use delta not absolute settings To solve https://bugs.llvm.org/show_bug.cgi?id=46166 where the floating point settings in PCH files aren't compatible, rewrite FPFeatures to use a delta in the settings rather than absolute settings. With this patch, these floating point options can be benign. Reviewers: rjmccall Differential Revision: https://reviews.llvm.org/D81869	2020-06-26 08:00:08 -07:00
Melanie Blower	b55d723ed6	Revert "Modify FPFeatures to use delta not absolute settings" This reverts commit `3a748cbf86`. I'm reverting this commit because I forgot to format the commit message propertly. Sorry for the thrash.	2020-06-26 07:52:57 -07:00
Melanie Blower	3a748cbf86	Modify FPFeatures to use delta not absolute settings	2020-06-26 07:41:09 -07:00
Stanislav Mekhanoshin	9ee272f13d	[AMDGPU] Add gfx1030 target Differential Revision: https://reviews.llvm.org/D81886	2020-06-15 16:18:05 -07:00
Stanislav Mekhanoshin	58de24ce6c	[AMDGPU] Sorted targets in amdgpu-features.cl. NFC.	2020-06-12 11:57:40 -07:00
Akira Hatanaka	c9a52de002	[CodeGen] Simplify the way lifetime of block captures is extended Rather than pushing inactive cleanups for the block captures at the entry of a full expression and activating them during the creation of the block literal, just call pushLifetimeExtendedDestroy to ensure the cleanups are popped at the end of the scope enclosing the block expression. rdar://problem/63996471 Differential Revision: https://reviews.llvm.org/D81624	2020-06-11 16:06:22 -07:00
Douglas Yung	086be9fb20	Fix test on PS4 linux bot. Commit `301a6da8c2` changed the test and modified a CHECK line that is inconsisent with similar lines elsewhere in the file and was causing failures when run in slightly different configurations. This change makes the line more consistent and should fix the bot failure. Failure link: http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/68593	2020-06-02 20:17:02 +00:00
Matt Arsenault	301a6da8c2	AMDGPU: Fix clang side null pointer value for private The change to fold_priv_arith looks strange to me, but this was already the untested behavior for local.	2020-06-02 09:23:46 -04:00
John McCall	8a8d703be0	Fix how cc1 command line options are mapped into FP options. Canonicalize on storing FP options in LangOptions instead of redundantly in CodeGenOptions. Incorporate -ffast-math directly into the values of those LangOptions rather than considering it separately when building FPOptions. Build IR attributes from those options rather than a mix of sources. We should really simplify the driver/cc1 interaction here and have the driver pass down options that cc1 directly honors. That can happen in a follow-up, though. Patch by Michele Scandale! https://reviews.llvm.org/D80315	2020-06-01 22:00:30 -04:00
Matt Arsenault	97f3f0bab0	AMDGPU: Add intrinsic for s_setreg This will be more useful with fenv access implemented.	2020-05-28 14:26:38 -04:00
Melanie Blower	827be690dc	[clang] FastMathFlags.allowContract should be initialized only from FPFeatures.allowFPContractAcrossStatement Summary: Fix bug introduced in D72841 adding support for pragma float_control Reviewers: rjmccall, Anastasia Differential Revision: https://reviews.llvm.org/D79903	2020-05-20 06:19:10 -07:00
Joel E. Denny	a1fd188223	[FileCheck] Support comment directives Sometimes you want to disable a FileCheck directive without removing it entirely, or you want to write comments that mention a directive by name. The `COM:` directive makes it easy to do this. For example, you might have: ``` ; X32: pinsrd_1: ; X32: pinsrd $1, 4(%esp), %xmm0 ; COM: FIXME: X64 isn't working correctly yet for this part of codegen, but ; COM: X64 will have something similar to X32: ; COM: ; COM: X64: pinsrd_1: ; COM: X64: pinsrd $1, %edi, %xmm0 ``` Without this patch, you need to use some combination of rewording and directive syntax mangling to prevent FileCheck from recognizing the commented occurrences of `X32:` and `X64:` above as directives. Moreover, FileCheck diagnostics have been proposed that might complain about the occurrences of `X64` that don't have the trailing `:` because they look like directive typos: <http://lists.llvm.org/pipermail/llvm-dev/2020-April/140610.html> I think dodging all these problems can prove tedious for test authors, and directive syntax mangling already makes the purpose of existing test code unclear. `COM:` can avoid all these problems. This patch also updates the small set of existing tests that define `COM` as a check prefix: - clang/test/CodeGen/default-address-space.c - clang/test/CodeGenOpenCL/addr-space-struct-arg.cl - clang/test/Driver/hip-device-libs.hip - llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll I think lit should support `COM:` as well. Perhaps `clang -verify` should too. Reviewed By: jhenderson, thopre Differential Revision: https://reviews.llvm.org/D79276	2020-05-13 11:29:48 -04:00
Joel E. Denny	d0e7fd6b62	Revert "[FileCheck] Support comment directives" This reverts commit `9a9a5f9893` to try to fix a bot: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/23489	2020-05-11 19:41:22 -04:00
Joel E. Denny	9a9a5f9893	[FileCheck] Support comment directives Sometimes you want to disable a FileCheck directive without removing it entirely, or you want to write comments that mention a directive by name. The `COM:` directive makes it easy to do this. For example, you might have: ``` ; X32: pinsrd_1: ; X32: pinsrd $1, 4(%esp), %xmm0 ; COM: FIXME: X64 isn't working correctly yet for this part of codegen, but ; COM: X64 will have something similar to X32: ; COM: ; COM: X64: pinsrd_1: ; COM: X64: pinsrd $1, %edi, %xmm0 ``` Without this patch, you need to use some combination of rewording and directive syntax mangling to prevent FileCheck from recognizing the commented occurrences of `X32:` and `X64:` above as directives. Moreover, FileCheck diagnostics have been proposed that might complain about the occurrences of `X64` that don't have the trailing `:` because they look like directive typos: <http://lists.llvm.org/pipermail/llvm-dev/2020-April/140610.html> I think dodging all these problems can prove tedious for test authors, and directive syntax mangling already makes the purpose of existing test code unclear. `COM:` can avoid all these problems. This patch also updates the small set of existing tests that define `COM` as a check prefix: - clang/test/CodeGen/default-address-space.c - clang/test/CodeGenOpenCL/addr-space-struct-arg.cl - clang/test/Driver/hip-device-libs.hip - llvm/test/Assembler/drop-debug-info-nonzero-alloca.ll I think lit should support `COM:` as well. Perhaps `clang -verify` should too. Reviewed By: jhenderson, thopre Differential Revision: https://reviews.llvm.org/D79276	2020-05-11 14:53:48 -04:00
Melanie Blower	f5360d4bb3	Reapply "Add support for #pragma float_control" with buildbot fixes Add support for #pragma float_control Reviewers: rjmccall, erichkeane, sepavloff Differential Revision: https://reviews.llvm.org/D72841 This reverts commit `fce82c0ed3`.	2020-05-04 05:51:25 -07:00
Melanie Blower	fce82c0ed3	Revert "Reapply "Add support for #pragma float_control" with improvements to" This reverts commit `69aacaf699`.	2020-05-01 10:31:09 -07:00
Melanie Blower	69aacaf699	Reapply "Add support for #pragma float_control" with improvements to test cases Add support for #pragma float_control Reviewers: rjmccall, erichkeane, sepavloff Differential Revision: https://reviews.llvm.org/D72841 This reverts commit `85dc033cac`, and makes corrections to the test cases that failed on buildbots.	2020-05-01 10:03:30 -07:00
Melanie Blower	85dc033cac	Revert "Add support for #pragma float_control" This reverts commit `4f1e9a17e9`. due to fail on buildbot, sorry for the noise	2020-05-01 06:36:58 -07:00
Melanie Blower	4f1e9a17e9	Add support for #pragma float_control Reviewers: rjmccall, erichkeane, sepavloff Differential Revision: https://reviews.llvm.org/D72841	2020-05-01 06:14:24 -07:00
Matt Arsenault	580a9f2c30	Fix test without built AMDGPU	2020-04-27 13:32:00 -04:00
Matt Arsenault	5c03beefa7	clang: Allow backend unsupported warnings Currently this asserts on anything other than errors. In one workaround scenario, AMDGPU emits DiagnosticInfoUnsupported as a warning for functions that can't be correctly codegened, but should never be executed.	2020-04-27 12:14:51 -04:00
Erich Keane	5f0903e9be	Reland Implement _ExtInt as an extended int type specifier. I fixed the LLDB issue, so re-applying the patch. This reverts commit `a4b88c0449`.	2020-04-17 10:45:48 -07:00
Sterling Augustine	a4b88c0449	Revert "Implement _ExtInt as an extended int type specifier." This reverts commit `61ba1481e2`. I'm reverting this because it breaks the lldb build with incomplete switch coverage warnings. I would fix it forward, but am not familiar enough with lldb to determine the correct fix. lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp:3958:11: error: enumeration values 'DependentExtInt' and 'ExtInt' not handled in switch [-Werror,-Wswitch] switch (qual_type->getTypeClass()) { ^ lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp:4633:11: error: enumeration values 'DependentExtInt' and 'ExtInt' not handled in switch [-Werror,-Wswitch] switch (qual_type->getTypeClass()) { ^ lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp:4889:11: error: enumeration values 'DependentExtInt' and 'ExtInt' not handled in switch [-Werror,-Wswitch] switch (qual_type->getTypeClass()) {	2020-04-17 10:29:40 -07:00
Erich Keane	61ba1481e2	Implement _ExtInt as an extended int type specifier. Introduction/Motivation: LLVM-IR supports integers of non-power-of-2 bitwidth, in the iN syntax. Integers of non-power-of-two aren't particularly interesting or useful on most hardware, so much so that no language in Clang has been motivated to expose it before. However, in the case of FPGA hardware normal integer types where the full bitwidth isn't used, is extremely wasteful and has severe performance/space concerns. Because of this, Intel has introduced this functionality in the High Level Synthesis compiler[0] under the name "Arbitrary Precision Integer" (ap_int for short). This has been extremely useful and effective for our users, permitting them to optimize their storage and operation space on an architecture where both can be extremely expensive. We are proposing upstreaming a more palatable version of this to the community, in the form of this proposal and accompanying patch. We are proposing the syntax _ExtInt(N). We intend to propose this to the WG14 committee[1], and the underscore-capital seems like the active direction for a WG14 paper's acceptance. An alternative that Richard Smith suggested on the initial review was __int(N), however we believe that is much less acceptable by WG14. We considered _Int, however _Int is used as an identifier in libstdc++ and there is no good way to fall back to an identifier (since _Int(5) is indistinguishable from an unnamed initializer of a template type named _Int). [0]https://www.intel.com/content/www/us/en/software/programmable/quartus-prime/hls-compiler.html) [1]http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf Differential Revision: https://reviews.llvm.org/D73967	2020-04-17 07:10:57 -07:00
Matt Arsenault	4593e4131a	AMDGPU: Teach toolchain to link rocm device libs Currently the library is separately linked, but this isn't correct to implement fast math flags correctly. Each module should get the version of the library appropriate for its combination of fast math and related flags, with the attributes propagated into its functions and internalized. HIP already maintains the list of libraries, but this is not used for OpenCL. Unfortunately, HIP uses a separate --hip-device-lib argument, despite both languages using the same bitcode library. Eventually these two searches need to be merged. An additional problem is there are 3 different locations the libraries are installed, depending on which build is used. This also needs to be consolidated (or at least the search logic needs to deal with this unnecessary complexity).	2020-04-10 13:37:32 -04:00
Yaxun (Sam) Liu	b72fce1ffd	Fix __builtin_amdgcn_workgroup_size_x/y/z return type https://reviews.llvm.org/D77390	2020-04-03 09:56:30 -04:00
Yaxun (Sam) Liu	a46e7d7a5f	[AMDGPU] Allow AGPR in inline asm Differential Revision: https://reviews.llvm.org/D77329	2020-04-03 09:08:13 -04:00
Matt Arsenault	ce2258c1cd	clang/AMDGPU: Stop setting old denormal subtarget features	2020-04-02 17:17:12 -04:00
Yaxun (Sam) Liu	369e26ca9e	[AMDGPU] Add __builtin_amdgcn_workgroup_size_x/y/z The main purpose of introducing these builtins is to add a range metadata [1, 1025) on the work group size loaded from dispatch ptr, which cannot be done by source code. Differential Revision: https://reviews.llvm.org/D76772	2020-03-28 01:03:20 -04:00
Erich Keane	fe5c719eaf	Implement post-commit comments for D75685/rG86e0a6c60627 @Anastasia made a pair of comments on D75685 after it was committed requesting changes to the test. This patch updates the test based on her comments.	2020-03-25 12:24:56 -07:00
Erich Keane	86e0a6c606	Add MS Mangling for OpenCL Pipe types, add mangling test. SPIRV2.0 Spec only specifies Linux mangling, however our downstream has use for a Windows mangling for these types. Unfortunately, the SPIRV spec specifies a single mangling for all pipe types, despite clang allowing overloading on these types. Because of this, this patch chooses to mangle the read/writability and element type for the windows mangling. The windows manglings in the test all demangle according to demangler: "void __cdecl test1(struct __clang::ocl_pipe<int,1>) "void __cdecl test2(struct __clang::ocl_pipe<float,0>) "void __cdecl test2(struct __clang::ocl_pipe<int,1>) "void __cdecl test3(struct __clang::ocl_pipe<int const,1>) "void __cdecl test4(struct __clang::ocl_pipe<union __clang::__vector<unsigned char,3>,1>) "void __cdecl test5(struct __clang::ocl_pipe<union __clang::__vector<int,4>,1>) "void __cdecl test_reserved_read_pipe(struct __clang::_ASCLglobal<struct Person > * __ptr64,struct __clang::ocl_pipe<struct Person,1>) Differential Revision: https://reviews.llvm.org/D75685	2020-03-25 07:59:22 -07:00
Erik Pilkington	de98cf92e3	[CodeGen] Add an alignment attribute to all sret parameters This fixes a miscompile when the parameter is actually underaligned. rdar://58316406 Differential revision: https://reviews.llvm.org/D74183	2020-03-24 15:31:57 -04:00
Matt Arsenault	3f533006ba	AMDGPU: Emit llvm.fshr for __builtin_amdgcn_alignbit These are equivalent. The generic rotate builtins do not directly map to the fshr intrinsic.	2020-03-23 16:51:25 -04:00
Sjoerd Meijer	3d9a0445cc	Recommit #2 "[Driver] Default to -fno-common for all targets" After a first attempt to fix the test-suite failures, my first recommit caused the same failures again. I had updated CMakeList.txt files of tests that needed -fcommon, but it turns out that there are also Makefiles which are used by some bots, so I've updated these Makefiles now too. See the original commit message for more details on this change: `0a9fc9233e`	2020-03-09 19:57:03 +00:00
Sjoerd Meijer	f35d112efd	Revert "Recommit "[Driver] Default to -fno-common for all targets"" This reverts commit `2c36c23f34`. Still problems in the test-suite, which I really thought I had fixed...	2020-03-09 10:37:28 +00:00
Sjoerd Meijer	2c36c23f34	Recommit "[Driver] Default to -fno-common for all targets" This includes fixes for: - test-suite: some benchmarks need to be compiled with -fcommon, see D75557. - compiler-rt: one test needed -fcommon, and another a change, see D75520.	2020-03-09 10:07:37 +00:00
Matt Arsenault	00b2a9df45	Reapply "clang: Treat ieee mode as the default for denormal-fp-math" This reverts commit `737394c490`. The fp-model test was failing on platforms that enable denormal flushing based on -ffast-math. This needs to reset to IEEE, not the default in these cases. Change-Id: Ibbad32f66d0d0b89b9c1173a3a96fb1a570ddd89	2020-03-06 11:46:55 -08:00
Jeremy Morse	737394c490	Revert "clang: Treat ieee mode as the default for denormal-fp-math" This reverts commit `c64ca93053`. This patch tripped a few build bots: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/24703/ http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/13465/ http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15994/ Reverting to clear the bots.	2020-03-05 10:55:24 +00:00
Matt Arsenault	c64ca93053	clang: Treat ieee mode as the default for denormal-fp-math The IR hasn't switched the default yet, so explicitly add the ieee attributes. I'm still not really sure how the target default denormal mode should interact with -fno-unsafe-math-optimizations. The target may have selected the default mode to be non-IEEE based on the flags or based on its true behavior, but we don't know which is the case. Since the only users of a non-IEEE mode without a flag still support IEEE mode, just reset to IEEE.	2020-03-04 23:34:02 -05:00
Sjoerd Meijer	4e363563fa	Revert "[Driver] Default to -fno-common for all targets" This reverts commit `0a9fc9233e`. Going to look at the asan failures. I find the failures in the test suite weird, because they look like compile time test and I don't understand how that can be failing, but will have a brief look at that too.	2020-03-03 10:00:36 +00:00
Sjoerd Meijer	0a9fc9233e	[Driver] Default to -fno-common for all targets This makes -fno-common the default for all targets because this has performance and code-size benefits and is more language conforming for C code. Additionally, GCC10 also defaults to -fno-common and so we get consistent behaviour with GCC. With this change, C code that uses tentative definitions as definitions of a variable in multiple translation units will trigger multiple-definition linker errors. Generally, this occurs when the use of the extern keyword is neglected in the declaration of a variable in a header file. In some cases, no specific translation unit provides a definition of the variable. The previous behavior can be restored by specifying -fcommon. As GCC has switched already, we benefit from applications already being ported and existing documentation how to do this. For example: - https://gcc.gnu.org/gcc-10/porting_to.html - https://wiki.gentoo.org/wiki/Gcc_10_porting_notes/fno_common Differential revision: https://reviews.llvm.org/D75056	2020-03-03 09:15:07 +00:00
Yaxun (Sam) Liu	a57d9652a0	Make __builtin_amdgcn_dispatch_ptr dereferenceable and align at 4 Differential Revision: https://reviews.llvm.org/D75028	2020-02-25 13:58:20 -05:00
Yaxun (Sam) Liu	fb44b9db95	[OpenCL][CUDA][HIP][SYCL] Add norecurse norecurse function attr indicates the function is not called recursively directly or indirectly. Add norecurse to OpenCL functions, SYCL functions in device compilation and CUDA/HIP kernels. Although there is LLVM pass adding norecurse to functions, it only works for whole-program compilation. Also FE adding norecurse can make that pass run faster since functions with norecurse do not need to be checked again. Differential Revision: https://reviews.llvm.org/D73651	2020-02-16 20:41:00 -05:00
Konstantin Pyzhov	987aa3435f	Corrected clang amdgpu-features.cl test for `6d614a82a4` (AMDGPU MFMA built-ins) Differential Revision: https://reviews.llvm.org/D72723	2020-01-28 05:41:42 -05:00
Konstantin Pyzhov	ac9b2a6297	Add missing clang tests for `6d614a82a4` (AMDGPU MFMA built-ins) Differential Revision: https://reviews.llvm.org/D72723	2020-01-28 04:41:21 -05:00
Konstantin Pyzhov	6d614a82a4	Summary: This CL adds clang declarations of built-in functions for AMDGPU MFMA intrinsics and instructions. OpenCL tests for new built-ins are included. Differential Revision: https://reviews.llvm.org/D72723	2020-01-28 03:51:27 -05:00
Matt Arsenault	a4451d88ee	Consolidate internal denormal flushing controls Currently there are 4 different mechanisms for controlling denormal flushing behavior, and about as many equivalent frontend controls. - AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features - NVPTX uses the nvptx-f32ftz attribute - ARM directly uses the denormal-fp-math attribute - Other targets indirectly use denormal-fp-math in one DAGCombine - cl-denorms-are-zero has a corresponding denorms-are-zero attribute AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as I can tell the same is true for NVPTX (based on the attribute name). Work on consolidating these into the denormal-fp-math attribute, and a new type specific denormal-fp-math-f32 variant. Only ARM seems to support the two different flush modes, so this is overkill for the other use cases. Ideally we would error on the unsupported positive-zero mode on other targets from somewhere. Move the logic for selecting the flush mode into the compiler driver, instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32 are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as a user flag. -cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and -fno-cuda-flush-denormals-to-zero will be mapped to -fp-denormal-math-f32=ieee or preserve-sign rather than the old attributes. Stop emitting the denorms-are-zero attribute for the OpenCL flag. It has no in-tree users. The meaning would also be target dependent, such as the AMDGPU choice to treat this as only meaning allow flushing of f32 and not f16 or f64. The naming is also potentially confusing, since DAZ in other contexts refers to instructions implicitly treating input denormals as zero, not necessarily flushing output denormals to zero. This also does not attempt to change the behavior for the current attribute. The LangRef now states that the default is ieee behavior, but this is inaccurate for the current implementation. The clang handling is slightly hacky to avoid touching the existing denormal-fp-math uses. Fixing this will be left for a future patch. AMDGPU is still using the subtarget feature to control the denormal mode, but the new attribute are now emitted. A future change will switch this and remove the subtarget features.	2020-01-17 20:09:53 -05:00
Matt Arsenault	9b549f26fa	AMDGPU: Update clang test	2020-01-16 18:10:29 -05:00
Sven van Haastregt	6713670b17	[OpenCL] Fix mangling of single-overload builtins Commit `9a8d477a0e` ("[OpenCL] Add builtin function attribute handling", 2019-11-05) stopped Clang from mangling single-overload builtins, which is incorrect.	2019-12-03 11:09:16 +00:00
Sven van Haastregt	9a8d477a0e	[OpenCL] Add builtin function attribute handling Add handling for the "pure", "const" and "convergent" function attributes for OpenCL builtin functions. Patch by Pierre Gondois and Sven van Haastregt. Differential Revision: https://reviews.llvm.org/D64319	2019-11-05 10:26:47 +00:00
Matt Arsenault	281f2e2c37	AMDGPU: Add builtins for is_shared/is_private llvm-svn: 371010	2019-09-05 03:00:43 +00:00
Matt Arsenault	eac783a900	AMDGPU: Always emit amdgpu-flat-work-group-size The backend default maximum should be the hardware maximum, so the frontend should set the implementation defined default maximum. llvm-svn: 370101	2019-08-27 19:25:40 +00:00
Matt Arsenault	acd0a53c02	Builtins: Start adding half versions of math builtins The implementation of the OpenCL builtin currently library uses 2 different hacks to get to the corresponding IR intrinsics from the source. This will allow removal of those. This is the set that is currently used (minus a few vector ones). llvm-svn: 367973	2019-08-06 03:28:37 +00:00

1 2 3 4 5 ...

548 Commits