llvm-project

Commit Graph

Author	SHA1	Message	Date
Eli Friedman	74909e4b6e	Rename MachineMemOperand::getOrdering -> getSuccessOrdering. Since this method can apply to cmpxchg operations, make sure it's clear what value we're actually retrieving. This will help ensure we don't accidentally ignore the failure ordering of cmpxchg in the future. We could potentially introduce a getOrdering() method on AtomicSDNode that asserts the operation isn't cmpxchg, but not sure that's worthwhile. Differential Revision: https://reviews.llvm.org/D103338	2021-06-21 16:49:27 -07:00
Kazu Hirata	23b0ab2acb	[llvm] Use the default value of drop_begin (NFC)	2021-01-18 10:16:36 -08:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Vitaly Buka	b0eb40ca39	[NFC] Remove unused GetUnderlyingObject paramenter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
Mark de Wever	31262d6722	[NVPTX] Fixes -Wrange-loop-analysis warnings This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Also removed the top-level const as requested by Aaron Ballman in similar patches. Differential Revision: https://reviews.llvm.org/D71812	2019-12-22 19:27:44 +01:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
Graham Hunter	3f08ad611a	[SVE][CodeGen] Scalable vector MVT size queries * Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871	2019-11-18 12:30:59 +00:00
Dmitri Gribenko	14c69fefe6	Include what you use in NVPTX.h I also fixed all other files that were including NVPTX.h and were relying on transitive includes. llvm-svn: 362402	2019-06-03 14:26:50 +00:00
Bjorn Pettersson	71e8c6f20f	Add "const" in GetUnderlyingObjects. NFC Summary: Both the input Value pointer and the returned Value pointers in GetUnderlyingObjects are now declared as const. It turned out that all current (in-tree) uses of GetUnderlyingObjects were trivial to update, being satisfied with have those Value pointers declared as const. Actually, in the past several of the users had to use const_cast, just because of ValueTracking not providing a version of GetUnderlyingObjects with "const" Value pointers. With this patch we get rid of those const casts. Reviewers: hfinkel, materi, jkorous Reviewed By: jkorous Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61038 llvm-svn: 359072	2019-04-24 06:55:50 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Chandler Carruth	66654b72c9	[SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be surprised at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740	2018-08-14 23:30:32 +00:00
Jonas Hahnfeld	20526bf483	[NVPTX] Select atomic loads and stores According to PTX ISA .volatile has the same memory synchronization semantics as .relaxed.sys, so it can be used to implement monotonic atomic loads and stores. This is important for OpenMP's atomic construct where - 'read's and 'write's are lowered to atomic loads and stores, and - an update of float or double types are lowered into a cmpxchg loop. (Note that PTX could do better because it has atom.add.f{32,64} but LLVM's atomicrmw instruction only allows integer types.) Higher levels of atomicity (like acquire and release) need additional synchronization properties which were added with PTX ISA 6.0 / sm_70. So using these instructions still results in an error. Differential Revision: https://reviews.llvm.org/D50391 llvm-svn: 339316	2018-08-09 07:45:49 +00:00
Benjamin Kramer	23d8282047	[NVPTX] Delete dead code No functionality change. llvm-svn: 335913	2018-06-28 20:05:35 +00:00
Artem Belevich	2f348ea1c7	[NVPTX] Added a feature to use short pointers for const/local/shared AS. Const/local/shared address spaces are all < 4GB and we can always use 32-bit pointers to access them. This has substantial performance impact on kernels that uses shared memory for intermediary results. The feature is disabled by default. Differential Revision: https://reviews.llvm.org/D46147 llvm-svn: 331941	2018-05-09 23:46:19 +00:00
Artem Belevich	a28e598ebb	[NVPTX] Fixed vectorized LDG for f16. v2f16 is a special case in NVPTX. v4f16 may be loaded as a pair of v2f16 and that was not previously handled correctly by tryLDGLDU() Differential Revision: https://reviews.llvm.org/D45339 llvm-svn: 329456	2018-04-06 21:10:24 +00:00
Artem Belevich	7b14e7f041	[NVPTX] TblGen-ized lowering of WMMA intrinsics. NFC. Differential Revision: https://reviews.llvm.org/D43151 llvm-svn: 327672	2018-03-15 21:40:56 +00:00
Artem Belevich	8c9749b1dc	[NVPTX] use pattern matching to lower int_nvvm_match_all_sync*. Now that patterns can handle intrinsics returning multiple results, use tablegen'ed pattern matching instead of custom lowering. Differential Revision: https://reviews.llvm.org/D43890 llvm-svn: 326457	2018-03-01 18:28:45 +00:00
Justin Lebar	faaf2d298e	[NVPTX] Lower loads from global constants using ld.global.nc (aka LDG). Summary: After D43914, loads from global variables in addrspace(1) happen with ld.global. But since they're constants, even better would be to use ld.global.nc, aka ldg. Reviewers: tra Subscribers: jholewinski, sanjoy, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43915 llvm-svn: 326390	2018-02-28 23:58:05 +00:00
Matthias Braun	f1caa2833f	MachineFunction: Return reference from getFunction(); NFC The Function can never be nullptr so we can return a reference. llvm-svn: 320884	2017-12-15 22:22:58 +00:00
Richard Smith	7007f07664	Fix unused variable warning. llvm-svn: 318201	2017-11-14 21:26:46 +00:00
Artem Belevich	55dcf5e586	Mark intrinsics operating on the whole warp as IntrInaccessibleMemOnly It's needed to model the fact that they do access data from other threads in a warp and thus can't be CSE'd. llvm-svn: 318173	2017-11-14 19:14:00 +00:00
Artem Belevich	3bafc2f0d9	[NVPTX] Implemented wmma intrinsics and instructions. WMMA = "Warp Level Matrix Multiply-Accumulate". These are the new instructions introduced in PTX6.0 and available on sm_70 GPUs. Differential Revision: https://reviews.llvm.org/D38645 llvm-svn: 315601	2017-10-12 18:27:55 +00:00
Artem Belevich	bab95c7087	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314223	2017-09-26 17:07:23 +00:00
Justin Lebar	d31d5e6aa2	Revert "[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.", rL314135. Causing assertion failures on macos: > Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"), > function getOperand, file > /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h, > line 835. http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/ llvm-svn: 314142	2017-09-25 19:41:56 +00:00
Artem Belevich	9941ee9529	[NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins. Differential Revision: https://reviews.llvm.org/D38191 llvm-svn: 314135	2017-09-25 18:53:57 +00:00
Benjamin Kramer	806ae44012	[NVPTX] Reduce copypasta. No functionality change intended. llvm-svn: 311295	2017-08-20 17:30:32 +00:00
Artem Belevich	ee7dd12ff4	[NVPTX] Reduce amount of boilerplate code used to select load instruction opcode. Make opcode selection code for the load instruction a bit easier to read and maintain. This patch also catches number of f16 load/store variants that were not handled before. Differential Revision: https://reviews.llvm.org/D30513 llvm-svn: 296785	2017-03-02 19:14:14 +00:00
Artem Belevich	620db1f3dd	[NVPTX] Added support for .f16x2 instructions. This patch enables support for .f16x2 operations. Added new register type Float16x2. Added support for .f16x2 instructions. Added handling of vectorized loads/stores of v2f16 values. Differential Revision: https://reviews.llvm.org/D30057 Differential Revision: https://reviews.llvm.org/D30310 llvm-svn: 296032	2017-02-23 22:38:24 +00:00
Justin Lebar	077f8fb168	[NVPTX] Move getDivF32Level, usePrecSqrtF32, and useF32FTZ into out of DAGToDAG and into TargetLowering. Summary: DADToDAG has access to TargetLowering, but not vice versa, so this is the more general location for these functions. NFC Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28795 llvm-svn: 292693	2017-01-21 01:00:14 +00:00
Artem Belevich	64dc9be7b4	[NVPTX] Added support for half-precision floating point. Only scalar half-precision operations are supported at the moment. - Adds general support for 'half' type in NVPTX. - fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math). - Type conversions to/from fp16 are supported on all GPU variants. - On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage. Differential Revision: https://reviews.llvm.org/D28540 llvm-svn: 291956	2017-01-13 20:56:17 +00:00
Artem Belevich	d109f46573	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed. Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32 instruction even when unsafe FP math was not allowed. Clang-generated IR is not affected by this as it uses precise sin/cos from CUDA's libdevice when unsafe math is disabled. Differential Revision: https://reviews.llvm.org/D28619 llvm-svn: 291936	2017-01-13 18:48:13 +00:00
Sanjay Patel	b1f0a0f4a8	getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI llvm-svn: 281493	2016-09-14 16:05:51 +00:00
Justin Lebar	6d6b11a4a6	[NVPTX] Use ldg for explicitly invariant loads. Summary: With this change (plus some changes to prevent !invariant from being clobbered within llvm), clang will be able to model the __ldg CUDA builtin as an invariant load, rather than as a target-specific llvm intrinsic. This will let the optimizer play with these loads -- specifically, we should be able to vectorize them in the load-store vectorizer. Reviewers: tra Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc Differential Revision: https://reviews.llvm.org/D23477 llvm-svn: 281152	2016-09-11 01:39:04 +00:00
Artem Belevich	b2e76a5e7a	[NVPTX] Improve lowering of byval args of device functions. Avoid unnecessary spills of byval arguments of device functions to local space on SASS level and subsequent pointer conversion to generic address space that follows. Instead, make a local copy in IR, provide a way to access arguments directly, and let LLVM optimize the copy away when possible. Differential Review: https://reviews.llvm.org/D21421 llvm-svn: 276153	2016-07-20 18:39:47 +00:00
Artem Belevich	4d5d7be8cc	Revert r273313 "[NVPTX] Improve lowering of byval args of device functions." The change causes llvm crash in some unoptimized builds. llvm-svn: 274163	2016-06-29 20:51:15 +00:00
Artem Belevich	d7ebcfb291	[NVPTX] Improve lowering of byval args of device functions. Avoid unnecessary spills of such vars to local space on SASS level and pointer space conversion. Instead, make a local copy with appropriate addrspacecasts and let LLVM optimize them away when possible. This allows loading value of the argument using [symbol+offset] instead of converting argument to general space pointer and using it for indexing (which also implicitly converts param space pointer to local space one on SASS level and triggers copying of argument into local space in the process). This reduces call overhead, uses less registers and reduces overall SASS size by 2-4%. Differential Review: http://reviews.llvm.org/D21421 llvm-svn: 273313	2016-06-21 20:30:26 +00:00
Justin Bogner	8d83fb6132	SDAG: Implement Select instead of SelectImpl in NVPTXDAGToDAGISel - Where we were returning a node before, call ReplaceNode instead. - Where we would return null to fall back to another selector, rename the method to try* and return a bool for success. Part of llvm.org/pr26808. llvm-svn: 269483	2016-05-13 21:12:53 +00:00
Justin Bogner	b012699741	SDAG: Rename Select->SelectImpl and repurpose Select as returning void This is a step towards removing the rampant undefined behaviour in SelectionDAG, which is a part of llvm.org/PR26808. We rename SelectionDAGISel::Select to SelectImpl and update targets to match, and then change Select to return void and consolidate the sketchy behaviour we're trying to get away from there. Next, we'll update backends to implement `void Select(...)` instead of SelectImpl and eventually drop the base Select implementation. llvm-svn: 268693	2016-05-05 23:19:08 +00:00
Justin Holewinski	9a6ea2c256	[NVPTX] Fix sign/zero-extending ldg/ldu instruction selection Summary: We don't have sign-/zero-extending ldg/ldu instructions defined, so we need to emulate them with explicit CVTs. We were originally handling the i8 case, but not any other cases. Fixes PR26185 Reviewers: jingyue, jlebar Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D19615 llvm-svn: 268272	2016-05-02 18:12:02 +00:00
Justin Holewinski	c79979299a	[NVPTX] Handle ldg created from sign-/zero-extended load Reviewers: jingyue Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D18053 llvm-svn: 265389	2016-04-05 12:38:01 +00:00
Bjarke Hammersholt Roune	5cbc7d2999	[NVPTX] Use LDG for pointer induction variables. More specifically, make NVPTXISelDAGToDAG able to emit cached loads (LDG) for pointer induction variables. Also fix latent bug where LDG was not restricted to kernel functions. I believe that this could not be triggered so far since we do not currently infer that a pointer is global outside a kernel function, and only loads of global pointers are considered for cached loads. llvm-svn: 244166	2015-08-05 23:11:57 +00:00
Craig Topper	e3dcce9700	De-constify pointers to Type since they can't be modified. NFC This was already done in most places a while ago. This just fixes the ones that crept in over time. llvm-svn: 243842	2015-08-01 22:20:21 +00:00
Jingyue Wu	48a9bdc6aa	[NVPTX] make load on global readonly memory to use ldg Summary: [NVPTX] make load on global readonly memory to use ldg Summary: As describe in [1], ld.global.nc may be used to load memory by nvcc when __restrict__ is used and compiler can detect whether read-only data cache is safe to use. This patch will try to check whether ldg is safe to use and use them to replace ld.global when possible. This change can improve the performance by 18~29% on affected kernels (ratt_kernel and rwdot_kernel) in S3D benchmark of shoc [2]. Patched by Xuetian Weng. [1] http://docs.nvidia.com/cuda/kepler-tuning-guide/#read-only-data-cache [2] https://github.com/vetter/shoc Test Plan: test/CodeGen/NVPTX/load-with-non-coherent-cache.ll Reviewers: jholewinski, jingyue Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D11314 llvm-svn: 242713	2015-07-20 21:28:54 +00:00
Jingyue Wu	a2f6027a31	[NVPTX] roll forward r239082 NVPTXISelDAGToDAG translates "addrspacecast to param" to NVPTX::nvvm_ptr_gen_to_param Added an llc test in bug21465. llvm-svn: 239100	2015-06-04 21:28:26 +00:00
Sergey Dmitrouk	842a51bad8	Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989	2015-04-28 14:05:47 +00:00
Daniel Jasper	48e93f7181	Revert "[DebugInfo] Add debug locations to constant SD nodes" This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987	2015-04-28 13:38:35 +00:00
Sergey Dmitrouk	adb4c69d5c	[DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977	2015-04-28 11:56:37 +00:00
Eli Bendersky	3e84019a39	Simplify boolean expressions with true and false using clang-tidy Patch by Richard (legalize@xmission.com) Differential Revision: http://reviews.llvm.org/D8521 llvm-svn: 232961	2015-03-23 16:26:23 +00:00
Daniel Sanders	60f1db0525	Recommit r232027 with PR22883 fixed: Add infrastructure for support of multiple memory constraints. The operand flag word for ISD::INLINEASM nodes now contains a 15-bit memory constraint ID when the operand kind is Kind_Mem. This constraint ID is a numeric equivalent to the constraint code string and is converted with a target specific hook in TargetLowering. This patch maps all memory constraints to InlineAsm::Constraint_m so there is no functional change at this point. It just proves that using these previously unused bits in the encoding of the flag word doesn't break anything. The next patch will make each target preserve the current mapping of everything to Constraint_m for itself while changing the target independent implementation of the hook to return Constraint_Unknown appropriately. Each target will then be adapted in separate patches to use appropriate Constraint_* values. PR22883 was caused the matching operands copying the whole of the operand flags for the matched operand. This included the constraint id which needed to be replaced with the operand number. This has been fixed with a conversion function. Following on from this, matching operands also used the operand number as the constraint id. This has been fixed by looking up the matched operand and taking it from there. llvm-svn: 232165	2015-03-13 12:45:09 +00:00

1 2

94 Commits