llvm-project

Commit Graph

Author	SHA1	Message	Date
Arthur Eubanks	cc64ece77d	[NFC][OpaquePtr] Avoid using PointerType::getElementType() in VectorUtils.cpp Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D102533	2021-05-17 18:35:44 -07:00
Adam Nemet	fcffd087c6	[Matrix] Fold the transpose into the matmul operand used to fetch scalars For column-major this is: A * B^t whereas for row-major: A^t * B Differential Revision: https://reviews.llvm.org/D101762	2021-05-17 17:40:46 -07:00
Rob Suderman	a91fb4328f	[mlir][tosa] Cleanup of tosa.rescale lowering to linalg Comment was poorly written. Changed to bail on contradictory information in the double round. Reviewed By: NatashaKnk Differential Revision: https://reviews.llvm.org/D102651	2021-05-17 17:31:20 -07:00
Med Ismail Bennani	1b4d5b3bf3	[lldb/API] Use a valid LineEntry object in SBCompileUnit::FindLineEntryIndex This patch updates `SBCompileUnit::FindLineEntryIndex` to pass a valid `LineEntry` pointer to `CompileUnit::FindLineEntry`. This caused `LineTable::FindLineEntryIndexByFileIndexImpl` to return its `best_match` initial value (UINT32_MAX). rdar://78115426 Differential Revision: https://reviews.llvm.org/D102658 Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>	2021-05-18 01:28:53 +01:00
Nico Weber	bc588f9961	[lld/mac] Inline a check `match()` can only return for non-empty vectors, but at least in non-LTO builds that isn't clear to the compiler. Help it out. This is a minor but measurable speedup on my machine (but less than what we might've lost in https://reviews.llvm.org/D100818#2764272 -- bot note higher N on this measurement here, so higher confidence here): % ministat at_main at_branch x at_main + at_branch N Min Max Median Avg Stddev x 30 3.9243979 4.0395119 3.987375 3.9826236 0.027567796 + 30 3.8495831 4.0009291 3.931325 3.9347135 0.037832878 Difference at 95.0% confidence -0.0479101 +/- 0.0171102 -1.20298% +/- 0.429622% (Student's t, pooled s = 0.0331007) No behavior change. Eventually we should apply these lists at symbol parse time instead of every time shouldExportSymbol() though :) Differential Revision: https://reviews.llvm.org/D102655	2021-05-17 20:04:45 -04:00
Philip Reames	6d3e3ae8a9	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:59:25 -07:00
Stanislav Mekhanoshin	45764efb69	[AMDGPU] Do not check denorm for LDS FP atomic with unsafe flag This is already how it is handled for global and flat atomics. Differential Revision: https://reviews.llvm.org/D102366	2021-05-17 16:53:09 -07:00
Philip Reames	d16da7343d	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `c23ce54b36`. I apparently missed some newly added non-x86 tests.	2021-05-17 16:49:32 -07:00
Philip Reames	c23ce54b36	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:33:56 -07:00
Ben Shi	3cf7983cbe	[RISCV][test] Add new tests of or/xor in the zbs extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102625	2021-05-18 07:10:17 +08:00
Ben Shi	b99e2c5616	[clang][AVR] Redefine [u]int16_t to be compatible with avr-gcc Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D102547	2021-05-18 07:06:12 +08:00
Jim Ingham	82a3883715	Revert "Reset the wakeup timeout when we re-enter the continue wait." This reverts commit `bd5751f3d2`. This patch series is causing us to every so often miss switching the state from eStateRunning to eStateStopped when we get the stop packet from the debug server. Reverting till I can figure out how that could be happening.	2021-05-17 15:37:26 -07:00
Scott Linder	a6d3987b8e	[ADT] Add new type traits for type pack indexes Similar versions of these already exist; this effectively just factors them out into STLExtras. I plan to use these in future patches. Differential Revision: https://reviews.llvm.org/D100672	2021-05-17 22:28:55 +00:00
Scott Linder	af5247c934	[ADT] Factor out in_place_t and expose in Optional ctor Differential Revision: https://reviews.llvm.org/D100671	2021-05-17 22:25:39 +00:00
Eli Friedman	3dd49ec194	[AArch64][SVE] Implement extractelement of i1 vectors. The implementation just extends the vector to a larger element type, and extracts from that. Not fancy, but generates reasonable code. There was discussion in the review of doing the promotion in target-independent code, but I'm sticking with this to avoid making LegalizeDAG infrastructure more complicated. Differential Revision: https://reviews.llvm.org/D87651	2021-05-17 14:51:11 -07:00
Philip Reames	b6320eeb86	Do actual DCE in LoopUnroll (try 3) Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :) The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case. LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions. Differential Revision: https://reviews.llvm.org/D102511	2021-05-17 14:47:02 -07:00
Arthur Eubanks	ceb1ac9812	[test] Free triple in PassBuilderBindingsTest	2021-05-17 13:58:16 -07:00
Heejin Ahn	6e1c1dac4c	[WebAssembly] Nullify DBG_VALUE_LISTs in DebugValueManager WebAssemblyDebugValueManager class currently does not handle DBG_VALUE_LIST instructions correctly for two reasons, which are explained in https://bugs.llvm.org/show_bug.cgi?id=50361. This effectively nullifies DBG_VALUE_LISTs in WebAssemblyDebugValueManager so that the info will appear as "optimized out" in debuggers but still be at least correct in the meantime. Reviewed By: dschuff, jmorse Differential Revision: https://reviews.llvm.org/D102589	2021-05-17 13:47:36 -07:00
River Riddle	e2e1a78abc	[mlir][NFC] Remove stale `createLowerAffinePass` declaration This pass isn't defined in the Transforms/ library anymore.	2021-05-17 13:25:31 -07:00
Jinsong Ji	82b5281247	[Driver][test] Don't assume integrated-as The tests of -fdebug-compilation-dir and -ffile-compilation-dir for `-x assembler` assume integrated-as. If the platform sets -no-integrated-as by default (e.g. AIX for now), then this test will fail. Add -integrated-as to avoid relying on the platform defaults. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D102647	2021-05-17 20:24:21 +00:00
Eli Friedman	698568b74c	[clang CodeGen] Don't crash on large atomic function parameter. I wouldn't recommend writing code like the testcase; a function parameter isn't atomic, so using an atomic type doesn't really make sense. But it's valid, so clang shouldn't crash on it. The code was assuming hasAggregateEvaluationKind(Ty) implies Ty is a RecordType, which isn't true. Just use isRecordType() instead. Differential Revision: https://reviews.llvm.org/D102015	2021-05-17 13:18:23 -07:00
Markus Böck	65271ffe84	[lld][MinGW] Introduce aliases for -Bdynamic and -Bstatic Besides -Bdynamic and -Bstatic, ld documents additional aliases for both of these options. Instead of -Bstatic, one may write -dn, -non_shared or -static. Instead of -Bdynamic one may write -dy or -call_shared. Source: https://sourceware.org/binutils/docs-2.36/ld/Options.html This patch adds those aliases to the MinGW driver of lld for the sake of ld compatibility. Encountered this case while compiling a static Qt 6.1 distribution and got build failures as -static was passed directly to the linker, instead of through the compiler driver. Differential Revision: https://reviews.llvm.org/D102637	2021-05-17 22:13:26 +02:00
Dave Lee	02286d96db	[lldb] Document ctrl-f for completing show-autosuggestion Document how to complete command line suggestions provided by `show-autosuggestion`. Differential Revision: https://reviews.llvm.org/D102544	2021-05-17 12:52:20 -07:00
Mitch Phillips	6791a6b309	Revert "X86: support Swift Async context" This reverts commit `747e5cfb9f`. Reason: New frame layout broke the sanitizer unwinder. Not clear why, but seems like some of the changes aren't always guarded by Swift checks. See https://reviews.llvm.org/rG747e5cfb9f5d944b47fe014925b0d5dc2fda74d7 for more information.	2021-05-17 12:44:57 -07:00
Vitaly Buka	1eb78a64c4	[NFC][scudo] Clang-format tests	2021-05-17 12:31:09 -07:00
Arthur Eubanks	3a0b6dc3e8	Revert "[Clang] -Wunused-but-set-parameter and -Wunused-but-set-variable" This reverts commit `14dfb3831c`. More false positives, see D100581.	2021-05-17 12:16:10 -07:00
LLVM GN Syncbot	11c857c71d	[gn build] Port `0c557db617`	2021-05-17 18:56:03 +00:00
Sanjay Patel	3cdd05e519	[InstCombine] fold fnegs around select This is one of the folds requested in: https://llvm.org/PR39480 https://alive2.llvm.org/ce/z/NczU3V Note - this uses the normal FMF propagation logic (flags transfer from the final value to new/intermediate ops). It's not clear if this matches what Alive2 implements, so we may want to adjust one or the other.	2021-05-17 14:53:49 -04:00
Sanjay Patel	e9f600f20a	[InstCombine] add tests for fneg-of-select; NFC	2021-05-17 14:53:48 -04:00
Nick Desaulniers	0f41778919	[AArch64] Support customizing stack protector guard Follow up to D88631 but for aarch64; the Linux kernel uses the command line flags: 1. -mstack-protector-guard=sysreg 2. -mstack-protector-guard-reg=sp_el0 3. -mstack-protector-guard-offset=0 to use the system register sp_el0 for the stack canary, enabling the kernel to have a unique stack canary per task (like a thread, but not limited to userspace as the kernel can preempt itself). Address pr/47341 for aarch64. Fixes: https://github.com/ClangBuiltLinux/linux/issues/289 Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen Differential Revision: https://reviews.llvm.org/D100919	2021-05-17 11:49:22 -07:00
Peter Collingbourne	c870e36be1	gn build: Only build the hwasan runtime in aliasing mode on x86. The LAM mode is currently untested by check-hwasan, so we only need to build the runtime in aliasing mode. Because LAM mode will always need to be conditional (because only certain hardware will support it) we can always just disable the LAM lit tests if it ever starts being tested.	2021-05-17 11:48:49 -07:00
Jacques Pienaar	24bf554b10	Add type function for ConstShape op. - Enables inferring return type for ConstShape, takes into account valid return types; - The compatible return type function could be reused, leaving that for next use refactoring; Differential Revision: https://reviews.llvm.org/D102182	2021-05-17 11:47:19 -07:00
Mats Larsen	0c557db617	[NewPM] Add C bindings for new pass manager This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102136	2021-05-17 11:45:47 -07:00
Aart Bik	5879da496c	[mlir][sparse] replace experimental flag with inplace attribute The experimental flag for "inplace" bufferization in the sparse compiler can be replaced with the new inplace attribute. This gives a uniform way of expressing the more efficient way of bufferization. Reviewed By: bixia Differential Revision: https://reviews.llvm.org/D102538	2021-05-17 11:43:44 -07:00
Nico Weber	4a12248ee2	[lld/mac] Honor REFERENCED_DYNAMICALLY, set it on __mh_execute_header Has the effect that `__mh_execute_header` stays in the symbol table of outputs even after running `strip` on the output. I don't know if that's important for anything -- my motivation for the patch is just to make the output more similar to ld64. (Corresponds to symbolTableInAndNeverStrip in ld64.) Differential Revision: https://reviews.llvm.org/D102619	2021-05-17 14:22:12 -04:00
Chris Lattner	648f34a284	Merge with mainline. Differential Revision: https://reviews.llvm.org/D102636	2021-05-17 11:15:10 -07:00
Abbas Sabra	ebcf030efc	[analyzer] Engine: fix crash with SEH __leave keyword MSVC has a `try-except` statement. This statement can contain a `__leave` keyword, which is similar to a `goto` to the end of the try block. The semantics of this keyword are not implemented. We should at least parse such code without crashing. https://docs.microsoft.com/en-us/cpp/cpp/try-except-statement?view=msvc-160 Patch By: AbbasSabra! Reviewed By: steakhal Differential Revision: https://reviews.llvm.org/D102280	2021-05-17 20:10:26 +02:00
Michael Benfield	14dfb3831c	[Clang] -Wunused-but-set-parameter and -Wunused-but-set-variable These are intended to mimic warnings available in gcc. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D100581	2021-05-17 11:02:26 -07:00
Nico Weber	0b33977872	Revert "[NewPM] Add C bindings for new pass manager" This reverts commit `cd220a0678`. Doesn't build.	2021-05-17 13:59:12 -04:00
Jim Ingham	bd5751f3d2	Reset the wakeup timeout when we re-enter the continue wait. Differential Revision: https://reviews.llvm.org/D102562	2021-05-17 10:49:47 -07:00
Mats Larsen	cd220a0678	[NewPM] Add C bindings for new pass manager This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499 Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D102136	2021-05-17 10:48:45 -07:00
Shafik Yaghmour	2182eda306	[LLDB] Switch from using member_clang_type.GetByteSize() to member_type->GetByteSize() in ParseSingleMember We have a bug in which using member_clang_type.GetByteSize() triggers record layout, and during this process, since the record was not yet complete, we ended up reaching a record that had not been laid out yet. Using member_type->GetByteSize() avoids this situation since it relies on size from DWARF and will not trigger record layout. For reference: rdar://77293040 Differential Revision: https://reviews.llvm.org/D102445	2021-05-17 10:36:35 -07:00
Roman Lebedev	0633d5ce7b	[LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition. I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests, so in principle i'm fine with landing this without review, but just in case.. This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(unsigned val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one, since that is what i need: ``` int countActiveBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` I've followed in footstep of 'left-shift until bittest' idiom (D91038), in the sense that iff the `ctlz` intrinsic is cheap, we'll transform, regardless of all other factors. This can have a shocking effect on certain benchmarks: ``` raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm 2021-05-09T01:06:05+03:00 Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench Run on (32 X 3600.24 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 5.26, 6.29, 3.49 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime 
Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ p1319978.orf/threads:32/process_time/real_time_mean 145 ms 145 ms 128 0.145319 0.999981 10.1568M 69.8949M 69.8936M 6.88159 6.88146 0.145322 p1319978.orf/threads:32/process_time/real_time_median 145 ms 145 ms 128 0.145317 0.999986 10.1568M 69.8941M 69.8931M 6.88151 6.88141 0.145319 p1319978.orf/threads:32/process_time/real_time_stddev 0.766 ms 0.766 ms 128 766.586u 15.1302u 0 354.167k 354.098k 0.0348699 0.0348631 766.469u RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0 2021-05-09T01:06:24+03:00 Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Run on (32 X 3599.95 MHz CPU s) CPU Caches: L1 Data 32 KiB (x16) L1 Instruction 32 KiB (x16) L2 Unified 512 KiB (x16) L3 Unified 32768 KiB (x2) Load Average: 4.05, 5.95, 3.43 ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ p1319978.orf/threads:32/process_time/real_time_mean 99.8 ms 99.8 ms 128 0.0997758 0.999972 10.1568M 101.797M 101.794M 10.0225 10.0222 0.0997786 p1319978.orf/threads:32/process_time/real_time_median 99.7 ms 99.7 ms 128 0.0997165 0.999985 10.1568M 101.857M 
101.854M 10.0284 10.0281 0.0997195 p1319978.orf/threads:32/process_time/real_time_stddev 0.224 ms 0.224 ms 128 224.166u 34.345u 0 226.81k 227.231k 0.0223309 0.0223723 224.586u Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench Benchmark Time CPU Time Old Time New CPU Old CPU New ---------------------------------------------------------------------------------------------------------------------------------------------------- p1319978.orf/threads:32/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128 p1319978.orf/threads:32/process_time/real_time_mean -0.3134 -0.3134 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_median -0.3138 -0.3138 145 100 145 100 p1319978.orf/threads:32/process_time/real_time_stddev -0.7073 -0.7078 1 0 1 0 ``` Reviewed By: craig.topper, zhuhan0 Differential Revision: https://reviews.llvm.org/D102116	2021-05-17 20:33:33 +03:00
Matt Morehouse	d97bab6511	[HWASan] Don't build alias mode on non-x86. Alias mode is not expected to work on non-x86, so don't build it there. Should fix the aarch64 bot.	2021-05-17 10:32:16 -07:00
Mehdi Amini	43f6e04258	Make `mlir::OpState::operator bool` explicit This change makes the conversion of an mlir::OpState to bool `explicit`. Idiomatic boolean uses continue to work as before, but questionable implicit uses (e.g. accumulating over a range of OpStates to count "true" states) become ill-formed. This makes the class interface a little less error-prone. I tested this change on our internal (fairly large) codebase, and only one fix was needed, which was ultimately an improvement of the affected code. Reviewed By: rriddle, mehdi_amini Differential Revision: https://reviews.llvm.org/D101989	2021-05-17 17:29:25 +00:00
Yaxun (Sam) Liu	18cb17ce4c	[HIP] Fix spack detection Missing or duplicate spack packages should not cause an error, since users may have installed only the llvm/clang package, or may have installed duplicate HIP packages but use an environment variable or compiler option to choose the HIP path. The message about missing or duplicate spack packages is informational and should therefore be emitted only when -v is specified. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D102556	2021-05-17 13:24:05 -04:00
Rob Suderman	08068ddba7	[mlir][tosa] Fix tosa.avg_pool2d lowering to normalize correctly Initial version of pooling assumed normalization was across all elements equally. TOSA actually requires that the normalization be performed by how many elements were summed (edges are not artificially dimmer). Updated the lowering to reflect this change with corresponding tests. Reviewed By: NatashaKnk Differential Revision: https://reviews.llvm.org/D102540	2021-05-17 10:00:43 -07:00
Steffen Larsen	f226e28a88	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions for `sm_80` architecture or newer. PTX ISA description of `redux.sync`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync Authored-by: Steffen Larsen <steffen.larsen@codeplay.com> Differential Revision: https://reviews.llvm.org/D100124	2021-05-17 09:46:59 -07:00
Stuart Adams	02c2468864	[Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for `sm_80` architecture or newer. PTX ISA description of `cp.async`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive Authored-by: Stuart Adams <stuart.adams@codeplay.com> Co-Authored-by: Alexander Johnston <alexander@codeplay.com> Differential Revision: https://reviews.llvm.org/D100394	2021-05-17 09:46:59 -07:00
Alex Zinenko	1417ddafdb	[llvm][doc] fix header for read/write_register intrinsics in LangRef Multi-line headers are not allowed in RST, reformat the header to be a single wide line.	2021-05-17 18:38:16 +02:00
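
For readers skimming commit 0633d5ce7b ([LoopIdiom] "count active bits"), the following is a minimal sketch of the equivalence that commit message describes: the shift-until-zero loop computes the same value as the bit width minus the leading-zero count. It is not the LoopIdiom implementation. The names `kBits`, `leadingZeros`, `countActiveBitsLoop`, and `countActiveBitsClosedForm` are hypothetical, `leadingZeros` is a portable stand-in for the `ctlz` intrinsic, and the `cnt < kBits` guard is an addition here because shifting by the full bit width is undefined behavior in C++.

```cpp
#include <climits>

// Bit width of `unsigned` on the target (32 on common platforms).
constexpr int kBits = sizeof(unsigned) * CHAR_BIT;

// Hypothetical, portable stand-in for the `ctlz` intrinsic:
// counts zero bits above the highest set bit; returns kBits for 0.
constexpr int leadingZeros(unsigned val) {
  int n = 0;
  for (unsigned mask = 1u << (kBits - 1); mask != 0 && (val & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}

// Loop form from the commit message, with an added guard so the
// shift amount never reaches the full bit width.
constexpr int countActiveBitsLoop(unsigned val) {
  int cnt = 0;
  for (; cnt < kBits && (val >> cnt) != 0; ++cnt)
    ;
  return cnt;
}

// Closed form: number of active bits = bit width - leading zeros.
constexpr int countActiveBitsClosedForm(unsigned val) {
  return kBits - leadingZeros(val);
}

// Compile-time spot checks that both forms agree.
static_assert(countActiveBitsLoop(0u) == countActiveBitsClosedForm(0u), "");
static_assert(countActiveBitsLoop(1u) == countActiveBitsClosedForm(1u), "");
static_assert(countActiveBitsLoop(255u) == countActiveBitsClosedForm(255u), "");
static_assert(countActiveBitsLoop(~0u) == countActiveBitsClosedForm(~0u), "");
```

The more general `start`/`off` variant in the commit message follows the same idea but is left out of this sketch.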



388758 Commits
