Commit Graph

388758 Commits

Author SHA1 Message Date
Arthur Eubanks cc64ece77d [NFC][OpaquePtr] Avoid using PointerType::getElementType() in VectorUtils.cpp
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D102533
2021-05-17 18:35:44 -07:00
Adam Nemet fcffd087c6 [Matrix] Fold the transpose into the matmul operand used to fetch scalars
For column-major this is:
  A * B^t
whereas for row-major:
  A^t * B

Differential Revision: https://reviews.llvm.org/D101762
2021-05-17 17:40:46 -07:00
Rob Suderman a91fb4328f [mlir][tosa] Cleanup of tosa.rescale lowering to linalg
Comment was poorly written. Changed to bail on contradictory information in
the double round.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D102651
2021-05-17 17:31:20 -07:00
Med Ismail Bennani 1b4d5b3bf3 [lldb/API] Use a valid LineEntry object in SBCompileUnit::FindLineEntryIndex
This patch updates `SBCompileUnit::FindLineEntryIndex` to pass a valid
`LineEntry` pointer to `CompileUnit::FindLineEntry`.

This caused `LineTable::FindLineEntryIndexByFileIndexImpl` to return its
`best_match` initial value (UINT32_MAX).

rdar://78115426

Differential Revision: https://reviews.llvm.org/D102658

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
2021-05-18 01:28:53 +01:00
Nico Weber bc588f9961 [lld/mac] Inline a check
`match()` can only return for non-empty vectors, but at least in
non-LTO builds that isn't clear to the compiler. Help it out.
This is a minor but measurable speedup on my machine (but less
than what we might've lost in https://reviews.llvm.org/D100818#2764272 --
bot note higher N on this measurement here, so higher confidence here):

    % ministat at_main at_branch
    x at_main
    + at_branch
        N           Min           Max        Median           Avg        Stddev
    x  30     3.9243979     4.0395119      3.987375     3.9826236   0.027567796
    +  30     3.8495831     4.0009291      3.931325     3.9347135   0.037832878
    Difference at 95.0% confidence
            -0.0479101 +/- 0.0171102
            -1.20298% +/- 0.429622%
            (Student's t, pooled s = 0.0331007)

No behavior change.

Eventually we should apply these lists at symbol parse time instead of
every time shouldExportSymbol() though :)

Differential Revision: https://reviews.llvm.org/D102655
2021-05-17 20:04:45 -04:00
Philip Reames 6d3e3ae8a9 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)
Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll

Previous commit message...

This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:59:25 -07:00
Stanislav Mekhanoshin 45764efb69 [AMDGPU] Do not check denorm for LDS FP atomic with unsafe flag
This is already how it is handled for global and flat atomics.

Differential Revision: https://reviews.llvm.org/D102366
2021-05-17 16:53:09 -07:00
Philip Reames d16da7343d Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute"
This reverts commit c23ce54b36.  I apparently missed some newly added non-x86 tests.
2021-05-17 16:49:32 -07:00
Philip Reames c23ce54b36 [LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute
This is a resubmit of 3e5ce4 (which was reverted by 7fe41ac).  The original commit caused a PPC build bot failure we never really got to the bottom of.  I can't reproduce the issue, and the bot owner was non-responsive.  In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in 80e8025.  My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess.

Original commit message follows...

If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.

The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.

This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way.

Differential Revision: https://reviews.llvm.org/D94892
2021-05-17 16:33:56 -07:00
Ben Shi 3cf7983cbe [RISCV][test] Add new tests of or/xor in the zbs extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102625
2021-05-18 07:10:17 +08:00
Ben Shi b99e2c5616 [clang][AVR] Redefine [u]int16_t to be compatible with avr-gcc
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D102547
2021-05-18 07:06:12 +08:00
Jim Ingham 82a3883715 Revert "Reset the wakeup timeout when we re-enter the continue wait."
This reverts commit bd5751f3d2.
This patch series is causing us to every so often miss switching
the state from eStateRunning to eStateStopped when we get the stop
packet from the debug server.

Reverting till I can figure out how that could be happening.
2021-05-17 15:37:26 -07:00
Scott Linder a6d3987b8e [ADT] Add new type traits for type pack indexes
Similar versions of these already exist, this effectively just just
factors them out into STLExtras. I plan to use these in future patches.

Differential Revision: https://reviews.llvm.org/D100672
2021-05-17 22:28:55 +00:00
Scott Linder af5247c934 [ADT] Factor out in_place_t and expose in Optional ctor
Differential Revision: https://reviews.llvm.org/D100671
2021-05-17 22:25:39 +00:00
Eli Friedman 3dd49ec194 [AArch64][SVE] Implement extractelement of i1 vectors.
The implementation just extends the vector to a larger element type, and
extracts from that.  Not fancy, but generates reasonable code.

There was discussion in the review of doing the promotion in
target-independent code, but I'm sticking with this to avoid making
LegalizeDAG infrastructure more complicated.

Differential Revision: https://reviews.llvm.org/D87651
2021-05-17 14:51:11 -07:00
Philip Reames b6320eeb86 Do actual DCE in LoopUnroll (try 3)
Recommitting after fixing a bug found post commit.  Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug.  Oops.  :)

The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration.  Test case added in LoopUnroll/dce.ll to cover this case.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.

Differential Revision: https://reviews.llvm.org/D102511
2021-05-17 14:47:02 -07:00
Arthur Eubanks ceb1ac9812 [test] Free triple in PassBuilderBindingsTest 2021-05-17 13:58:16 -07:00
Heejin Ahn 6e1c1dac4c [WebAssembly] Nullify DBG_VALUE_LISTs in DebugValueManager
WebAssemblyDebugValueManager class currently does not handle
DBG_VALUE_LIST instructions correctly for two reasons, which are
explained in https://bugs.llvm.org/show_bug.cgi?id=50361.

This effectively nullifies DBG_VALUE_LISTs in
WebAssemblyDebugValueManager so that the info will appear as "optimized
out" in debuggers but still be at least correct in the meantime.

Reviewed By: dschuff, jmorse

Differential Revision: https://reviews.llvm.org/D102589
2021-05-17 13:47:36 -07:00
River Riddle e2e1a78abc [mlir][NFC] Remove stale `createLowerAffinePass` declaration
This pass isn't defined in the Transforms/ library anymore.
2021-05-17 13:25:31 -07:00
Jinsong Ji 82b5281247 [Driver][test] Don't assume integrated-as
The tests of fdebug-compilation-dir and -ffile-compilation-dir for `-x
assembler` are assuming integrated-as.
If the platform set the no-itegrated-as by default (eg: AIX for now), then this test will
fail.

Add the -integrated-as to aviod relying on the platform defaults.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D102647
2021-05-17 20:24:21 +00:00
Eli Friedman 698568b74c [clang CodeGen] Don't crash on large atomic function parameter.
I wouldn't recommend writing code like the testcase; a function
parameter isn't atomic, so using an atomic type doesn't really make
sense.  But it's valid, so clang shouldn't crash on it.

The code was assuming hasAggregateEvaluationKind(Ty) implies Ty is a
RecordType, which isn't true.  Just use isRecordType() instead.

Differential Revision: https://reviews.llvm.org/D102015
2021-05-17 13:18:23 -07:00
Markus Böck 65271ffe84 [lld][MinGW] Introduce aliases for -Bdynamic and -Bstatic
Besides -Bdynamic and -Bstatic, ld documents additional aliases for both of these options. Instead of -Bstatic, one may write -dn, -non_shared or -static. Instead of -Bdynamic one may write -dy or -call_shared. Source: https://sourceware.org/binutils/docs-2.36/ld/Options.html

This patch adds those aliases to the MinGW driver of lld for the sake of ld compatibility.

Encountered this case while compiling a static Qt 6.1 distribution and got build failures as -static was passed directly to the linker, instead of through the compiler driver.

Differential Revision: https://reviews.llvm.org/D102637
2021-05-17 22:13:26 +02:00
Dave Lee 02286d96db [lldb] Document ctrl-f for completing show-autosuggestion
Document how to complete command line suggestions provided by `show-autosuggestion`.

Differential Revision: https://reviews.llvm.org/D102544
2021-05-17 12:52:20 -07:00
Mitch Phillips 6791a6b309 Revert "X86: support Swift Async context"
This reverts commit 747e5cfb9f.

Reason: New frame layout broke the sanitizer unwinder. Not clear why,
but seems like some of the changes aren't always guarded by Swyft
checks. See
https://reviews.llvm.org/rG747e5cfb9f5d944b47fe014925b0d5dc2fda74d7 for
more information.
2021-05-17 12:44:57 -07:00
Vitaly Buka 1eb78a64c4 [NFC][scudo] Clang-format tests 2021-05-17 12:31:09 -07:00
Arthur Eubanks 3a0b6dc3e8 Revert "[Clang] -Wunused-but-set-parameter and -Wunused-but-set-variable"
This reverts commit 14dfb3831c.

More false positives, see D100581.
2021-05-17 12:16:10 -07:00
LLVM GN Syncbot 11c857c71d [gn build] Port 0c557db617 2021-05-17 18:56:03 +00:00
Sanjay Patel 3cdd05e519 [InstCombine] fold fnegs around select
This is one of the folds requested in:
https://llvm.org/PR39480

https://alive2.llvm.org/ce/z/NczU3V

Note - this uses the normal FMF propagation logic
(flags transfer from the final value to new/intermediate ops).
It's not clear if this matches what Alive2 implements,
so we may want to adjust one or the other.
2021-05-17 14:53:49 -04:00
Sanjay Patel e9f600f20a [InstCombine] add tests for fneg-of-select; NFC 2021-05-17 14:53:48 -04:00
Nick Desaulniers 0f41778919 [AArch64] Support customizing stack protector guard
Follow up to D88631 but for aarch64; the Linux kernel uses the command
line flags:

1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0

to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).

Address pr/47341 for aarch64.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/289
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen

Differential Revision: https://reviews.llvm.org/D100919
2021-05-17 11:49:22 -07:00
Peter Collingbourne c870e36be1 gn build: Only build the hwasan runtime in aliasing mode on x86.
The LAM mode is currently untested by check-hwasan, so we only need
to build the runtime in aliasing mode. Because LAM mode will always
need to be conditional (because only certain hardware will support
it) we can always just disable the LAM lit tests if it ever starts
being tested.
2021-05-17 11:48:49 -07:00
Jacques Pienaar 24bf554b10 Add type function for ConstShape op.
- Enables inferring return type for ConstShape, takes into account valid return types;
- The compatible return type function could be reused, leaving that for next use refactoring;

Differential Revision: https://reviews.llvm.org/D102182
2021-05-17 11:47:19 -07:00
Mats Larsen 0c557db617 [NewPM] Add C bindings for new pass manager
This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102136
2021-05-17 11:45:47 -07:00
Aart Bik 5879da496c [mlir][sparse] replace experimental flag with inplace attribute
The experimental flag for "inplace" bufferization in the sparse
compiler can be replaced with the new inplace attribute. This gives
a uniform way of expressing the more efficient way of bufferization.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D102538
2021-05-17 11:43:44 -07:00
Nico Weber 4a12248ee2 [lld/mac] Honor REFERENCED_DYAMICALLY, set it on __mh_execute_header
Has the effect that `__mh_execute_header` stays in the symbol table of
outputs even after running `strip` on the output. I don't know if that's
important for anything -- my motivation for the patch is just is to make
the output more similar to ld64.

(Corresponds to symbolTableInAndNeverStrip in ld64.)

Differential Revision: https://reviews.llvm.org/D102619
2021-05-17 14:22:12 -04:00
Chris Lattner 648f34a284 Merge with mainline.
Differential Revision: https://reviews.llvm.org/D102636
2021-05-17 11:15:10 -07:00
Abbas Sabra ebcf030efc [analyzer] Engine: fix crash with SEH __leave keyword
MSVC has a `try-except` statement.
This statement could containt a `__leave` keyword, which is similar to
`goto` to the end of the try block. The semantic of this keyword is not
implemented.

We should at least parse such code without crashing.

https://docs.microsoft.com/en-us/cpp/cpp/try-except-statement?view=msvc-160

Patch By: AbbasSabra!

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D102280
2021-05-17 20:10:26 +02:00
Michael Benfield 14dfb3831c [Clang] -Wunused-but-set-parameter and -Wunused-but-set-variable
These are intended to mimic warnings available in gcc.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D100581
2021-05-17 11:02:26 -07:00
Nico Weber 0b33977872 Revert "[NewPM] Add C bindings for new pass manager"
This reverts commit cd220a0678.
Doesn't build.
2021-05-17 13:59:12 -04:00
Jim Ingham bd5751f3d2 Reset the wakeup timeout when we re-enter the continue wait.
Differential Revision: https://reviews.llvm.org/D102562
2021-05-17 10:49:47 -07:00
Mats Larsen cd220a0678 [NewPM] Add C bindings for new pass manager
This patch contains the bare minimum to run the new Pass Manager from the LLVM-C APIs. It does not feature PGOOptions, PassPlugins or Debugify in its current state. Bugzilla: PR48499

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102136
2021-05-17 10:48:45 -07:00
Shafik Yaghmour 2182eda306 [LLDB] Switch from using member_clang_type.GetByteSize() to member_type->GetByteSize() in ParseSingleMember
We have a bug in which using member_clang_type.GetByteSize() triggers record
layout and during this process since the record was not yet complete we ended
up reaching a record that had not been layed out yet.
Using member_type->GetByteSize() avoids this situation since it relies on size
from DWARF and will not trigger record layout.

For reference: rdar://77293040

Differential Revision: https://reviews.llvm.org/D102445
2021-05-17 10:36:35 -07:00
Roman Lebedev 0633d5ce7b
[LoopIdiom] 'logical right-shift until zero' ('count active bits') "on steroids" idiom recognition.
I think i've added exhaustive test coverage, and i have verified that alive2 is happy with all the tests,
so in principle i'm fine with landing this without review, but just in case..

This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(unsigned val) {
    int cnt = 0;
    for( ; (val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one, since that is what i need:
```
int countActiveBits(unsigned val, int start, int off) {
    int cnt;
    for (cnt = start; val >> (cnt + off); cnt++)
        ;
    return cnt;
}
```

I've followed in footstep of 'left-shift until bittest' idiom (D91038),
in the sense that iff the `ctlz` intrinsic is cheap, we'll transform,
regardless of all other factors.

This can have a shocking effect on certain benchmarks:
```
raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf
RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm
2021-05-09T01:06:05+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3600.24 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 5.26, 6.29, 3.49
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean          145 ms          145 ms          128   0.145319         0.999981   10.1568M       69.8949M        69.8936M      6.88159       6.88146   0.145322
p1319978.orf/threads:32/process_time/real_time_median        145 ms          145 ms          128   0.145317         0.999986   10.1568M       69.8941M        69.8931M      6.88151       6.88141   0.145319
p1319978.orf/threads:32/process_time/real_time_stddev      0.766 ms        0.766 ms          128   766.586u         15.1302u          0       354.167k        354.098k    0.0348699     0.0348631   766.469u
RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0
2021-05-09T01:06:24+03:00
Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Run on (32 X 3599.95 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x16)
  L1 Instruction 32 KiB (x16)
  L2 Unified 512 KiB (x16)
  L3 Unified 32768 KiB (x2)
Load Average: 4.05, 5.95, 3.43
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations  CPUTime,s CPUTime/WallTime     Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean         99.8 ms         99.8 ms          128  0.0997758         0.999972   10.1568M       101.797M        101.794M      10.0225       10.0222  0.0997786
p1319978.orf/threads:32/process_time/real_time_median       99.7 ms         99.7 ms          128  0.0997165         0.999985   10.1568M       101.857M        101.854M      10.0284       10.0281  0.0997195
p1319978.orf/threads:32/process_time/real_time_stddev      0.224 ms        0.224 ms          128   224.166u          34.345u          0        226.81k        227.231k    0.0223309     0.0223723   224.586u
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_pvalue                 0.0000          0.0000      U Test, Repetitions: 128 vs 128
p1319978.orf/threads:32/process_time/real_time_mean                  -0.3134         -0.3134           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_median                -0.3138         -0.3138           145           100           145           100
p1319978.orf/threads:32/process_time/real_time_stddev                -0.7073         -0.7078             1             0             1             0

```

Reviewed By: craig.topper, zhuhan0

Differential Revision: https://reviews.llvm.org/D102116
2021-05-17 20:33:33 +03:00
Matt Morehouse d97bab6511 [HWASan] Don't build alias mode on non-x86.
Alias mode is not expected work on non-x86, so don't build it there.
Should fix the aarch64 bot.
2021-05-17 10:32:16 -07:00
Mehdi Amini 43f6e04258 Make `mlir::OpState::operator bool` explicit
This change makes the conversion of an mlir::OpState to bool `explicit`. Idiomatic boolean uses continue to work as before, but questionable implicit uses (e.g. accumulating over a range of OpStates to count "true" states) become ill-formed. This makes the class interface a lilttle less error-prone.

I tested this change on our internal (fairly large) codebase, and only one fix was needed, which was ultimately an improvement of the affected code.

Reviewed By: rriddle, mehdi_amini

Differential Revision: https://reviews.llvm.org/D101989
2021-05-17 17:29:25 +00:00
Yaxun (Sam) Liu 18cb17ce4c [HIP] Fix spack detection
Missing or duplicate spack package should not cause error, since
users may only installed llvm/clang package, or users may installed
duplicate HIP package but will use environment variable or compiler
option to choose HIP path.

The message about missing or duplicate spack package is informational,
therefore should be emitted only when -v is specified.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102556
2021-05-17 13:24:05 -04:00
Rob Suderman 08068ddba7 [mlir][tosa] Fix tosa.avg_pool2d lowering to normalize correctly
Initial version of pooling assumed normalization was accross all elements
equally. TOSA actually requires the noramalization is perform by how
many elements were summed (edges are not artifically dimmer). Updated
the lowering to reflect this change with corresponding tests.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D102540
2021-05-17 10:00:43 -07:00
Steffen Larsen f226e28a88 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions
for `sm_80` architecture or newer.

PTX ISA description of `redux.sync`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100124
2021-05-17 09:46:59 -07:00
Stuart Adams 02c2468864 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for
`sm_80` architecture or newer.

PTX ISA description of `cp.async`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive

Authored-by: Stuart Adams <stuart.adams@codeplay.com>
Co-Authored-by: Alexander Johnston <alexander@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100394
2021-05-17 09:46:59 -07:00
Alex Zinenko 1417ddafdb [llvm][doc] fix header for read/write_register intrinsics in LangRef
Mutli-line headers are not allowed in RST, reformat the header to be a
single wide line.
2021-05-17 18:38:16 +02:00