Commit Graph

363874 Commits

Author SHA1 Message Date
Raphael Isemann f5f22f0448 [lldb] Skip TestSimulatorPlatform with sanitized builds
The test executable crashes when ran on a simulator. Skipping until this is
fixed.

rdar://67238668
2020-08-17 15:06:48 +02:00
Kai Nacke c2ae7934c8 [SystemZ/ZOS]__(de)register_frame are not available on z/OS.
The functions `__register_frame`/`__deregister_frame` are not
available on z/OS, so add a guard to not use them.

Reviewed By: lhames, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D84787
2020-08-17 09:00:09 -04:00
Sam Parker dad04e62f1 [NFC] run update test script
On Transforms/LoopUnroll/runtime-small-upperbound.ll
2020-08-17 13:54:28 +01:00
Luís Marques 687e7d3425 [NFC] Tweak a comment about the lock-free builtins 2020-08-17 13:43:53 +01:00
Raphael Isemann cfb773c676 [lldb][NFC] Use StringRef in CreateFunctionDeclaration/GetDeclarationName
CreateFunctionDeclaration should just take a StringRef. GetDeclarationName is
(only) used by CreateFunctionDeclaration so that's why now also takes a
StringRef.
2020-08-17 14:17:20 +02:00
Georgii Rymar bc902191d3 [llvm-readobj] - Remove unwrapOrError calls from GNUStyle<ELFT>::printRelocations.
This fixes existent FIXMEs: we should not error out when unable to
find the number of relocations.

Differential revision: https://reviews.llvm.org/D85891
2020-08-17 15:16:36 +03:00
Sam Elliott 3f7068ad98 [RISCV] Enable the use of the old mucounteren name
The RISC-V Privileged Specification 1.11 defines `mcountinhibit`, which
has the same numeric CSR value as `mucounteren` from 1.09.1. This patch
enables the use of the old `mucounteren` name.

Patch by Yuichi Sugiyama.

Reviewed By: lenary, jrtc27, pzheng

Differential Revision: https://reviews.llvm.org/D85067
2020-08-17 13:11:49 +01:00
Sam Elliott 5f9ecc5d85 [RISCV] Indirect branch generation in position independent code
This fixes the "Unable to insert indirect branch" fatal error sometimes
seen when generating position-independent code.

Patch by msizanoen1

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D84833
2020-08-17 13:09:26 +01:00
LLVM GN Syncbot e0eb4f204a [gn build] Port c1f6ce0c73 2020-08-17 12:02:24 +00:00
Sanjay Patel e6b6787d01 [InstCombine] fold abs(X)/X to cmp+select
The backend can convert the select-of-constants to
bit-hack shift+logic if desirable.

https://alive2.llvm.org/ce/z/pgJT6E

  define i8 @src(i8 %x) {
  %0:
    %a = abs i8 %x, 1
    %d = sdiv i8 %x, %a
    ret i8 %d
  }
  =>
  define i8 @tgt(i8 %x) {
  %0:
    %cond = icmp sgt i8 %x, 255
    %r = select i1 %cond, i8 1, i8 255
    ret i8 %r
  }
  Transformation seems to be correct!
2020-08-17 08:01:28 -04:00
Sanjay Patel 61512ddd2d [InstCombine] add tests for sdiv-of-abs; NFC 2020-08-17 08:01:27 -04:00
Sanjay Patel 6cd4a6f6b2 [InstCombine] reduce code duplication; NFC 2020-08-17 08:01:27 -04:00
Georgii Rymar 6567f82216 [llvm-readobj/elf] - Refine the warning about the broken PT_DYNAMIC segment.
Splitted out from D85519.

Currently we report "PT_DYNAMIC segment offset + size exceeds the size of the file",
this changes it to
"PT_DYNAMIC segment offset (0x1234) + file size (0x5678) exceeds the size of the file (0x68ab)"

Differential revision: https://reviews.llvm.org/D85654
2020-08-17 14:57:19 +03:00
Simon Pilgrim c1f6ce0c73 [DemandedBits] Improve accuracy of Add propagator
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.

I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.

This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.

Known A:     0_______0_______0_______0_______
Known B:     0_______0_______0_______0_______
AOut:        00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch:   00000000001111111000000000000000

Committed on behalf of: @rrika (Erika)

Differential Revision: https://reviews.llvm.org/D72423
2020-08-17 12:54:09 +01:00
Simon Pilgrim 79d9e2cd93 [DemandedBits] Reorder addition test checks. NFC.
As suggested on D72423 we should try to keep the same order as the original IR
2020-08-17 12:54:09 +01:00
Sam Parker 613d8f2953 [NFC] Run update script on test
Update IndVarSimplify/no-iv-rewrite.ll
2020-08-17 12:53:14 +01:00
Georgii Rymar c135a68d42 [LLD][ELF] - Do not produce an invalid dynamic relocation order with --shuffle-sections.
Normally (when not on android with android relocation packing enabled),
we put IRelative relocations to ".rel[a].dyn", after other relocations,
to ensure that IRelatives are processed last by the dynamic loader.

To achieve that we add the `in.relaIplt` after the `part.relaDyn`:
https://github.com/llvm/llvm-project/blob/master/lld/ELF/Writer.cpp#L540

The problem is that `--shuffle-sections` might break the sections order.
This patch fixes it.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47056.

Differential revision: https://reviews.llvm.org/D85651
2020-08-17 14:46:52 +03:00
Raphael Isemann 7e6c437fb4 [lldb][NFC] Remove name parameter from CreateFunctionTemplateDecl
It's unused and not documented.
2020-08-17 13:44:10 +02:00
Raphael Isemann 42b9a68352 [lldb][NFC] Use expect_expr in more tests 2020-08-17 13:14:57 +02:00
Simon Pilgrim 1d2ede87ea [X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases
Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported.

We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004
2020-08-17 11:58:51 +01:00
Raphael Isemann cd2139a527 [lldb][NFC] Use the proper type for the 'storage' parameter of CreateFunctionDeclaration
All the callers pass an enum and we cast the int anyway back to the actual type,
so we might as well just use the type for the parameter.
2020-08-17 12:53:58 +02:00
Cullen Rhodes 2ccde3c96b [InlineCost] Fix scalable vectors in visitAlloca
Discovered as part of the VLS type work (see D85128).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85848
2020-08-17 10:34:27 +00:00
Vitaly Buka 3b348d9102 [NFC][StackSafety] Move out sort from the loop 2020-08-17 03:30:14 -07:00
Raphael Isemann 6b97fa0bfe [lldb] Remove OS-specific string from TestInvalidArgsLog
This is the error message from the OS, so we shouldn't check against the
OS-specific part of the string.

Fixes the test on Windows which returns a different error message.
2020-08-17 11:57:43 +02:00
Raphael Isemann 24c74f5e8c [lldb] Don't delete orphaned shared modules in SBDebugger::DeleteTarget
In D83876 the consensus seems that LLDB should never deleted orphaned modules
implicitly. However, SBDebugger::DeleteTarget is currently doing exactly that.
This code was added in 753406221b but I don't see
any explanation in the commit, so I think we should delete it.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D83933
2020-08-17 11:30:56 +02:00
Pavel Labath 67cdb899c6 [lldb/Utility] Simplify and generalize Scalar class
The class contains an enum listing all host integer types as well as
some non-host types. This setup is a remnant of a time when this class
was actually implemented in terms of host integer types. Now that we are
using llvm::APInt, they are mostly useless and mean that each function
needs to enumerate all of these cases even though it treats most of them
identically.

I only leave e_sint and e_uint to denote the integer signedness, but I
want to remove that in a follow-up as well.

Removing these cases simplifies most of these functions, with the only
exception being PromoteToMaxType, which can no longer rely on a simple
enum comparison to determine what needs to be promoted.

This also makes the class ready to work with arbitrary integer sizes, so
it does not need to be modified when someone needs to add a larger
integer size.

Differential Revision: https://reviews.llvm.org/D85836
2020-08-17 11:09:56 +02:00
Pavel Labath 2d89a3ba12 [lldb] Forcefully complete a type when adding nested classes
With -flimit-debug-info, we can run into cases when we only have a class
as a declaration, but we do have a definition of a nested class. In this
case, clang will hit an assertion when adding a member to an incomplete
type (but only if it's adding a c++ class, and not C struct).

It turns out we already had code to handle a similar situation arising
in the -gmodules scenario. This extends the code to handle
-flimit-debug-info as well, and reorganizes bits of other code handling
completion of types to move functions doing similar things closer
together.

Differential Revision: https://reviews.llvm.org/D85968
2020-08-17 11:09:13 +02:00
Raphael Isemann c2f9454a16 [lldb] Add SBModule::GarbageCollectAllocatedModules and clear modules after each test run
Right now the only places in the SB API where lldb:: ModuleSP instances are
destroyed are in SBDebugger::MemoryPressureDetected (where it's just attempted
but not guaranteed) and in SBDebugger::DeleteTarget (which will be removed in
D83933). Tests that directly create an lldb::ModuleSP and never create a target
therefore currently leak lldb::Module instances. This triggers the sanity checks
in lldbtest that make sure that the global module list is empty after a test.

This patch adds SBModule::GarbageCollectAllocatedModules as an explicit way to
clean orphaned lldb::ModuleSP instances. Also we now start calling this method
at the end of each test run and move the sanity check behind that call to make
this work. This way even tests that don't create targets can pass the sanity
check.

This fixes TestUnicodeSymbols.py when D83865 is applied (which makes that the
sanity checks actually fail the test).

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D83876
2020-08-17 11:00:19 +02:00
Raphael Isemann 867c347c32 [lldb] Fix that log enable's -f parameter causes LLDB to crash when it can't open the log file
We didn't do anything with the llvm::Error we get from `Open`, so when we end up in the
error case we just crash due to the llvm::Error sanity check. Also add the missing newline
behind the error message so it no longer messes with the next (lldb) prompt.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D85970
2020-08-17 10:43:00 +02:00
Raphael Isemann c57ea1b48f [lldb] Get lldb-server platform's --socket-file working again
`lldb-server platform --socket-file /any/path` currently always fails to create
the socket file.  This stopped working after D67424 which changed the
input variables of `writeFileAtomically` slightly. We're expected to
pass in a temporary path template (`/tmp/foo-%%%%%`) and the final
path we want to write. Instead we currently pass in the never set
`temp_file_path` as the temporary path (which will make this function always
fail) and pass in the temp_file_spec's path as the final path (which is actually
the template path such as `/tmp/foo-%%%%%`) instead of the actual path
we want to write (e.g. `/tmp/foo`).

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D85890
2020-08-17 10:29:06 +02:00
Kazushi (Jam) Marukawa 40f1e7e804 [VE] Support f128
Support f128 using VE instructions.  Update regression tests.
I've noticed there is no load or store i128 test, so I add them too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D86035
2020-08-17 17:26:52 +09:00
Raphael Isemann 5913f2591c [lldb][NFC] Remove stride parameter from GetArrayElementType
This parameter isn't used anywhere in LLDB nor the Swift downstream branch. It
also doesn't really fit into the TypeSystem APIs that usually don't return
additional related functionality via some output parameters. Also the
implementations already states that the calculated value there is wrong.

Let's remove it. If we need this functionality at some point then Swift's much
nicer `GetByteStride` function seems like the way to go.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D84299
2020-08-17 10:19:51 +02:00
Kadir Cetinkaya 53c593c2c8
[clang] Make signature help work with dependent args
Fixes https://github.com/clangd/clangd/issues/490

Differential Revision: https://reviews.llvm.org/D85826
2020-08-17 10:06:36 +02:00
Raphael Isemann 24fc3177c1 [lldb] Print the exception traceback when hitting cleanup errors
Right now if the test suite encounters a cleanup error it just prints "CLEANUP
ERROR:" but not any additional information.

This patch just prints the exception that caused the cleanup error. This should
make debugging the failing tests for D83865 easier (and seems in general nice to
have).

Reviewed By: labath

Differential Revision: https://reviews.llvm.org/D83874
2020-08-17 09:53:52 +02:00
Craig Topper a206f85091 [X86] Reject dirflag in inline asm constraints other than clobber.
Fixes the crash from PR47195.
2020-08-16 23:33:45 -07:00
Chen Zheng 4d52ebb9b9 [PowerPC] Make StartMI ignore COPY like instructions.
Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D85659
2020-08-17 02:12:30 -04:00
Yonghong Song aa61e43040 [InstCombine] Fix a compilation bug
With gcc 6.3.0, I hit the following compilation bug.
  ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
   };
    ^
  cc1plus: all warnings being treated as errors

The error is introduced by Commit ae7f08812e ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")
2020-08-16 21:56:42 -07:00
Yonghong Song 000ad1a976 [clang] fix a compilation bug
With gcc 6.3.0, I hit the following compilation bug:
  /home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp:
  In function ‘bool ParseCodeGenArgs(clang::CodeGenOptions&, llvm::opt::ArgList&,
  clang::InputKind, clang::DiagnosticsEngine&, const clang::TargetOptions&,
  const clang::FrontendOptions&)’:
  /home/yhs/work/llvm-project/clang/lib/Frontend/CompilerInvocation.cpp:780:12:
    error: unused variable ‘A’ [-Werror=unused-variable]
     if (Arg *A = Args.getLastArg(OPT_fuse_ctor_homing))
              ^
  cc1plus: all warnings being treated as errors

The bug is introduced by Commit ae6523cd62 ("[DebugInfo] Add
-fuse-ctor-homing cc1 flag so we can turn on constructor homing only
if limited debug info is already on.")
2020-08-16 21:53:37 -07:00
zhanghb97 fcd2969da9 Initial MLIR python bindings based on the C API.
* Basic support for context creation, module parsing and dumping.

Differential Revision: https://reviews.llvm.org/D85481
2020-08-16 19:34:25 -07:00
Vitaly Buka e10e7829bf [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-16 18:05:52 -07:00
Richard Smith 948219d109 Replace setter named 'getAsOpaqueInt' with a real getter.
Clean up a bunch of places where the opaque forms of FPOptions and
FPOptionsOverride were being used inappropriately.
2020-08-16 16:38:33 -07:00
Richard Smith ae500e4d09 Always keep unset fields in FPOptionsOverride zeroed.
There are three fields that the FPOptions default constructor sets to
non-zero values; those fields previously could have been zero or
non-zero depending on whether they'd been explicitly removed from the
FPOptionsOverride set. However, that doesn't seem to ever actually
happen, so this is NFC, except that it makes the AST file representation
of FPOptionsOverride make more sense.
2020-08-16 15:44:51 -07:00
Richard Smith ae3067055b Use consistent code for setting FPFeatures from operator constructors. 2020-08-16 15:40:38 -07:00
Richard Smith 9860e68450 Don't leave the FPOptions in a UnaryOperator uninitialized.
We don't appear to use these FPOptions for anything right now, but
they shouldn't be uninitialized because that makes our AST file output
nondeterministic.
2020-08-16 15:16:12 -07:00
Mehdi Amini de71b46a51 Add missing parsing for attributes to std.generic_atomic_rmw op
Fix llvm.org/pr47182

Differential Revision: https://reviews.llvm.org/D86030
2020-08-16 22:13:58 +00:00
Roman Lebedev 0ec1f0f332
[NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically 2020-08-16 23:37:36 +03:00
Roman Lebedev ae7f08812e
[InstCombine] Aggregate reconstruction simplification (PR47060)
This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.

The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a  `call`.

Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has
higher cost than it could have on unwind branch :S

This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.

I think, this is a missing fold in InstCombine, so i've implemented it.

I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
   If it was extracted from the aggregate with identical type,
   from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
   then we can just use said original source aggregate directly,
   instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
   we need be PHI-aware - we might have different source aggregate when coming
   from each predecessor.

I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.

I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.

On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.

On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |
```
Roughly similar effect, less instructions and blocks total.

See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.

Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions

And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text

See https://bugs.llvm.org/show_bug.cgi?id=47060

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85787
2020-08-16 23:27:56 +03:00
Johannes Doerfert 5272d29e2c [OpenMP][CUDA] Keep one kernel list per device, not globally.
Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86039
2020-08-16 14:38:35 -05:00
Johannes Doerfert aa27cfc1e7 [OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)
Instead of calling `cuFuncGetAttribute` with
`CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation,
we can do it for the first one and cache the result as part of the
`KernelInfo` struct. The only functional change is that we now expect
`cuFuncGetAttribute` to succeed and otherwise propagate the error.
Ignoring any error seems like a slippery slope...

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D86038
2020-08-16 14:38:33 -05:00
Johannes Doerfert 95a25e4c32 [OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156
When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA which later allowed
to remove some of the reduce code as accesses and initialization were
"known to not alias". With this patch we avoid TBAA in this step,
hopefully for all accesses that we need to.

Verified on the reproducer of PR46156 and QMCPack.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86037
2020-08-16 14:38:31 -05:00