Commit Graph

392449 Commits

Author SHA1 Message Date
Florian Hahn 8db9cb262f
[Matrix] Add tests for hoisting address computations. 2021-06-30 14:31:00 +01:00
Peter Smith fc1cb3104b [LLD][ELF][ARM] Tidy up test to hook up missing filecheck patterns [NFC]
A couple of filecheck patterns had not been hooked up with
the patterns suffering from some drift. As this test is old
and llvm-objdump has improved a lot, take this opportunity to
hide the instruction encoding. I've also taken out a lot of
the explanatory comments that llvm-objdump improvements make
redundant, as these comments oftern don't get updated when addresses
change.

Differential Revision: https://reviews.llvm.org/D104907
2021-06-30 14:16:40 +01:00
alex-t e585b332e4 [AMDGPU] PHI node cost should not be counted for the size and latency.
Details: https://reviews.llvm.org/D96805 changed the GCNTTIImpl::getCFInstrCost to return 1 for the PHI nodes
  for the TTI::TCK_CodeSize and TTI::TCK_SizeAndLatency. This is incorrect because the value moves that are the
  result of the PHI lowering are inserted into the basic block predecessors - not into the block itself.
  As a result of this change LoopRotate and LoopUnroll were broken because of the incorrect Loop header and loop
  body size/cost estimation.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D105104
2021-06-30 16:11:17 +03:00
Peter Smith dd4d3f7406 [LLD][ELF][ARM] Fix case of patched unrelocated BLX
There are a couple of problems with the code to patch
unrelocated BLX instructions:
1. The calculation of the PC needs to take into account
   the alignment of the instruction. The Thumb BLX
   uses alignDown(PC, 4) for the source address.
2. The calculation of the PC bias is hard-coded to 4
   which works for Thumb, but when there is a BLX the
   branch will be in Arm state so it needs an 8 byte
   PC bias.

No asssembler generates an unrelocated BLX instruction
so these problems do not affect real world programs.
However we should still fix them.

Differential Revision: https://reviews.llvm.org/D104905
2021-06-30 14:07:35 +01:00
Bradley Smith 002911503f [TargetLowering][AArch64][SVE] Take into account accessed type when clamping address
When clamping the index for a memory access to a stacked vector we must
take into account the entire type being accessed, not just assume that
we are accessing only a single element.

Differential Revision: https://reviews.llvm.org/D105016
2021-06-30 13:30:18 +01:00
Tobias Gysi 42d99bc376 [mlir][linalg][python] Update the OpDSL doc (NFC).
Update the OpDSL documentation to reflect recent changes. In particular, the updated documentation discusses:
- Attributes used to parameterize index expressions
- Shape-only tensor support
- Scalar parameters

Differential Revision: https://reviews.llvm.org/D105123
2021-06-30 12:27:17 +00:00
Saiyedul Islam f7ce532d62 [clang-offload-bundler] Add unbundling of archives containing bundled object files into device specific archives
This patch adds unbundling support of an archive file. It takes an
archive file along with a set of offload targets as input.
Output is a device specific archive for each given offload target.
Input archive contains bundled code objects bundled using
clang-offload-bundler. Each generated device specific archive contains
a set of device code object files which are named as
<Parent Bundle Name>-<CodeObject-GPUArch>.

Entries in input archive can be of any binary type which is
supported by clang-offload-bundler, like *.bc. Output archives will
contain files in same type.

Example Usuage:
  clang-offload-bundler --unbundle --inputs=lib-generic.a -type=a
      -targets=openmp-amdgcn-amdhsa--gfx906,openmp-amdgcn-amdhsa--gfx908
      -outputs=devicelib-gfx906.a,deviceLib-gfx908.a

Reviewed By: jdoerfert, yaxunl

Differential Revision: https://reviews.llvm.org/D93525
2021-06-30 17:55:50 +05:30
Simon Pilgrim fcd0cb3921 Fix MSVC "32-bit shift implicitly converted to 64 bits" warning. 2021-06-30 13:23:53 +01:00
Alexey Bataev 7fab1146e4 [OPENMP]Fix PR50929: Ignored initializer clause in user-defined reduction.
No need to try to create the default constructor for private copy, it
will be called automatically in the initializer of the declare
reduction. Fixes balance between constructors/destructors calls.

Differential Revision: https://reviews.llvm.org/D105143
2021-06-30 04:55:38 -07:00
Zhouyi Zhou 2fd75507d1 [clang] NFC: add line break at the end of if expressions
Hi,

In function TransformTemplateArgument,
would it be better to add line break at the end of "if" expressions?

I use clang-format to do the job for me.

Thanks a lot

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D104604
2021-06-30 19:48:24 +08:00
madhur13490 a7ed55f64c [AMDGPU] Simplify getReservedNumSGPRs
This is a followup patch on D103636 where
it seemed checking on amdgpu-calls and
amdgpu-stack-objects is unnecessary. Removing these
checks didn't regress any tests functionally.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D104513
2021-06-30 16:19:39 +05:30
Florian Hahn 611a02cce5
[ConstantRanges] Use APInt for constant case for urem/srem.
Currently UREM & SREM on constant ranges produces overly pessimistic
results for single element constant ranges.

Delegate to APInt's implementation if both operands are single element
constant ranges. We already do something similar for other binary
operators, like binary AND.

Fixes PR49731.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D105115
2021-06-30 11:18:20 +01:00
Florian Mayer ad8494c021 [hwasan] Make sure we retag with a new tag on free.
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105021
2021-06-30 11:13:38 +01:00
David Sherwood 7b7b5b5a26 [NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable
Avoid creating a IRBuilder stack variable with the same name as the
class member.
2021-06-30 11:11:49 +01:00
Florian Mayer a24f104645 [MTE] Remove redundant helper function.
Looking at PostDominatorTree::dominates, we can see that has the same
logic (with the addition of handling Phi nodes - which are not used as inputs in
this pass) as the helper function.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105141
2021-06-30 11:11:26 +01:00
Jay Foad 2da58826a5 [TableGen] Allow identical MnemonicAliases with no predicate
My use case for this is illustrated in the test case: I want to define
the same instruction twice with different (disjoint) predicates, because
the instruction has different operands on different subtargets. It's
convenient to do this with a multiclass that also defines an alias for
the instruction.

Previously tablegen would complain if this alias was defined twice with
no predicate. One way to fix this would be to add a predicate on each
definition of the alias, matching the predicate on the instruction. But
this (a) is slightly awkward to do in the real world use case I had, and
(b) leads to an inefficient matcher that will do something like this:

  if (Mnemonic == "foo_alias") {
    if (Features.test(Feature_Subtarget1Bit))
      Mnemonic == "foo";
    else if (Features.test(Feature_Subtarget2Bit))
      Mnemonic == "foo";
    return;
  }

It would be more efficient to skip the feature tests and return "foo"
unconditionally.

Overall it seems better to allow multiple definitions of the identical
alias with no predicate.

Differential Revision: https://reviews.llvm.org/D105033
2021-06-30 10:53:39 +01:00
Valeriy Savchenko c818cb96ad [analyzer][satest][NFC] Relax dependencies requirements 2021-06-30 12:50:21 +03:00
Igor Kudrin 657e067bb5 [ARMInstPrinter] Print the target address of a branch instruction
This follows other patches that changed printing immediate values of
branch instructions to target addresses, see D76580 (x86), D76591 (PPC),
D77853 (AArch64).

As observing immediate values might sometimes be useful, they are
printed as comments for branch instructions.

// llvm-objdump -d output (before)
000200b4 <_start>:
   200b4: ff ff ff fa   blx     #-4 <thumb>
000200b8 <thumb>:
   200b8: ff f7 fc ef   blx     #-8 <_start>

// llvm-objdump -d output (after)
000200b4 <_start>:
   200b4: ff ff ff fa   blx     0x200b8 <thumb>         @ imm = #-4
000200b8 <thumb>:
   200b8: ff f7 fc ef   blx     0x200b4 <_start>        @ imm = #-8

// GNU objdump -d.
000200b4 <_start>:
   200b4:       faffffff        blx     200b8 <thumb>
000200b8 <thumb>:
   200b8:       f7ff effc       blx     200b4 <_start>

Differential Revision: https://reviews.llvm.org/D104701
2021-06-30 16:35:28 +07:00
Tobias Gysi 4361bd9b7b [mlir][linalg][python] Explicit shape and dimension order in OpDSL.
Extend the OpDSL syntax with an optional `domain` function to specify an explicit dimension order. The extension is needed to provide more control over the dimension order instead of deducing it implicitly depending on the formulation of the tensor comprehension. Additionally, the patch also ensures the symbols are ordered according to the operand definitions of the operation.

Differential Revision: https://reviews.llvm.org/D105117
2021-06-30 08:59:39 +00:00
Igor Kudrin 17bcae8906 [ARM][NFC] Remove an unused method
`ARMInstPrinter::printMveAddrModeQOperand()` was added in D62680, but
was never used. It looks like `printT2AddrModeImm8Operand<false>()` is
used instead.

Differential Revision: https://reviews.llvm.org/D105124
2021-06-30 15:55:37 +07:00
Stephan Herhut db2de8d7f1 [mlir][llvm] Add a test for memref.copy lowering to llvm
This was missing and also there was a bug in the lowering itself, which went unnoticed due to it.

Differential Revision: https://reviews.llvm.org/D105122
2021-06-30 10:49:29 +02:00
Sjoerd Meijer b062fff87a Recommit "[AArch64] Custom lower <4 x i8> loads"
This recommits D104782 including a fix for adding a wrong operand to the new
load node.

Differential Revision: https://reviews.llvm.org/D105110
2021-06-30 09:18:06 +01:00
Dmitry Polukhin fceaf86211 [clang] Fix UB when string.front() is used for the empty string
Compilation database might have empty string as a command line argument.
But ExpandResponseFilesDatabase::expand doesn't expect this and assumes
that string.front() can be used for any argument. It is undefined behaviour if
string is empty. With debug build mode it causes crash in clangd.

Test Plan: check-clang

Differential Revision: https://reviews.llvm.org/D105120
2021-06-30 01:07:47 -07:00
Kai Luo 1f169a774c [PowerPC][AIX] Re-generate test aix-framepointer-save-restore.ll. NFC. 2021-06-30 05:37:49 +00:00
Uday Bondhugula 071d26f808 [MLIR] Fix generateCopyForMemRefRegion
Fix generateCopyForMemRefRegion for a missing check: in some cases, when
the thing to generate copies for itself is empty, no fast buffer/copy
loops would have been allocated/generated. Add an extra assertion there
while at this.

Differential Revision: https://reviews.llvm.org/D105170
2021-06-30 10:24:10 +05:30
Kai Luo 338a3f495e [PowerPC][AIX] Pre-commit tracetable test for D100167. NFC. 2021-06-30 04:39:31 +00:00
Mehdi Amini 8b8f5c54d5 Fix test pass registration to use the new API / not use the deprecated one (NFC) 2021-06-30 04:09:11 +00:00
Tony Tye 7f19aa73c2 [AMDGPU] Update gfx90a memory model support
Update AMDGPU gfx90a memory model to make coarse grain memory allocations
consistent when fine grained system scope atomic acquire and release is
performed.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D105137
2021-06-30 04:05:22 +00:00
Chuanqi Xu 801c2b9bba [FuncSpec] Add an option to specializing literal constant
Now the option is off by default. Since we are not sure if this option
would make the compile time increase aggressively. Although we tested it
on SPEC2017, we may need to test more to make it on by default.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D104365
2021-06-30 11:26:44 +08:00
Chuanqi Xu 1d9539cf49 [Coroutine] Add statistics for the number of elided coroutine
Now we lack a benchmark to measure the performance change for each
commit.
Since coro elide is the main optimization in coroutine module, I wonder
it may be an estimation to count the number of elided coroutine in
private code bases.
e.g., for a certain commit, if we found that the number of elided goes
down, we could find it before the commit check-in.

Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D105095
2021-06-30 11:20:53 +08:00
Fangrui Song 814dffa4b7 [llvm-objcopy][MachO] Support LC_LINKER_OPTIMIZATION_HINT load command
The load command is currently specific to arm64 and holds information
for instruction rewriting, e.g.  converting a GOT load to an ADR to
compute a local address.
(On ELF the information is usually conveyed by relocations, e.g.
R_X86_64_REX_GOTPCRELX, R_PPC64_TOC16_HA)

Reviewed By: alexander-shaposhnikov

Differential Revision: https://reviews.llvm.org/D104968
2021-06-29 18:47:55 -07:00
Greg Clayton 43f6dad234 Fix buildbot compile error for https://reviews.llvm.org/D105160. 2021-06-29 18:03:25 -07:00
Greg Clayton c8164d0276 Create synthetic symbol names on demand to improve memory consumption and startup times.
This fix was created after profiling the target creation of a large C/C++/ObjC application that contained almost 4,000,000 redacted symbol names. The symbol table parsing code was creating names for each of these synthetic symbols and adding them to the name indexes. The code was also adding the object file basename to the end of the symbol name which doesn't allow symbols from different shared libraries to share the names in the constant string pool.

Prior to this fix this was creating 180MB of "___lldb_unnamed_symbol" symbol names and was taking a long time to generate each name, add them to the string pool and then add each of these names to the name index.

This patch fixes the issue by:
- not adding a name to synthetic symbols at creation time, and allows name to be dynamically generated when accessed
- doesn't add synthetic symbol names to the name indexes, but catches this special case as name lookup time. Users won't typically set breakpoints or lookup these synthetic names, but support was added to do the lookup in case it does happen
- removes the object file baseanme from the generated names to allow the names to be shared in the constant string pool

Prior to this fix the startup times for a large application was:
12.5 seconds (cold file caches)
8.5 seconds (warm file caches)

After this fix:
9.7 seconds (cold file caches)
5.7 seconds (warm file caches)

The names of the symbols are auto generated by appending the symbol's UserID to the end of the "___lldb_unnamed_symbol" string and is only done when the name is requested from a synthetic symbol if it has no name.

Differential Revision: https://reviews.llvm.org/D105160
2021-06-29 17:44:33 -07:00
Nick Desaulniers 98b9fc9b93 [Test] delete LPM RUNs in inline_nossp.ll
This test was modified in D104958. Invoking opt with -{passname} (vs
-passes={passname}) without -enable-new-pm={0|1} is now ambiguous and
dependent on how LLVM was configured.  Drop the LPM runs rather than
fix since there unlikely to be any users still on LPM that rely on the
behavior in this test.

See also:
https://lists.llvm.org/pipermail/llvm-dev/2021-June/151553.html

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D105154
2021-06-29 17:09:42 -07:00
David Blaikie 632e15e766 Conditionalize function only used in an assert to address -Wunused-function 2021-06-29 16:39:59 -07:00
Akira Hatanaka 6cda73e3c4 [CodeGen] Add ParmVarDecls to FunctionDecls that are created to generate
ObjC property getter/setter functions

This is needed to prevent clang from crashing when we make the changes
proposed in https://reviews.llvm.org/D98799.

Differential Revision: https://reviews.llvm.org/D104883
2021-06-29 16:27:24 -07:00
Lei Huang 1df981f43a Revert "Attempt to disable MLIR JIT tests on PowerPC to unbreak the bot"
This reverts commit 652f4b5140.

Re-enable MLLIR JIT tests.
The MLIR Bot was updated to export LD_LIBRARY_PATH=/usr/lib64, which
seem to fix this issue.
2021-06-29 18:03:23 -05:00
Steffen Larsen 3644726a78 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0.

PTX ISA description of

  - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld
  - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st
  - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma
  - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma

Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>
Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D104847
2021-06-29 15:44:07 -07:00
Dhruva Chakrabarti e0b713a035 [libomptarget] [amdgpu] Change default number of teams per computation unit
This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D99003
2021-06-29 15:34:35 -07:00
Adrian Prantl 21e013303b Improve path remapping in cross-debugging scenarios
This patch implements a slight improvement when debugging across
platforms and remapping source paths that are in a non-native
format. See the unit test for examples.

rdar://79205675

Differential Revision: https://reviews.llvm.org/D104407
2021-06-29 15:27:01 -07:00
Adrian Prantl a0e1b11fac Modernize Module::RemapFile to return an Optional (NFC)
This addresses feedback raised in https://reviews.llvm.org/D104404.

Differential Revision: https://reviews.llvm.org/D104724
2021-06-29 15:19:31 -07:00
Adrian Prantl 302b1b9718 Express PathMappingList::FindFile() in terms of PathMappingList::RemapPath()
NFC.

This patch replaces the function body FindFile() with a call to
RemapPath(), since the two functions implement the same functionality.

Differential Revision: https://reviews.llvm.org/D104406
2021-06-29 15:14:31 -07:00
Adrian Prantl a346372200 Change PathMappingList::FindFile to return an optional result (NFC)
This is an NFC modernization refactoring that replaces the combination
of a bool return + reference argument, with an Optional return value.

Differential Revision: https://reviews.llvm.org/D104405
2021-06-29 15:10:46 -07:00
Aaron Puchert e0b90771c3 Thread safety analysis: Rename parameters of ThreadSafetyAnalyzer::intersectAndWarn (NFC)
In D104261 we made the parameters' meaning slightly more specific, this
changes their names accordingly. In all uses we're building a new lock
set by intersecting existing locksets. The first (modifiable) argument
is the new lock set being built, the second (non-modifiable) argument is
the exit set of a preceding block.

Reviewed By: aaron.ballman, delesley

Differential Revision: https://reviews.llvm.org/D104649
2021-06-29 23:56:52 +02:00
Aaron Puchert f664e2ec37 Thread safety analysis: Always warn when dropping locks on back edges
We allow branches to join where one holds a managed lock but the other
doesn't, but we can't do so for back edges: because there we can't drop
them from the lockset, as we have already analyzed the loop with the
larger lockset. So we can't allow dropping managed locks on back edges.

We move the managed() check from handleRemovalFromIntersection up to
intersectAndWarn, where we additionally check if we're on a back edge if
we're removing from the first lock set (the entry set of the next block)
but not if we're removing from the second lock set (the exit set of the
previous block). Now that the order of arguments matters, I had to swap
them in one invocation, which also causes some minor differences in the
tests.

Reviewed By: delesley

Differential Revision: https://reviews.llvm.org/D104261
2021-06-29 23:56:52 +02:00
Arthur Eubanks cb3580e7ad [OpaquePtr][BitcodeWriter] Handle attributes with types
For example, byval.

Skip the type attribute auto-upgrade if we already have the type.

I've actually seen this error of the ValueEnumerator missing a type
attribute's type in a non-opaque pointer context.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D105138
2021-06-29 14:47:29 -07:00
Nikita Popov b810600a93 [Test] Regenerate test checks (NFC)
Make these follow the update_test_checks.py format.
2021-06-29 23:44:56 +02:00
Matt Arsenault 990278d026 CodeGen: Store LLT instead of uint64_t in MachineMemOperand
GlobalISel is relying on regular MachineMemOperands to track all of
the memory properties of accesses. Just the raw byte size is
insufficent to disambiguate all situations. For example, if we need to
split an unaligned extending load, we need to know the number of bits
in the original source value and can't infer it from the result
type. This is also a problem for extending vector loads.

This does decrease the maximum representable size from the full
uint64_t bytes to a maximum of 16-bits. No in tree testcases hit this,
other than places using UINT64_MAX for unknown sizes. This may be an
issue for G_MEMCPY and co., although they can just use unknown size
for large static sizes. This also has potential for backend abuse by
relying on the type when it really shouldn't be relevant after
selection.

This does not include the necessary MIR printer/parser changes to
represent this.
2021-06-29 17:38:51 -04:00
Matt Arsenault 49fa6abf74 Revert "GlobalISel: Use MMO helper for getting the size in bits"
This reverts commit dc98adfb44.

This should still be done, but this is currently causing some commit
ordering issues.
2021-06-29 17:38:51 -04:00
Akira Hatanaka 8d21d54725 [CodeGen] Stop creating fake FunctionDecls when generating IR for
functions implicitly generated by the compiler

These fake functions would cause clang to crash if the changes proposed
in https://reviews.llvm.org/D98799 were made.
2021-06-29 14:22:33 -07:00