397574 Commits

Author SHA1 Message Date
Roman Lebedev d4d459e747 [X86] AMD Zen 3: MULX w/ mem operand has the same throughput as with reg op
Exegesis is faulty: when measuring throughput^-1 it sometimes
produces snippets that have loop-carried dependencies,
which must be what caused me to measure it incorrectly originally.

After looking much more carefully, the inverse throughput should match
that of the MULX w/ reg op.

As per llvm-exegesis measurements.
2021-08-27 13:27:05 +03:00
Roman Lebedev 0f04936a2d [X86] AMD Zen 3: MULX produces low part of the result in 3cy, +1cy for high part
As per llvm-exegesis measurements.
2021-08-27 13:27:05 +03:00
Roman Lebedev db2c6cd99c [NFC][X86][MCA] AMD Zen 3: improve MULX test coverage
Latency for MULX isn't right
2021-08-27 13:27:05 +03:00
Yaron Keren 692ebe5395 [docs] Add DIA register instructions to Getting Started with Visual Studio page
Since Visual Studio 2017 the DIA libs are not registered by default, see:
https://docs.microsoft.com/en-us/visualstudio/extensibility/breaking-changes-2017?view=vs-2019#change-reduce-registry-impact
The LLDB build instructions already specify registering these DLLs, which
are required by both the LLVM PDB tests and the LLDB build.

Differential Revision: https://reviews.llvm.org/D108811
2021-08-27 13:10:19 +03:00
Balazs Benics 6ad47e1c4f [analyzer] Catch leaking stack addresses via stack variables
Not only global variables can hold references to dead stack variables.
Consider this example:

  void write_stack_address_to(char **q) {
    char local;
    *q = &local;
  }

  void test_stack() {
    char *p;
    write_stack_address_to(&p);
  }

The address of 'local' is assigned to 'p', which becomes a dangling
pointer after 'write_stack_address_to()' returns.

The StackAddrEscapeChecker was looking for bindings in the store which
referred to variables of the popped stack frame, but it only considered
global variables in this regard. This patch relaxes this, catching
stack variable bindings as well.

---

This patch also works for temporary objects like:

  struct Bar {
    const int &ref;
    explicit Bar(int y) : ref(y) {
      // Okay.
    } // End of the constructor call, `ref` is dangling now. Warning!
  };

  void test() {
    Bar{33}; // Temporary object, so the corresponding memregion is
             // *not* a VarRegion.
  }

---

The return value optimization aka. copy-elision might kick in but that
is modeled by passing an imaginary CXXThisRegion which refers to the
parent stack frame which is supposed to be the 'return slot'.
Objects residing in the 'return slot' outlive the scope of the inner
call, thus we should expect no warning about them - except if we
explicitly disable copy-elision.

Reviewed By: NoQ, martong

Differential Revision: https://reviews.llvm.org/D107078
2021-08-27 11:31:16 +02:00
Sylvestre Ledru c22bd391bc polly: remove the old reference to svn in the doc 2021-08-27 10:46:50 +02:00
Sylvestre Ledru fe611b1da8 [clang] Move the soname declaration in a variable at the top of the file
Currently, it is a bit buried in the file even though it is
pretty important for distributions.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108533
2021-08-27 09:07:12 +02:00
Chuanqi Xu a52cfb3523 [NFC] [ASTReader] Remove unused variables 2021-08-27 14:00:03 +08:00
LLVM GN Syncbot f8df807653 [gn build] Port b749ef9e22 2021-08-27 04:42:51 +00:00
Lang Hames b749ef9e22 [ORC][ORC-RT] Reapply "Introduce ELF/*nix Platform and runtime..." with fixes.
This reapplies e256445bff, which was reverted in 45ac5f5441 due to bot errors
(e.g. https://lab.llvm.org/buildbot/#/builders/112/builds/8599). The issue that
caused the bot failure was fixed in 2e6a4fce35.
2021-08-27 14:41:58 +10:00
Lang Hames 2e6a4fce35 [ORC][JITLink][ELF] Treat STB_GNU_UNIQUE as Weak in the JIT.
This should fix the bot error in
https://lab.llvm.org/buildbot/#/builders/112/builds/8599
which forced reversion of the ELFNixPlatform in 45ac5f5441.

This should allow us to re-enable the ELFNixPlatform in a follow-up patch.
2021-08-27 14:41:28 +10:00
Matt Arsenault ca4be0f9a1 AMDGPU: Fix hardcoded registers in test 2021-08-26 22:09:31 -04:00
Matt Arsenault a020581f2e AMDGPU/GlobalISel: Add baseline test for new ABI attribute hints 2021-08-26 22:09:11 -04:00
Matt Arsenault 04ce2de330 AMDGPU: Remove implicit argument attributes when introducing new calls
In a future patch, a new set of amdgpu-no-* attributes will be
introduced to indicate when a function does not need an implicitly
passed input. This pass introduces new instances of these intrinsic
calls, and should remove the attributes if they were present before.
2021-08-26 22:08:04 -04:00
Matt Arsenault a74278f21f AMDGPU: Fix broken test 2021-08-26 22:08:04 -04:00
Chen Zheng 324bd467a2 [PowerPC][ELF] make sure local variable space does not overlap with parameter save area
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D105271
2021-08-27 01:58:41 +00:00
Matt Arsenault 088cc63640 AMDGPU: Invert AMDGPUAttributor
Switch to using BitIntegerState for each of the inputs, and invert
their meanings.

This now diverges more from the old AMDGPUAnnotateKernelFeatures, but
this isn't used yet anyway.
2021-08-26 21:32:13 -04:00
Matt Arsenault 0150597c67 AMDGPU: Fix broken check lines 2021-08-26 21:30:06 -04:00
Matt Arsenault 3fdcd9bb13 GlobalISel: Add CallBase to CallLoweringInfo
The DAG version has this, and it is necessary for call lowering to take
advantage of any attributes at the call site.
2021-08-26 21:09:11 -04:00
Matt Arsenault 46d82e7357 AMDGPU: Restrict attributor transforms
We only really want this to add the custom attributes. Theoretically
the regular transforms were already run at this point. Touching
undefined behavior breaks a lot of tests when this is enabled by
default, many of which are expecting to test handling of undef
operations.
2021-08-26 21:08:51 -04:00
George Rokos 3819aae6dd [libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages. 2021-08-26 18:01:18 -07:00
Matt Arsenault cf32d61a05 AMDGPU: Remove hacky attribute deduction from AMDGPUAttributor
amdgpu-calls and amdgpu-stack-objects don't really belong as
attributes, and are currently a hacky way of passing an analysis into
the DAG. These don't really belong in the IR, and don't really fit in
with the other attributes. Remove these to facilitate inverting the
pass.

I don't exactly understand the indirect call test changes. These tests
are using calls which are trivially replaceable with a direct call, so
I'm not sure what the point is.
2021-08-26 20:31:14 -04:00
Matt Arsenault 98d7aa435f AMDGPU: Stop inferring use of llvm.amdgcn.kernarg.segment.ptr
We no longer use this intrinsic outside of the backend and no longer
support using it outside of kernels.
2021-08-26 20:30:03 -04:00
Heejin Ahn f5cff292e2 [WebAssembly] Fix PHI when relaying longjmps
When doing Emscripten EH, if SjLj is also enabled and used and the
thrown exception may be a longjmp instead of an exception, we
shouldn't swallow it; we should rethrow, or relay, it. This
was done in D106525 and the code is here:
8441a8eea8/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp (L858-L898)

Here is the pseudocode of that part: (copied from comments)
```
if (%__THREW__.val == 0 || %__THREW__.val == 1)
  goto %tail
else
  goto %longjmp.rethrow

longjmp.rethrow: ;; This is longjmp. Rethrow it
  %__threwValue.val = __threwValue
  emscripten_longjmp(%__THREW__.val, %__threwValue.val);

tail: ;; Nothing happened or an exception is thrown
  ... Continue exception handling ...
```

If the current BB (where the `invoke` is created) has successors that
list the current BB as a PHI incoming block, those PHIs now have to
refer to `tail` in the pseudocode instead, because `tail` is the last BB
that is connected with the next BB. This update was missing.
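
The PHI update described above can be sketched roughly as follows. This is a toy model, not the pass's code: `Incoming`, `Phi`, `Block` and `replacePhiPredecessor` are hypothetical names, and blocks and values are plain strings rather than LLVM IR objects.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified IR: a PHI is a list of (predecessor, value) pairs.
struct Incoming {
  std::string pred;
  std::string value;
};
struct Phi {
  std::vector<Incoming> incoming;
};
struct Block {
  std::string name;
  std::vector<Phi> phis;
};

// Once `newPred` (e.g. `tail`) is the block that actually branches to `succ`,
// rewrite every PHI entry in `succ` that still names `oldPred`.
// Returns the number of entries that were rewritten.
int replacePhiPredecessor(Block &succ, const std::string &oldPred,
                          const std::string &newPred) {
  assert(oldPred != newPred && "nothing to replace");
  int rewritten = 0;
  for (Phi &phi : succ.phis) {
    for (Incoming &in : phi.incoming) {
      if (in.pred == oldPred) {
        in.pred = newPred;
        ++rewritten;
      }
    }
  }
  return rewritten;
}
```

Entries coming from other predecessors are left alone; only the edge that moved from the original BB to `tail` changes.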

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D108785
2021-08-26 17:25:26 -07:00
David Blaikie 3784fc493e Remove set-but-unused variable 2021-08-26 16:58:47 -07:00
Vitaly Buka f1bb30a495 [sanitizer] No THREADLOCAL in qsort and bsearch
qsort can reuse qsort_r if available.
bsearch always passes key as the first comparator argument, so we
can use it to wrap the original comparator.
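
A minimal sketch of the bsearch idea, assuming nothing about the actual interceptor code: `WrappedKey`, `wrappedCompare`, `checkedBsearch` and `compareInts` are hypothetical names. It relies only on the C guarantee that bsearch calls the comparator with the key first and an array element second.

```cpp
#include <cassert>
#include <cstdlib>

using Comparator = int (*)(const void *, const void *);

// Bundles the caller's real key with the caller's real comparator, so no
// thread-local state is needed to find the comparator again.
struct WrappedKey {
  const void *key;
  Comparator compare;
};

// bsearch passes the key first, so `wrapped` is always our WrappedKey and
// `element` is always a pointer into the searched array.
static int wrappedCompare(const void *wrapped, const void *element) {
  const WrappedKey *w = static_cast<const WrappedKey *>(wrapped);
  // A real interceptor could inspect both pointers here before forwarding.
  return w->compare(w->key, element);
}

void *checkedBsearch(const void *key, const void *base, std::size_t count,
                     std::size_t size, Comparator compare) {
  assert(compare != nullptr);
  WrappedKey wrapped = {key, compare};
  return std::bsearch(&wrapped, base, count, size, wrappedCompare);
}

// Example comparator for int arrays.
int compareInts(const void *a, const void *b) {
  int lhs = *static_cast<const int *>(a);
  int rhs = *static_cast<const int *>(b);
  return (lhs > rhs) - (lhs < rhs);
}
```

The same trick does not carry over to plain qsort, which has no spare argument, which is presumably why the message falls back to qsort_r there.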

Differential Revision: https://reviews.llvm.org/D108751
2021-08-26 16:55:06 -07:00
Matt Arsenault 04da89e652 AMDGPU: Remove unnecessary -NEXT checks
This avoids spuriously breaking the test in a future change
2021-08-26 19:37:54 -04:00
Matt Arsenault cab0ec5c45 AMDGPU: Fix amdgpu_gfx calling convention usage in test
This was calling a regular C function from amdgpu_gfx, which isn't
defined to have all of the necessary implicit arguments.
2021-08-26 19:37:54 -04:00
Jez Ng c74eb05f21 [lld-macho][nfc] Clean up InputSection constructors 2021-08-26 19:07:48 -04:00
Artem Belevich 5c24a1e1db [CUDA] update constraints on NVPTX builtins to include PTX73 and 74. 2021-08-26 16:01:57 -07:00
Matt Arsenault ce51c5d4a9 AMDGPU: Fix crashing on kernel declarations when lowering LDS
This was trying to insert the used marker into a declaration.
2021-08-26 19:01:10 -04:00
Jez Ng 9b5148d426 [lld-macho] Have -ObjC load archive members before symbol resolution
This is what ld64 does. Deviating in behavior here can result
in some subtle duplicate symbol errors, as detailed in the objc.s test.

Differential Revision: https://reviews.llvm.org/D108781
2021-08-26 18:52:07 -04:00
Jez Ng 9065fe5591 [lld-macho] Refactor archive loading
The previous logic was duplicated between symbol-initiated
archive loads versus flag-initiated loads (i.e. `-force_load` and
`-ObjC`). This resulted in code duplication as well as redundant work --
we would create Archive instances twice whenever we had one of those
flags; once in `getArchiveMembers` and again when we constructed the
ArchiveFile.

This was motivated by an upcoming diff where we load archive members
containing ObjC-related symbols before loading those containing
ObjC-related sections, as well as before performing symbol resolution.
Without this refactor, it would be difficult to do that while avoiding
loading the same archive member twice.

Differential Revision: https://reviews.llvm.org/D108780
2021-08-26 18:52:07 -04:00
Jez Ng 2179930868 [lld-macho] Fix unwind info personality size
This was missed by {D107035}. This fix addresses the following warning:

  loop variable 'personality' has type 'const uint32_t &' (aka 'const unsigned int &') but is initialized with type 'const unsigned long long' resulting in a copy [-Wrange-loop-analysis]

In addition to fixing the size, I also removed the const reference,
since there's no performance benefit to avoiding copies of integer-sized
values.
2021-08-26 18:52:06 -04:00
Butygin 1e35a7690d [mlir][spirv] Initial support for 64 bit index type and builtins
Differential Revision: https://reviews.llvm.org/D108516
2021-08-27 01:38:53 +03:00
Benson Chu 7bd92f5911 [AST] Pick last tentative definition as the acting definition
Clang currently picks the second tentative definition when
VarDecl::getActingDefinition is called.

This can lead to attributes being dropped if they are attached to
tentative definitions that appear after the second one. This is
because VarDecl::getActingDefinition loops through VarDecl::redecls
assuming that the last tentative definition is the last element in the
iterator. However, it is the second element that would be the last
tentative definition.

This changeset modifies getActingDefinition to iterate through the
declaration chain in reverse, so that it can immediately return when
it encounters a tentative definition.
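
As a rough illustration of the reverse walk (not the clang implementation): `DeclKind` and `findActingDefinition` below are hypothetical stand-ins, and the declaration chain is modeled as a vector ordered from first to last redeclaration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical, reduced set of declaration kinds.
enum class DeclKind { DeclarationOnly, TentativeDefinition };

// Returns the index of the last tentative definition in `redecls`,
// or -1 if the chain contains none.
int findActingDefinition(const std::vector<DeclKind> &redecls) {
  // Search from the newest declaration backwards and stop at the first hit.
  auto it = std::find(redecls.rbegin(), redecls.rend(),
                      DeclKind::TentativeDefinition);
  if (it == redecls.rend())
    return -1;
  // it.base() points one past the found element in forward order.
  int index = static_cast<int>(std::distance(redecls.begin(), it.base())) - 1;
  assert(redecls[index] == DeclKind::TentativeDefinition);
  return index;
}
```

Because the search starts at the end, attributes attached to the third or later tentative definition are no longer skipped.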

Originally the unit test for this changeset did not have a -triple
flag for the clang invocation, leading to this test being broken on
MacOS, since Mach-O does not support the section attribute.

Differential Revision: https://reviews.llvm.org/D99732
2021-08-26 16:49:54 -05:00
Arthur Eubanks 6eed1fb349 [clang][NewPM] Mention that legacy PM flags are deprecated
Differential Revision: https://reviews.llvm.org/D108789
2021-08-26 14:42:55 -07:00
Yonghong Song 82d9cb34a2 [DebugInfo] convert btf_tag attrs to DI annotations for func parameters
Generate btf_tag annotations for DILocalVariable. The annotations
are represented as a DINodeArray in DebugInfo.

Differential Revision: https://reviews.llvm.org/D106620
2021-08-26 14:27:58 -07:00
Fangrui Song a42bd1b560 [CMake] Change -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=off to -DLLVM_ENABLE_NEW_PASS_MANAGER=off
LLVM_ENABLE_NEW_PASS_MANAGER is set to ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER, so
-DLLVM_ENABLE_NEW_PASS_MANAGER=off has no effect.

Change the cache variable to LLVM_ENABLE_NEW_PASS_MANAGER instead.
A user opting out of the new PM needs to switch from
-DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=off to
-DLLVM_ENABLE_NEW_PASS_MANAGER=off.

Also give a warning that -DLLVM_ENABLE_NEW_PASS_MANAGER=off is deprecated.

Reviewed By: aeubanks, phosek

Differential Revision: https://reviews.llvm.org/D108775
2021-08-26 14:25:31 -07:00
Yonghong Song 1bebc31c61 [DebugInfo] generate btf_tag annotations for func parameters
Generate btf_tag annotations for function parameters.
A field "annotations" is introduced to DILocalVariable, and
annotations are represented as a DINodeArray, similar to
DIComposite elements. The following example illustrates how
annotations are encoded in IR:
    distinct !DILocalVariable(name: "info",, arg: 1, ..., annotations: !10)
    !10 = !{!11, !12}
    !11 = !{!"btf_tag", !"a"}
    !12 = !{!"btf_tag", !"b"}

Differential Revision: https://reviews.llvm.org/D106620
2021-08-26 14:18:30 -07:00
Artem Dergachev 7309359928 [analyzer] Fix scan-build report deduplication.
The previous behavior was to deduplicate reports based on md5 of the
html file. This algorithm might have worked originally but right now
HTML reports contain information rich enough to make them virtually
always distinct which breaks deduplication entirely.

The new strategy is to (finally) take advantage of IssueHash - the
stable report identifier provided by clang that is the same if and only if
the reports are duplicates of each other.

Additionally, scan-build no longer performs deduplication on its own.
Instead, the report file name is now based on the issue hash,
and clang instances will silently refuse to produce a new html file
when a duplicate already exists. This eliminates the problem entirely.
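
The naming scheme can be sketched like this, under loud assumptions: `ReportDir`, `fileNameFor` and `emitReport` are hypothetical, the `report-` prefix is invented for illustration, and an in-memory set stands in for the output directory.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical model of the report directory; the real tools write HTML
// files to disk.
class ReportDir {
public:
  // The file name depends on the issue hash alone, so duplicate reports
  // map to the same name. The "report-" prefix is an assumption.
  static std::string fileNameFor(const std::string &issueHash) {
    assert(!issueHash.empty());
    return "report-" + issueHash + ".html";
  }

  // Returns true if a new report was recorded, false if a report with the
  // same issue hash already exists (the duplicate is silently skipped).
  bool emitReport(const std::string &issueHash) {
    return files.insert(fileNameFor(issueHash)).second;
  }

  std::size_t size() const { return files.size(); }

private:
  std::set<std::string> files;
};
```

With this shape no separate deduplication pass is needed: the second writer of a given hash simply finds the name taken.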

The '-analyzer-config stable-report-filename' option is deprecated
because report filenames are no longer unstable. A new option is
introduced, '-analyzer-config verbose-report-filename', to produce
verbose file names that look similar to the old "stable" file names.
The old option acts as an alias to the new option.

Differential Revision: https://reviews.llvm.org/D105167
2021-08-26 13:34:29 -07:00
Kirill Stoimenov a3f4139626 [asan] Implemented flag to emit intrinsics to optimize ASan callbacks.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D108377
2021-08-26 20:33:57 +00:00
Kirill Stoimenov 2e83a0efb9 [asan] Fixed a runtime crash.
Looks like the NoRegister has some effect on the final code that is generated. My guess is that some optimization kicks in at the end?

When I use -S to dump the assembly I get the correct version with 'shrq    $3, %r8':
        movq    %r9, %r8
        shrq    $3, %r8
        movsbl  2147450880(%r8), %r8d

But when I disassemble the final binary I get RAX instead of R8:
        mov    %r9,%r8
        shr    $0x3,%rax
        movsbl 0x7fff8000(%r8),%r8d
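
For readers less used to the assembly: both listings are meant to load the shadow byte for the address in %r9, i.e. shift right by 3 and add 0x7fff8000 (2147450880 in decimal). A small sketch of that arithmetic, with hypothetical names and an assumed 47-bit user address space:

```cpp
#include <cassert>
#include <cstdint>

// Constants read off the listings above.
constexpr unsigned kShadowScale = 3;
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;

static_assert(kShadowOffset == 2147450880ULL,
              "the -S output and the disassembly use the same constant");

// Address of the shadow byte that describes application address `addr`.
inline std::uint64_t shadowAddress(std::uint64_t addr) {
  // Assumption: user-space addresses fit in 47 bits.
  assert(addr < (1ULL << 47));
  return (addr >> kShadowScale) + kShadowOffset;
}
```

In the broken binary the shift is applied to %rax, so %r8 still holds the unshifted address when the load executes.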

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D108745
2021-08-26 20:30:25 +00:00
Rob Suderman 90478251c7 [mlir][tosa] Tosa reverse to linalg supporting dynamic shapes
Needed to switch to extract to support tosa.reverse using dynamic shapes.

Reviewed By: NatashaKnk

Differential Revision: https://reviews.llvm.org/D108744
2021-08-26 13:23:59 -07:00
Alexey Bataev 84cbd71c95 [SLP]Improve graph reordering.
Reworked the reordering algorithm. Originally, the compiler just tried to
detect the most common order in the reorderable nodes (loads, stores,
extractelements, extractvalues) and then fully rebuilt the graph in
the best order. This was not efficient, since it required extra
memory and time for building/rebuilding the tree and doubled the use of
the scheduling budget, which could lead to missed vectorization due to
exhausted scheduling resources.

The patch provides a 2-way approach to the graph reordering problem. First,
all reordering is done in-place: it does not require deleting/rebuilding
the tree, it just rotates the scalars/orders/reuses masks in
the graph node.

The first step (top-to-bottom) rotates the whole graph, similarly to the previous
implementation. The compiler counts the number of the most used orders of
the graph nodes with the same vectorization factor and then rotates the
subgraph with the given vectorization factor to the most used order, if
it is not empty. It then repeats the same procedure for the subgraphs with
the smaller vectorization factor. We can do this because we still need
to reshuffle the smaller subgraph when building operands for the graph
nodes with the larger vectorization factor, so we can rotate just the
subgraph, not the whole graph.
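
A toy sketch of the "most used order" selection from the first step, assuming a much simpler model than the SLP vectorizer's: `Order`, `mostUsedOrder` and `applyOrder` are hypothetical, an empty order stands for "no reordering", and `order[i]` is taken to be the source lane for lane `i`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Order = std::vector<unsigned>;

// Picks the order requested by the most nodes (all assumed to share one
// vectorization factor). Ties go to the lexicographically smallest order.
Order mostUsedOrder(const std::vector<Order> &orders) {
  std::map<Order, unsigned> counts;
  for (const Order &o : orders)
    ++counts[o];
  Order best;
  unsigned bestCount = 0;
  for (const auto &entry : counts) {
    if (entry.second > bestCount) {
      best = entry.first;
      bestCount = entry.second;
    }
  }
  return best;
}

// Rotates one node's scalars so that lane i receives scalars[order[i]].
// Only this node's data changes; nothing else is rebuilt.
void applyOrder(std::vector<int> &scalars, const Order &order) {
  if (order.empty())
    return; // Empty order: keep the node as it is.
  assert(order.size() == scalars.size());
  std::vector<int> rotated(scalars.size());
  for (std::size_t i = 0; i < order.size(); ++i)
    rotated[i] = scalars[order[i]];
  scalars = rotated;
}
```

Running the selection once per vectorization factor, largest first, mirrors the repetition the message describes for the smaller subgraphs.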

The second step (bottom-to-top) scans through the leaves and tries to
detect the users of the leaves which can be reordered. If the leaves can
be reordered in the best fashion, they are reordered and their users too.
This allows removing double shuffles to the same ordering of the operands in
many cases and just reordering the user operations instead. Plus, it moves
the final shuffles closer to the top of the graph and in many cases
allows removing an extra shuffle, because the same procedure is repeated
again and we can again merge some reordering masks and reorder user nodes
instead of the operands.

Also, the patch improves the cost model for gathering of loads, which improves
the x264 benchmark in some cases.

Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264,
+3% for 508.namd, improves most of other benchmarks.
The compile and link times are almost the same, though in some cases they
should be better (we're not doing extra instruction scheduling
anymore), and we may vectorize more code for the large basic blocks again
because of the saved scheduling budget.

Differential Revision: https://reviews.llvm.org/D105020
2021-08-26 12:31:18 -07:00
Nikita Popov 8441a8eea8 [MergeICmps] Add test for call before first load (NFC)
If a clobbering call happens before all loads, that shouldn't
block the transform.
2021-08-26 21:24:22 +02:00
Arthur Eubanks 14d45e41bf [test] Update precommit tests for D108734 2021-08-26 12:05:56 -07:00
Vitaly Buka 96fa1eaae4 [sanitizer] Add basic qsort test 2021-08-26 12:03:26 -07:00
Jon Chesterfield 3d85342982 [libomptarget][amdgpu][nfc] Rename variables, delete dead code 2021-08-26 19:58:38 +01:00
Andrea Di Biagio 44a13f33be Revert "[MCA][NFC] Remove redundant calls to std::move."
This reverts commit 9cc0023fb8 due to buildbot failures.
2021-08-26 19:53:17 +01:00