Commit Graph

420879 Commits

Author SHA1 Message Date
Matt Arsenault c986d476cd AMDGPU: Update reqd-work-group-size optimization for umin intrinsic
This code was pattern matching the ID computation expression as it
appears in the library. This was a compare and select, but now that
umin is canonical, we were no longer matching. Update to match the
intrinsic instead.
2022-04-12 20:03:02 -04:00
Muhammad Omair Javaid 42ebfa8269 Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e81.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

https://lab.llvm.org/buildbot/#/builders/176/builds/1515

llvm-tblgen crashes after applying this patch.
2022-04-13 04:53:07 +05:00
Arthur Eubanks 32f3633171 [test][DSE] Precommit test 2022-04-12 16:21:04 -07:00
Matt Arsenault d4b1be20f6 RegAllocGreedy: Fix illegal eviction assert for urgent evictions
The condition in canEvictInterferenceBasedOnCost is slightly different
from the assertion in evictInteference.
canEvictInterferenceBasedOnCost uses a <= check for the cascade number
for legality, but the assert was checking for <. For equal cascade
numbers for an urgent eviction, canEvictInterferenceBasedOnCost could
return success. The actual eviction would then hit this assert. Avoid
ever returning true for equivalent cascade numbers.

The resulting failed allocation seems a bit off to me. e.g. in
illegal-eviction-assert.mir, I wuold assume %0 gets allocated starting
at $vgpr0. That was its initial allocation choice, but was later
evicted. In this example no evictions can help improve anything.
2022-04-12 19:16:56 -04:00
Stanislav Mekhanoshin f6462a26f0 [AMDGPU] Split unaligned 4 DWORD DS operations
Similarly to 3 DWORD operations it is better for performance
to split unlaligned operations as long a these are at least
DWORD alignmened. Performance data:

```
Using platform: AMD Accelerated Parallel Processing
Using device: gfx900:xnack-

ds_write_b128                      aligned by 16:  4.9 sec
ds_write2_b64                      aligned by 16:  5.1 sec
ds_write2_b32 * 2                  aligned by 16:  5.5 sec
ds_write_b128                      aligned by  1:  8.1 sec
ds_write2_b64                      aligned by  1:  8.7 sec
ds_write2_b32 * 2                  aligned by  1: 14.0 sec
ds_write_b128                      aligned by  2:  8.1 sec
ds_write2_b64                      aligned by  2:  8.7 sec
ds_write2_b32 * 2                  aligned by  2: 14.0 sec
ds_write_b128                      aligned by  4:  5.6 sec
ds_write2_b64                      aligned by  4:  8.7 sec
ds_write2_b32 * 2                  aligned by  4:  5.6 sec
ds_write_b128                      aligned by  8:  5.6 sec
ds_write2_b64                      aligned by  8:  5.1 sec
ds_write2_b32 * 2                  aligned by  8:  5.6 sec
ds_read_b128                       aligned by 16:  3.8 sec
ds_read2_b64                       aligned by 16:  3.8 sec
ds_read2_b32 * 2                   aligned by 16:  4.0 sec
ds_read_b128                       aligned by  1:  4.6 sec
ds_read2_b64                       aligned by  1:  8.1 sec
ds_read2_b32 * 2                   aligned by  1: 14.0 sec
ds_read_b128                       aligned by  2:  4.6 sec
ds_read2_b64                       aligned by  2:  8.1 sec
ds_read2_b32 * 2                   aligned by  2: 14.0 sec
ds_read_b128                       aligned by  4:  4.6 sec
ds_read2_b64                       aligned by  4:  8.1 sec
ds_read2_b32 * 2                   aligned by  4:  4.0 sec
ds_read_b128                       aligned by  8:  4.6 sec
ds_read2_b64                       aligned by  8:  3.8 sec
ds_read2_b32 * 2                   aligned by  8:  4.0 sec

Using platform: AMD Accelerated Parallel Processing
Using device: gfx1030

ds_write_b128                      aligned by 16:  6.2 sec
ds_write2_b64                      aligned by 16:  7.1 sec
ds_write2_b32 * 2                  aligned by 16:  7.6 sec
ds_write_b128                      aligned by  1: 24.1 sec
ds_write2_b64                      aligned by  1: 25.2 sec
ds_write2_b32 * 2                  aligned by  1: 43.7 sec
ds_write_b128                      aligned by  2: 24.1 sec
ds_write2_b64                      aligned by  2: 25.1 sec
ds_write2_b32 * 2                  aligned by  2: 43.7 sec
ds_write_b128                      aligned by  4: 14.4 sec
ds_write2_b64                      aligned by  4: 25.1 sec
ds_write2_b32 * 2                  aligned by  4:  7.6 sec
ds_write_b128                      aligned by  8: 14.4 sec
ds_write2_b64                      aligned by  8:  7.1 sec
ds_write2_b32 * 2                  aligned by  8:  7.6 sec
ds_read_b128                       aligned by 16:  6.2 sec
ds_read2_b64                       aligned by 16:  6.3 sec
ds_read2_b32 * 2                   aligned by 16:  7.5 sec
ds_read_b128                       aligned by  1: 12.5 sec
ds_read2_b64                       aligned by  1: 24.0 sec
ds_read2_b32 * 2                   aligned by  1: 43.6 sec
ds_read_b128                       aligned by  2: 12.5 sec
ds_read2_b64                       aligned by  2: 24.0 sec
ds_read2_b32 * 2                   aligned by  2: 43.6 sec
ds_read_b128                       aligned by  4: 12.5 sec
ds_read2_b64                       aligned by  4: 24.0 sec
ds_read2_b32 * 2                   aligned by  4:  7.5 sec
ds_read_b128                       aligned by  8: 12.5 sec
ds_read2_b64                       aligned by  8:  6.3 sec
ds_read2_b32 * 2                   aligned by  8:  7.5 sec
```

Differential Revision: https://reviews.llvm.org/D123634
2022-04-12 16:07:13 -07:00
Lang Hames db8469c4d7 [docs][ORC] Fix RST error in dfffb7df24. 2022-04-12 16:06:11 -07:00
owenca 0cde8bdb0b Revert "[clang-format] Allow empty .clang-format file"
This reverts commit 4e814a6f2d.
2022-04-12 16:05:39 -07:00
Matt Arsenault eefed1dbf0 RegAllocGreedy: Roll back successful recolorings on failure
This is a replacement for the original fix attempted in
c46aab01c0.

This fixes "overlapping insert" assertion failures when trying to
unwind an unsuccessful recoloring attempt.

The problem would occur when there are multiple recoloring candidates
which recursively required recoloring. If one recoloring candidate was
successfully recolored at one level, and the next recoloring candidate
was unsuccessful, we would not roll back the first candidates
successful recoloring. The forgotten successful recoloring may have
been assigned to something that conflicts with a register that needs
to be restored in a parent recoloring attempt.

See the testcase added in issue48473 for a more concrete example with
explanation.
2022-04-12 19:02:48 -04:00
Lang Hames dfffb7df24 [docs] Update OrcV2 doc to include some notes on code removal. 2022-04-12 15:58:25 -07:00
owenca 4e814a6f2d [clang-format] Allow empty .clang-format file
Differential Revision: https://reviews.llvm.org/D123535
2022-04-12 15:45:22 -07:00
Yuanfang Chen 81b51b61f8 Fix libcxx build after cd0a5889d7 2022-04-12 15:42:56 -07:00
Arthur Eubanks 51561b5e80 [ArgPromo][OpaquePointer] Don't promote mismatched function types
Mismatched call/callee function types is considered an indirect call.

Fixes crash in https://reviews.llvm.org/D123300#3446023.
2022-04-12 15:17:45 -07:00
Lang Hames ebdc60a232 [examples][ORC] Add a new example showing the ORCv2 removable code APIs. 2022-04-12 15:05:07 -07:00
Nikita Popov 0adadfa68f [MSan] Ensure argument shadow initialized on memcpy
We need to explicitly query the shadow here, because it is lazily
initialized for byval arguments. Without opaque pointers this used to
mostly work out, because there would be a bitcast to `i8*` present, and
that would query, and copy in case of byval, the argument shadow.

Reviewed By: vitalybuka, eugenis

Differential Revision: https://reviews.llvm.org/D123602
2022-04-12 14:53:02 -07:00
Vitaly Buka efdc90baaa Revert "[MSan] Ensure argument shadow initialized on memcpy"
Invalid author.

This reverts commit 163a9f4552.
2022-04-12 14:53:02 -07:00
Yuanfang Chen cd0a5889d7 [Reland][lit] Use sharding for GoogleTest format
This helps lit unit test performance by a lot, especially on windows. The performance gain comes from launching one gtest executable for many subtests instead of one (this is the current situation).

The shards are executed by the test runner and the results are stored in the
json format supported by the GoogleTest. Later in the test reporting stage,
all test results in the json file are retrieved to continue the test results
summary etc.

On my Win10 desktop, before this patch: `check-clang-unit`: 177s, `check-llvm-unit`: 38s; after this patch: `check-clang-unit`: 37s, `check-llvm-unit`: 11s.
On my Linux machine, before this patch: `check-clang-unit`: 46s, `check-llvm-unit`: 8s; after this patch: `check-clang-unit`: 7s, `check-llvm-unit`: 4s.

Reviewed By: yln, rnk, abrachet

Differential Revision: https://reviews.llvm.org/D122251
2022-04-12 14:51:12 -07:00
Vitaly Buka 163a9f4552 [MSan] Ensure argument shadow initialized on memcpy
We need to explicitly query the shadow here, because it is lazily
initialized for byval arguments. Without opaque pointers this used to
mostly work out, because there would be a bitcast to `i8*` present, and
that would query, and copy in case of byval, the argument shadow.

Reviewed By: vitalybuka, eugenis

Differential Revision: https://reviews.llvm.org/D123602
2022-04-12 14:49:52 -07:00
Johannes Doerfert 9dc7da3f9c [GlobalsModRef][FIX] Ensure we honor synchronizing effects of intrinsics
This is a long standing problem that resurfaces once in a while [0].
There might actually be two problems because I'm not 100% sure if the
issue underlying https://reviews.llvm.org/D115302 would be solved by
this or not. Anyway.

In 2008 we thought intrinsics do not read/write globals passed to them:
d4133ac315
This is not correct given that intrinsics can synchronize threads and
cause effects to effectively become visible.

NOTE: I did not yet modify any tests but only tried out the reproducer
      of https://github.com/llvm/llvm-project/issues/54851.

Fixes: https://github.com/llvm/llvm-project/issues/54851

[0] https://discourse.llvm.org/t/bug-gvn-memdep-bug-in-the-presence-of-intrinsics/59402

Differential Revision: https://reviews.llvm.org/D123531
2022-04-12 16:42:50 -05:00
Johannes Doerfert 0f070bee82 [NVPTX][FIX] Allow __nvvm_reflect in the presence of opaque pointers
Differential Revision: https://reviews.llvm.org/D123522
2022-04-12 16:42:50 -05:00
Johannes Doerfert a3a42c3ca2 [OpenMP][FIX] Ensure to set the context for wait events if necessary
Differential Revision: https://reviews.llvm.org/D123445
2022-04-12 16:42:50 -05:00
Matt Arsenault 788f94f731 AMDGPU: Don't use unreachable on stores to unhandled address space
For stores to constant address space, this will now consistently hit a
selection error instead of hitting unreachable in an asserts build.

I'm not sure what we should really do here. We could either just
codegen as if it were global, delete the instruction, or declare the
IR invalid (we really should have a target IR verifier to enforce it).
2022-04-12 17:31:50 -04:00
owenca c80eaa919f Revert "[clang-format] Allow empty .clang-format file"
This reverts commit 6eafda0ef0.
2022-04-12 14:28:02 -07:00
owenca 6eafda0ef0 [clang-format] Allow empty .clang-format file
Differential Revision: https://reviews.llvm.org/D123535
2022-04-12 13:54:12 -07:00
Matt Arsenault 3754f60112 GlobalISel: Implement MoreElements for select of vector conditions 2022-04-12 16:54:04 -04:00
Matt Arsenault 6009122250 AArch64/GlobalISel: Remove pointless s1 legalize rules
These have no net effect on the legalize rules.
2022-04-12 16:54:04 -04:00
Matt Arsenault 3f2cc7cc2b GlobalISel: Fix lowerSelect handling of boolean high bits
This was making several invalid assumptions about the incoming
select. First, it was assuming the incoming condition was either s1 or
already sign extended, not accounting for different boolean high bits
behavior between scalar and vector conditions. We only had a vector
boolean due to the intermediate step vector select, which is now
avoided.

Second, it was assuming it can use the result vector type as a boolean
mask. These types don't have anything to do with other, and only makes
sense in the context of the expansion to bit operations. Since these
logically are part of the same lowering, do the complete expansion in
a single step.

The added select_v4s1_s1 test does fail to legalize, since it seems
AArch64's vector legalization support is pretty incomplete.
2022-04-12 16:54:03 -04:00
Matt Arsenault 0e489926be GlobalISel: Handle widening addo/subo booleans
This will be tested in a future patch
2022-04-12 16:54:03 -04:00
Matt Arsenault 95c2bcbf8b GlobalISel: Handle widening umulo/smulo condition outputs 2022-04-12 16:54:03 -04:00
Matt Arsenault abe171df06 GlobalISel: Update mutationIsSane assert for scalable vectors 2022-04-12 16:54:03 -04:00
Matt Arsenault 120c5115b8 Mips/GlobalISel: Add test for atomic load 2022-04-12 16:54:03 -04:00
Craig Topper 057c063c9b [RISCV] Add a encodeLMUL function to RISCVVType. NFC
This moves the encoding handling out of the assembly parser.

Reviewed By: khchen, frasercrmck

Differential Revision: https://reviews.llvm.org/D123553
2022-04-12 13:39:47 -07:00
Quinn Pham 7d7022fb0c [PowerPC] Fix EmitPPCBuiltinExpr to emit arguments once
This patch changes `EmitPPCBuiltinExpr` in `CGBuiltin.cpp` to remove
the loop at the beginning of the function that emits the arguments and
to delay emitting the arguments until inside the switch statement. These
changes will put `EmitPPCBuiltinExpr` in line with the strategy of the
target independent function `EmitBuiltinExpr`. Also, this patch
ensures that arguments are only emitted once.

Tests that included builtins affected by these changes have been
modified to match expected behaviour.

Reviewed By: #powerpc, nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D121637
2022-04-12 15:33:20 -05:00
Fangrui Song d10c091683 lit.cfg.py: remove obsoleted feature clang-driver 2022-04-12 13:31:07 -07:00
Fangrui Song 63fbc77121 [Driver][test] Remove unused/obsoleted REQUIRES: clang-driver
It (introduced by 556d713c70) appears to be
related to the removed dragonegg project. In addition, the feature was a bit
misnamed and may lur users to unnecessarily use it.
2022-04-12 13:29:46 -07:00
Walter Erquinigo 44103c96fa [trace][intelpt] Remove code smell when printing the raw trace size
Something ugly I did was to report the trace buffer size to the DecodedThread,
which is later used as part of the `dump info` command. Instead of doing that,
we can just directly ask the trace for the raw buffer and print its size.

I thought about not asking for the entire trace but instead just for its size,
but in this case, as our traces as not extremely big, I prefer to ask for the
entire trace, ensuring it could be fetched, and then print its size.

Differential Revision: https://reviews.llvm.org/D123358
2022-04-12 13:08:03 -07:00
Walter Erquinigo bdf3e7e5b8 [trace][intelpt] Add task timer classes
I'm adding two new classes that can be used to measure the duration of long
tasks as process and thread level, e.g. decoding, fetching data from
lldb-server, etc. In this first patch, I'm using it to measure the time it takes
to decode each thread, which is printed out with the `dump info` command. In a
later patch I'll start adding process-level tasks and I might move these
classes to the upper Trace level, instead of having them in the intel-pt
plugin. I might need to do that anyway in the future when we have to
measure HTR. For now, I want to keep the impact of this change minimal.

With it, I was able to generate the following info of a very big trace:

```
(lldb) thread trace dump info                                                                                                            Trace technology: intel-pt

thread #1: tid = 616081
  Total number of instructions: 9729366

  Memory usage:
    Raw trace size: 1024 KiB
    Total approximate memory usage (excluding raw trace): 123517.34 KiB
    Average memory usage per instruction (excluding raw trace): 13.00 bytes

  Timing:
    Decoding instructions: 1.62s

  Errors:
    Number of TSC decoding errors: 0
```

As seen above, it took 1.62 seconds to decode 9.7M instructions. This is great
news, as we don't need to do any optimization work in this area.

Differential Revision: https://reviews.llvm.org/D123357
2022-04-12 13:08:03 -07:00
Fangrui Song 9f526057d6 [ubsan][test] Unsupport Android for new test diag-stacktrace.cpp
https://reviews.llvm.org/D123562#3446485 reported that the test failed
on arm-linux-android.
2022-04-12 12:55:44 -07:00
Daniel Grumberg 7443a504bf [clang][extract-api] Add support for true anonymous enums
Anonymous enums without a typedef should have a "(anonymous)" identifier.

Differential Revision: https://reviews.llvm.org/D123533
2022-04-12 20:42:17 +01:00
Changpeng Fang 8edaf25986 AMDGPU: Emit metadata for the hidden_multigrid_sync_arg conditionally
Summary:
  Introduce a new function attribute, amdgpu-no-multigrid-sync-arg, which is default.
We use implicitarg_ptr + offset to check whether the multigrid synchronization
pointer is used. If yes, we remove this attribute and also remove
amdgpu-no-implicitarg-ptr. We generate metadata for the hidden_multigrid_sync_arg
only when the amdgpu-no-multigrid-sync-arg attribute is removed from the function.

Reviewers: arsenm, sameerds, b-sumner and foad

Differential Revision: https://reviews.llvm.org/D123548
2022-04-12 12:36:30 -07:00
Stanislav Mekhanoshin 65b8a43243 [AMDGPU] Update ds-alignment.ll test checks. NFC. 2022-04-12 12:06:02 -07:00
Aart Bik 28063a281b [mlir][sparse] refactored python setup of sparse compiler
Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D123419
2022-04-12 11:58:41 -07:00
Mahesh Ravishankar b40e901333 [mlir][Linalg] Allow collapsing subset of the reassociations when fusing by collapsing.
This change generalizes the fusion of `tensor.expand_shape` ->
`linalg.generic` op by collapsing to handle cases where only a subset
of the reassociations specified in the `tensor.expand_shape` are valid
to be collapsed.
The method that does the collapsing is refactored to allow it to be a
generic utility when required.

Reviewed By: gysit

Differential Revision: https://reviews.llvm.org/D123153
2022-04-12 18:56:32 +00:00
Simon Pilgrim f061c1050b [SLP][X86] Add ray_sphere intersection methods from c-ray benchmark
We're failing to vectorize several comparison reduction patterns.

Issue #43090 was based off this, but while that simplified test case is now folding, the original still fails due to poor cost model values for vXi1 extractions
2022-04-12 19:51:27 +01:00
Jonas Devlieghere a66ff2316e
[lldb] Re-enable fixed on-device tests
These tests were fixed by 833882b327.
2022-04-12 11:39:25 -07:00
Nick Desaulniers 23ec5782c3 [Bitcode] materialize Functions early when BlockAddress taken
IRLinker builds a work list of functions to materialize, then moves them
from a source module to a destination module one at a time.

This is a problem for blockaddress Constants, since they need not refer
to the function they are used in; IPSCCP is quite good at sinking these
constants deep into other functions when passed as arguments.

This would lead to curious errors during LTO:
  ld.lld: error: Never resolved function from blockaddress ...
based on the ordering of function definitions in IR.

The problem was that IRLinker would basically do:

  for function f in worklist:
    materialize f
    splice f from source module to destination module

in one pass, with Functions being lazily added to the running worklist.
This confuses BitcodeReader, which cannot disambiguate whether a
blockaddress is referring to a function which has not yet been parsed
("materialized") or is simply empty because its body was spliced out.
This causes BitcodeReader to insert Functions into its BasicBlockFwdRefs
list incorrectly, as it will never re-materialize an already
materialized (but spliced out) function.

Because of the possibility that blockaddress Constants may appear in
Functions other than the ones they reference, this patch adds a new
bitcode function code FUNC_CODE_BLOCKADDR_USERS that is a simple list of
Functions that contain BlockAddress Constants that refer back to this
Function, rather then the Function they are scoped in. We then
materialize those functions when materializing `f` from the example loop
above. This might over-materialize Functions should the user of
BitcodeReader ultimately decide not to link those Functions, but we can
at least now we can avoid this ordering related issue with blockaddresses.

Fixes: https://github.com/llvm/llvm-project/issues/52787
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1215

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D120781
2022-04-12 11:38:35 -07:00
Shraiysh Vaishay b18e82186f [mlir][OpenMP] Added omp.task
This patch adds tasking construct according to Section 2.10.1 of OpenMP 5.0

Reviewed By: peixin, kiranchandramohan, abidmalikwaterloo

Differential Revision: https://reviews.llvm.org/D123575
2022-04-12 23:55:47 +05:30
Fangrui Song fdd424e37a [ubsan] Fix print_stacktrace=1:fast_unwind_on_fatal=0 to correctly fallback to fast unwinder
ubsan_GetStackTrace (from 52b751088b) called by
~ScopeReport leaves top/bottom zeroes in the
`!WillUseFastUnwind(request_fast_unwind)` code path.
When BufferedStackTrace::Unwind falls back to UnwindFast,
`if (stack_top < 4096) return;` will return early, leaving just one frame in the stack trace.

Fix this by always initializing top/bottom like 261d6e05d5.

Reviewed By: eugenis, yln

Differential Revision: https://reviews.llvm.org/D123562
2022-04-12 11:24:19 -07:00
Martin Sebor deadda749a [InstCombine] Add more memrchr tests (NFC). 2022-04-12 11:55:33 -06:00
Jonathan Peyton d49ce7c356 [OpenMP][libomp] Replace global variable references with local object
Remove references to global __kmp_topology within a kmp_topology_t
object method. There should just be implicit references to the
private object.
2022-04-12 12:50:41 -05:00
Arthur Eubanks 9faab435a3 [docs] Mention that we are in the process of removing the legacy PM for the optimization pipeline
And remove references to flags to turn it off.

Reviewed By: nikic, MaskRay

Differential Revision: https://reviews.llvm.org/D123547
2022-04-12 10:47:58 -07:00