Commit Graph

400158 Commits

Author SHA1 Message Date
Roman Lebedev 70c90cc5bd
[X86][Costmodel] Load/store i16 Stride=2 VF=8 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/e5YE99a4P - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/3vM4KsE1n - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `3`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110505
2021-09-27 14:18:29 +03:00
Roman Lebedev 49e532aa52
[X86][Costmodel] Load/store i16 Stride=2 VF=4 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1j3nf3dro - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/4n1zvP37j - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110504
2021-09-27 14:15:25 +03:00
Dmitry Vyukov 354ded67b3 tsan: align ThreadState to cache line
There are 2 reasons to do this:
1. We place hot data in the first cache line of ThreadState,
this assumed that it's cache-line-aligned but we never actually
enforced it (or it was lost at some point).
2. The new vector clock uses vector instructions and requires
data alignment. Later the new vector clock will be embedded in
ThreadState, then ensuring vector clock alignment will be
impossible w/o ThreadState alignment.

Depends on D110519.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110520
2021-09-27 12:54:09 +02:00
Dmitry Vyukov ed7f3f5bc9 tsan: move shadow stack into ThreadState
Currently the shadow stack is located in the trace memory mapping.
The new tsan runtime will remove the trace memory mapping.
Move the shadow stack into ThreadState as a preparation step.

Reviewed By: melver

Differential Revision: https://reviews.llvm.org/D110519
2021-09-27 12:53:02 +02:00
Fraser Cormack e2b46e336b [DAGCombiner][VP] Fold zero-length or false-masked VP ops
This patch adds a generic DAGCombine for vector-predicated (VP) nodes.
Those for which we can determine that no vector element is active can be
replaced by either undef or, for reductions, the start value.

This is tested rather trivially at the IR level, where it's possible
that we want to teach instcombine to perform this optimization.

However, we can also see the zero-evl case arise during SelectionDAG
legalization, when wide VP operations can be split into two and the
upper operation emerges as trivially false.

It's possible that we could perform this optimization "proactively"
(both on legal vectors and before splitting) and reduce the width of an
operation and insert it into a larger undef vector:

```
v8i32 vp_add x, y, mask, 4
->
v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0
```

This is somewhat analogous to similar vector narrow/widening
optimizations, but it's unclear at this point whether that's beneficial
to do this for VP ops for any/all targets.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109148
2021-09-27 11:30:09 +01:00
David Green bb2d23dcd4 [ARM] Improve detection of fallthough when aligning blocks
We align non-fallthrough branches under Cortex-M at O3 to lead to fewer
instruction fetches. This improves that for the block after a LE or
LETP. These blocks will still have terminating branches until the
LowOverheadLoops pass is run (as they are not handled by analyzeBranch,
the branch is not removed until later), so canFallThrough will return
false. These extra branches will eventually be removed, leaving a
fallthrough, so treat them as such and don't add unnecessary alignments.

Differential Revision: https://reviews.llvm.org/D107810
2021-09-27 11:21:21 +01:00
Nicolas Vasilache 1b49a72de9 [mlir] Factor out constraint set creation from hoist padding.
This revision adds a

```
FlatAffineValueConstraints(ValueRange ivs, ValueRange lbs, ValueRange ubs)
```

method and use it in hoist padding.

Differential Revision: https://reviews.llvm.org/D110427
2021-09-27 10:11:35 +00:00
Daniel Kiss 77aa9ca92a [libunwind] Support cfi_undefined and cfi_register for float registers.
During a backtrace the `.cfi_undefined` for a float register causes an assert in libunwind.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D110144
2021-09-27 12:04:02 +02:00
Max Kazantsev 4992220ea7 [Test] Regenerate test checks with autogen script 2021-09-27 16:55:59 +07:00
Nicolas Vasilache b74493ecea [mlir][Linalg] Refactor padding hoisting - NFC
This revision extracts padding hoisting in a new file and cleans it up in prevision of future improvements and extensions.

Differential Revision: https://reviews.llvm.org/D110414
2021-09-27 09:50:31 +00:00
Simon Pilgrim 468ff703e1 [X86] combineVectorHADDSUB - remove the broken HOP(x,x) merging code (PR51974)
This intention of this code turns out to be superfluous as we can handle this with shuffle combining, and it has a critical flaw in that it doesn't check for dependencies.

Fixes PR51974
2021-09-27 10:41:22 +01:00
Florian Hahn 4b581e87df
[LV] Add tests where rt checks may make vectorization unprofitable.
Add a few additional tests which require a large number of runtime
checks for D109368.
2021-09-27 10:32:28 +01:00
Pushpinder Singh 9d0eb440ff [libomptarget][nfc][amdgpu] Reorder function to clarify review diff 2021-09-27 09:30:55 +00:00
Matthias Springer ffdf0a370d [mlir][vector] Fix bug in vector-transfer-full-partial-split
When splitting with linalg.copy, cannot write into the destination alloc directly. Instead, write into a subview of the alloc.

Differential Revision: https://reviews.llvm.org/D110512
2021-09-27 18:12:17 +09:00
Ben Shi 683e506324 [AArch64][test] Add more tests of add/sub with immediate
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D110474
2021-09-27 09:05:21 +00:00
David Spickett 3c65d54ec3 [llvm] Disable LLVM_ENABLE_PER_TARGET_RUNTIME_DIR by default on Arm Linux
Due to the way detecting the hard float ABI is currently
handled, clang fails to find the per target dir.

I am working to fix this but in the meantime disable it by
default on Arm Linux.
2021-09-27 09:03:26 +00:00
Fraser Cormack d48f6df1f8 [RISCV] Create the correct mask type when lowering EXTRACT_VECTOR_ELT
This particular case was creating a `VMSET_VL` using the old
fixed-length type in order to pass a mask to other custom nodes
operating on the scalable container type. This kind of thing wasn't
caught for us; I only noticed when experimenting with odd-length
vectors, where it was trying to generate an invalid `v3i1` MVT.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110420
2021-09-27 09:43:40 +01:00
Krasimir Georgiev 8cb234e07d [Bazel] Fix for 6498b0e991 2021-09-27 10:44:58 +02:00
Jon Chesterfield 726a34f063 [libomptarget][amdgpu] Replace dead exit call with returning error 2021-09-27 09:43:37 +01:00
Michał Górny f4b71e3479 [llvm] [ADT] Add a range/iterator-based Split()
Add a llvm::Split() implementation that can be used via range-for loop,
e.g.:

    for (StringRef x : llvm::Split("foo,bar,baz", ','))
      ...

The implementation uses an additional SplittingIterator class that
uses StringRef::split() internally.

Differential Revision: https://reviews.llvm.org/D110496
2021-09-27 10:43:09 +02:00
Balazs Benics 66d9d1012b [clang][AST] Add support for ShuffleVectorExpr to ASTImporter
Addresses https://bugs.llvm.org/show_bug.cgi?id=51902

Reviewed By: shafik, martong

Differential Revision: https://reviews.llvm.org/D110052
2021-09-27 10:17:12 +02:00
Max Kazantsev 0bd9162fd7 [Test] Add test showing that SCEV cannot properly infer ranges of cycled phis 2021-09-27 15:01:43 +07:00
Krasimir Georgiev 92b475f0b0 [lldb] silence -Wsometimes-uninitialized warnings
No functional changes intended.

Silence warnings from
3a6ba36751.
2021-09-27 09:35:58 +02:00
Vignesh Balu 62fddd5ff5 [OpenMP][OMPD] Implementation of OMPD debugging library - libompd.
This is a continuation of the review: https://reviews.llvm.org/D100182
This patch implements the OMPD API as specified in the standard doc.

Reviewed By: @hbae
Differential Revision: https://reviews.llvm.org/D100183
2021-09-27 12:32:31 +05:30
serge-sans-paille e45f67f31e Make analyze-cc path discovery sensible to symlinks
Fix https://bugs.llvm.org/show_bug.cgi?id=51897

Differential Revision: https://reviews.llvm.org/D110521
2021-09-27 08:35:19 +02:00
Freddy Ye 902ec6142a [X86][ISel] Lowering FROUND(f16) and FROUNDEVEN(f16)
When AVX512FP16 is enabled, FROUND(f16) cannot be dealt with
TypeLegalize, and no libcall in libm is ready for fround(f16) now.
FROUNDEVEN(f16) has related instruction in AVX512FP16.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110312
2021-09-27 13:35:03 +08:00
Max Kazantsev e787678cef [Test] Add some simple tests where IndVars cannot remove a check in loop
Previously I've added tests that require context for inference, but it
seems tha SCEV can't prove same facts even when the context isn't required.
2021-09-27 12:12:51 +07:00
Michael Kruse 91f46bb77e [Polly] Reject reject regions entered by an indirectbr/callbr.
SplitBlockPredecessors is unable to insert an additional BasicBlock
between an indirectbr/callbr terminator and the successor blocks.
This is needed by Polly to normalize the control flow before emitting
its optimzed code.

This patches rejects regions entered by an indirectbr/callbr to not fail
later at code generation.

This fixes llvm.org/PR51964
2021-09-26 21:21:50 -05:00
Lang Hames 1ea8d12510 [ORC] Add missing lock to CompileOnDemandLayer::getPerDylibResources.
The getPerDylibResources method may be called concurrently from multiple
threads, so we need to protect access to the underlying map.

Possible for fix https://llvm.org/PR51064
2021-09-26 18:35:58 -07:00
Wang, Pengfei 7d6889964a [X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110336
2021-09-27 09:27:04 +08:00
Lang Hames 4b37462aab [ORC] Fix SimpleRemoteEPC data races.
Adds a 'start' method to SimpleRemoteEPCTransport to defer transport startup
until the client has been configured. This avoids races on client members if the
first messages arrives while the client is being configured.

Also fixes races on the file descriptors in FDSimpleRemoteEPCTransport.
2021-09-26 18:11:48 -07:00
Amara Emerson acd13994d1 [GlobalISel] Re-generate some call lowering tests with the new CHECK-NEXT behaviour. 2021-09-26 17:25:38 -07:00
Mehdi Amini 9c2cd6e7c8 Fix clang-tidy warning "modernize-use-nullptr" in MLIR VulkanRuntime (NFC) 2021-09-26 22:06:00 +00:00
Mehdi Amini b3891f28a3 Fix ClangTidyLegacy warning: "'virtual' is redundant since the function is already declared 'final' " (NFC) 2021-09-26 22:02:23 +00:00
Lang Hames daf0b2f078 [MCJIT] This test shouldn't require an unwind table.
This should fix the failures on the Fuchsia bot that started in
https://lab.llvm.org/buildbot/#/builders/98/builds/6401.
2021-09-26 14:14:41 -07:00
Michał Górny e2f780fba9 [lldb] [gdb-remote] Use llvm::StringRef.split() and llvm::to_integer()
Replace the uses of StringConvert combined with hand-rolled array
splitting with llvm::StringRef.split() and llvm::to_integer().

Differential Revision: https://reviews.llvm.org/D110472
2021-09-26 21:23:26 +02:00
Nikita Popov 7a855596c3 [BasicAA] Don't check whether GEP is sized (NFC)
GEPs are required to have sized source element type, so we can
just assert that here.
2021-09-26 21:21:54 +02:00
Simon Pilgrim c0eff50fc5 [X86][SSE] combineMulToPMADDWD - enable sext_extend_vector_inreg(vXi16) -> zext_extend_vector_inreg(vXi16) fold
The plan is to allow combineMulToPMADDWD to match illegal vector types (as long as they're still pow2), which should allow us to start removing the 128-bit limit on more of the PMADDWD combines.
2021-09-26 19:37:23 +01:00
Simon Pilgrim ed3e4917b3 [X86] Fold PACK(*_EXTEND_VECTOR_INREG, UNDEF) -> *_EXTEND_VECTOR_INREG
For 128-bit vectors, we can remove a PACK of a EXTEND_VECTOR_INREG node and just create a smaller extension to the result/packed type.
2021-09-26 19:37:22 +01:00
Lang Hames f40685138b [ORC] Remote OrcRemoteTargetClient and OrcRemoteTargetServer.
Now that the lli and lli-child-target tools have been updated to use
SimpleRemoteEPC (6498b0e991) the OrcRemoteTarget* APIs are no longer needed.

Once the LLJITWithRemoteDebugging example has been migrated to SimpleRemoteEPC
we will remove OrcRPCExecutorProcessControl, and the ORC RPC system itself.
2021-09-26 11:32:51 -07:00
Lang Hames a12c0d5ea6 [ORC] Export process symbols in lli-child-target.
We want this behavior for future testing infrastructure anyway, and it may help
with the failure in https://lab.llvm.org/buildbot/#/builders/98/builds/6401:

/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: warning:
remote mcjit does not support lazy compilation
Finalization error: could not register eh-frame: __register_frame function not
found
/b/fuchsia-x86_64-linux/llvm.obj/tools/clang/stage2-bins/bin/lli: disconnecting
2021-09-26 11:22:49 -07:00
LLVM GN Syncbot a44b122ade [gn build] Port 6498b0e991 2021-09-26 17:25:08 +00:00
Lang Hames 6498b0e991 Reintroduce "[ORC] Introduce EPCGenericRTDyldMemoryManager."
This reintroduces "[ORC] Introduce EPCGenericRTDyldMemoryManager."
(bef55a2b47) and "[lli] Add ChildTarget dependence
on OrcTargetProcess library." (7a219d801b) which were
reverted in 99951a5684 due to bot failures.

The root cause of the bot failures should be fixed by "[ORC] Fix uninitialized
variable." (0371049277) and "[ORC] Wait for
handleDisconnect to complete in SimpleRemoteEPC::disconnect."
(320832cc9b).
2021-09-27 03:24:33 +10:00
Simon Pilgrim 3fe9767204 [X86] Fold ADD(VPMADDWD(X,Y),VPMADDWD(Z,W)) -> VPMADDWD(SHUFFLE(X,Z), SHUFFLE(Y,W))
Merge addition of VPMADDWD nodes if each element pair doesn't use the upper element in each pair (i.e. its zero) - we can generalize this to either element in the pair if we one day create VPMADDWD with zero lower elements.

There are still a number of issues with extending/shuffling with 256/512-bit VPMADDWD nodes so this initially only works for v2i32/v4i32 cases - I'm working on removing all these limitations but there's still a bit of yak shaving to go.....
2021-09-26 18:08:29 +01:00
Lang Hames 175c1a39e8 [ORC][llvm-jitlink] Add debugging output to SimpleRemoteEPC (and Server).
Also adds an optional 'debug' argument to the llvm-jitlink-executor tool to
enable debug-logging.
2021-09-26 10:00:29 -07:00
Kazu Hirata c4ae4a745d [RISCV] Remove redundant declaration RISCVMnemonicSpellCheck (NFC)
Note that RISCVMnemonicSpellCheck is defined in
RISCVGenAsmMatcher.inc, which RISCVAsmParser.cpp includes.

Identified with readability-redundant-declaration.
2021-09-26 09:26:57 -07:00
Roman Lebedev d9413f46b3
[X86][Costmodel] Load/store i16 VF=2 interleaving costs
The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/M8vEKs5jY - for intels `Block RThroughput: =2.0`;
                                  for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/Kx1nKz7je - for intels `Block RThroughput: =1.0`;
                                  for ryzens, `Block RThroughput: <=0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103144
2021-09-26 19:13:23 +03:00
Nikita Popov 14a49f5840 [DSE] Don't check getUnderlyingObject() return value (NFC)
getUnderlyingObject() never returns null. It will simply return
something that is not the "root" underlying object.

Also drop a stale comment.
2021-09-26 18:01:26 +02:00
Nikita Popov f3c74b72f4 [DSE] Make DSEState non-copyable (NFC)
As it contains a self-reference, the default copy/move ctors
would not be safe.

Move the DSEState::get() method into the ctor to make sure no move
occurs here even without NRVO.

This is a speculative fix for test failures on
llvm-clang-x86_64-expensive-checks-win.
2021-09-26 17:54:38 +02:00
Jon Chesterfield 8cf93a35d4 [libomptarget][amdgpu] Destruct HSA queues
Store queues in unique_ptr so they are destroyed when the global DeviceInfo is. Currently they leak which raises an assert in debug builds of hsa.

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D109511
2021-09-26 15:34:21 +01:00