Move the (SSE-only) generic, legalized type conversion matching after the specific,custom conversion cases, allowing us to properly provide cost overrides.
The next step will be to clean up some of the weird existing costs and then to enable AVX+ legalized costs, which will let us strip out a lot of the cost tables entries.
Cross function boundary bufferization support is added.
This is enabled by cross-function boundary alias analysis, for which the bufferization process is extended: it can now modify the BufferizationAliasInfo as new ops are introduced.
A number of simplifying assumptions are made:
1. by default we bufferize to the most dynamic strided memref type, further memref::CastOp canonicalizations are expected to clean up the IR.
2. in the current implementation, the stride information is always erased at function boundaries. A subsequent pass will be required to analyze the meet of all call ops to a function and decide whether more static buffer types can be used. This will potentially clone functions when it is deemed profitable to do so (e.g. when the stride-1 dimension may vary).
3. external function always bufferize to the most dynamic strided memref version. This may require special annotations for specifying that particular operands of top-level functions have contiguous buffer layout.
An alternative to point 3. would be to support tensor layout annotations, which is currently not supported in MLIR.
Differential revision: https://reviews.llvm.org/D104873
Very late in compilation, backends like X86 will perform optimisations like
this:
$cx = MOV16rm $rax, ...
->
$rcx = MOV64rm $rax, ...
Widening the load from 16 bits to 64 bits. SEeing how the lower 16 bits
remain the same, this doesn't affect execution. However, any debug
instruction reference to the defined operand now refers to a 64 bit value,
nto a 16 bit one, which might be unexpected. Elsewhere in codegen, there's
often this pattern:
CALL64pcrel32 @foo, implicit-def $rax
%0:gr64 = COPY $rax
%1:gr32 = COPY %0.sub_32bit
Where we want to refer to the definition of $eax by the call, but don't
want to refer the copies (they don't define values in the way
LiveDebugValues sees it). To solve this, add a subregister field to the
existing "substitutions" facility, so that we can describe a field within
a larger value definition. I would imagine that this would be used most
often when a value is widened, and we need to refer to the original,
narrower definition.
Differential Revision: https://reviews.llvm.org/D88891
This extends the effects of [[ http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1825r0.html | P1825 ]] to all C++ standards from C++11 up to C++20.
According to Motion 23 from Cologne 2019, P1825R0 was accepted as a Defect Report, so we retroactively apply this all the way back to C++11.
Note that we also remove implicit moves from C++98 as an extension
altogether, since the expanded first overload resolution from P1825
can cause some meaning changes in C++98.
For example it can change which copy constructor is picked when both const
and non-const ones are available.
This also rips out warn_return_std_move since there are no cases where it would be worthwhile to suggest it.
This also fixes a bug with bailing into the second overload resolution
when encountering a non-rvref qualified conversion operator.
This was unnoticed until now, so two new test cases cover these.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D104500
Since I had some fun understanding how to properly use llvm::Expected<T> I added some code examples that I would have liked to see when learning to use it.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D105014
https://bugs.llvm.org/show_bug.cgi?id=50727
When processing C# Lambda expression in the indentation can goes a little wrong,
resulting the the closing } being at the wrong indentation level and meaning the remaining part of the file is
incorrectly indented.
This can be a fairly common pattern for when C# wants to peform a UI action from a thread,
and it wants to invoke that action on the main thread
Reviewed By: exv, jbcoe
Differential Revision: https://reviews.llvm.org/D104388
It seems like ExprEngine::handleLVectorSplat() was used at only 2
places. It might be better to directly inline them for readability.
It seems like these cases were not covered by tests according to my
coverage measurement, so I'm adding tests as well, demonstrating that no
behavior changed.
Besides that, I'm handling CK_MatrixCast similarly to how the rest of
the unhandled casts are evaluated.
Differential Revision: https://reviews.llvm.org/D105125
Reviewed by: NoQ
Previously `LValueToRValueBitCast`s were modeled in the same way how
a regular `BitCast` was. However, this should not produce an l-value.
Modeling bitcasts accurately is tricky, so it's probably better to
model this expression by binding a fresh conjured value.
The following code should not result in a diagnostic:
```lang=C++
__attribute__((always_inline))
static inline constexpr unsigned int_castf32_u32(float __A) {
return __builtin_bit_cast(unsigned int, __A); // no-warning
}
```
Previously, it reported
`Address of stack memory associated with local variable '__A' returned
to caller [core.StackAddressEscape]`.
Differential Revision: https://reviews.llvm.org/D105017
Reviewed by: NoQ, vsavchenko
Adds support for both synchronous and asynchronous calls to wrapper functions
using SPS (Simple Packed Serialization). Also adds support for wrapping
functions on the JIT side in SPS-based wrappers that can be called from the
executor.
These new methods simplify calls between the JIT and Executor, and will be used
in upcoming ORC runtime patches to enable communication between ORC and the
runtime.
Move the OpDSL doc to a linalg sub folder and updated the integration in the main linalg documentation.
Differential Revision: https://reviews.llvm.org/D105188
Add helpers to facilitate adding arguments and results to operations
that implement the `FunctionLike` trait. These operations already have a
convenient argument and result *erasure* mechanism, but a corresopnding
utility for insertion is missing. This introduces such a utility.
While on regular Linux system (Fedora 34 GA, not updated):
* thread #1, name = '1', stop reason = hit program assert
frame #0: 0x00007ffff7e242a2 libc.so.6`raise + 322
frame #1: 0x00007ffff7e0d8a4 libc.so.6`abort + 278
frame #2: 0x00007ffff7e0d789 libc.so.6`__assert_fail_base.cold + 15
frame #3: 0x00007ffff7e1ca16 libc.so.6`__assert_fail + 70
* frame #4: 0x00000000004011bd 1`main at assert.c:7:3
On Fedora 35 pre-release one gets:
* thread #1, name = '1', stop reason = signal SIGABRT
* frame #0: 0x00007ffff7e48ee3 libc.so.6`pthread_kill@GLIBC_2.2.5 + 67
frame #1: 0x00007ffff7dfb986 libc.so.6`raise + 22
frame #2: 0x00007ffff7de5806 libc.so.6`abort + 230
frame #3: 0x00007ffff7de571b libc.so.6`__assert_fail_base.cold + 15
frame #4: 0x00007ffff7df4646 libc.so.6`__assert_fail + 70
frame #5: 0x00000000004011bd 1`main at assert.c:7:3
I did not write a testcase as one needs the specific glibc. An
artificial test would just copy the changed source.
Reviewed By: mib
Differential Revision: https://reviews.llvm.org/D105133
This patch changes return type of tryCandidate from void to bool:
1. Methods in some targets already follow this convention.
2. This would help if some target wants to re-use generic code.
3. It looks more intuitive if these try-method returns the same type.
We may need to change return type of them from bool to some enum
further, to make it less confusing.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103951
It's already covered by multiple tests, but to trigger
this path we need MTE+GWP which disabled.
Reviewed By: hctim, pcc
Differential Revision: https://reviews.llvm.org/D105232
This is a first step towards consistently using the term 'executor' for the
process that executes JIT'd code. I've opted for 'executor' as the preferred
term over 'target' as target is already heavily overloaded ("the target
machine for the executor" is much clearer than "the target machine for the
target").
Now we lack a benchmark to measure the performance change for each
commit.
Since coro elide is the main optimization in coroutine module, I wonder
it may be an estimation to count the number of elided coroutine in
private code bases.
e.g., for a certain commit, if we found that the number of elided goes
down, we could find it before the commit check-in.
Reviewed By: lxfind
Differential Revision: https://reviews.llvm.org/D105095
By using stable_sort.
Added a test case which previously failed when expensive checks were
enabled.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D105240
* Split memref.dim into two operations: memref.dim and tensor.dim. Both ops have the same builder interface and op argument names, so that they can be used with templates in patterns that apply to both tensors and memrefs (e.g., some patterns in Linalg).
* Add constant materializer to TensorDialect (needed for folding in affine.apply etc.).
* Remove some MemRefDialect dependencies, make some explicit.
Differential Revision: https://reviews.llvm.org/D105165
This reverts commit 2240b41ee4.
A value of 0 for KernDescVal WG_Size implies it is unknown, so it should be
set to the default. The above change was made without this assumption.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D105250
This patch introduces a custom rule for expanding the LLVM target
enumeration .def files. This provides a slightly cleaner API for these
rules, but is mostly to permit selects to be used when determining which
LLVM targets to build. Right now the target list is generated at Bazel
configure time, but this will allows us to add functionality to also
control which targets are built based on config settings.
Tested: Ran `bazel test --config=rbe ... @llvm-project//...`
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D104969
Uses elementwise interface to generalize canonicalization pattern and add a new
pattern for vector.contract case.
Differential Revision: https://reviews.llvm.org/D104343
This is one sibling of the fold added with c7b658aeb5 .
(X + C2) <u C --> X >s ~C2 (if C == C2 + SMIN)
I'm still not sure how to describe it best, but we're
translating 2 constants from an unsigned range comparison
to signed because that eliminates the offset (add) op.
This could be extended to handle the more general (non-constant)
pattern too:
https://alive2.llvm.org/ce/z/K-fMBf
define i1 @src(i8 %a, i8 %c2) {
%t = add i8 %a, %c2
%c = add i8 %c2, 128 ; SMIN
%ov = icmp ult i8 %t, %c
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c2) {
%not_c2 = xor i8 %c2, -1
%ov = icmp sgt i8 %a, %not_c2
ret i1 %ov
}
Previously, we only applied the renames to
ConcatOutputSections.
Reviewed By: #lld-macho, thakis
Differential Revision: https://reviews.llvm.org/D105079