It turns out that we can't remove the operator new and delete
interceptors on Android without breaking ABI, so bring them back
as forwards to the malloc and free functions.
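For illustration, a minimal sketch of such forwarding interceptors (simplified; the real interceptors live in the sanitizer runtime and also cover the array, aligned, and nothrow variants):
```
#include <cstdlib>
#include <new>

// Keep the operator new/delete symbols exported (preserving the ABI)
// while adding no behavior beyond forwarding to malloc/free.
void *operator new(std::size_t size) { return std::malloc(size); }
void operator delete(void *ptr) noexcept { std::free(ptr); }
```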
Differential Revision: https://reviews.llvm.org/D91219
__register_frame and __deregister_frame are associated with the
.eh_frame section, which I think is used on all of our platforms
except Windows and 32-bit ARM (which uses the ARM EHABI).
Also add a file that was added to lld/MachO.
Under MRR, this enables a method that sends an autorelease message to an
object and returns the object to avoid adding the object to an autorelease
pool, provided a call to objc_retainAutoreleasedReturnValue in the caller
function accepts the handoff of the retain count.
rdar://problem/50678052
Differential Revision: https://reviews.llvm.org/D91111
This patch converts elementwise ops on tensors to linalg.generic ops
with the same elementwise op in the payload (except rewritten to
operate on scalars, obviously). This is a great form for later fusion to
clean up.
E.g.
```
// Compute: %arg0 + %arg1 - %arg2
func @f(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>, %arg2: tensor<?xf32>) -> tensor<?xf32> {
%0 = addf %arg0, %arg1 : tensor<?xf32>
%1 = subf %0, %arg2 : tensor<?xf32>
return %1 : tensor<?xf32>
}
```
Running this through
`mlir-opt -convert-elementwise-to-linalg -linalg-fusion-for-tensor-ops` we get:
```
func @f(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>, %arg2: tensor<?xf32>) -> tensor<?xf32> {
%0 = linalg.generic {indexing_maps = [#map0, #map0, #map0, #map0], iterator_types = ["parallel"]} ins(%arg0, %arg1, %arg2 : tensor<?xf32>, tensor<?xf32>, tensor<?xf32>) {
^bb0(%arg3: f32, %arg4: f32, %arg5: f32): // no predecessors
%1 = addf %arg3, %arg4 : f32
%2 = subf %1, %arg5 : f32
linalg.yield %2 : f32
} -> tensor<?xf32>
return %0 : tensor<?xf32>
}
```
So the elementwise ops on tensors have nicely collapsed into a single
linalg.generic, which is the form we want for further transformations.
Differential Revision: https://reviews.llvm.org/D90354
This patch adds an `ElementwiseMappable` trait as discussed in the RFC
here:
https://llvm.discourse.group/t/rfc-std-elementwise-ops-on-tensors/2113/23
This trait can power a number of transformations and analyses.
A subsequent patch adds a convert-elementwise-to-linalg pass that exhibits
how this trait allows writing generic transformations.
See https://reviews.llvm.org/D90354 for that patch.
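As a hedged sketch of what that looks like (constructor style and trait name follow the MLIR C++ API of this era; this is not the exact code of D90354), a rewrite pattern can match any op carrying the trait instead of being written per-op:
```
#include "mlir/IR/OpDefinition.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Matches every op that declares ElementwiseMappable, regardless of name.
struct ConvertAnyElementwiseOp : public RewritePattern {
  ConvertAnyElementwiseOp()
      : RewritePattern(/*benefit=*/1, MatchAnyOpTypeTag()) {}

  LogicalResult matchAndRewrite(Operation *op,
                                PatternRewriter &rewriter) const override {
    // The trait, not the op name, makes this transformation generic.
    if (!op->hasTrait<OpTrait::ElementwiseMappable>())
      return failure();
    // ... build the equivalent linalg.generic and replace `op` here ...
    return failure(); // placeholder so the sketch stays self-contained
  }
};
```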
This trait slightly changes some verifier messages, but the diagnostics
are usually about as good. I fiddled with the ordering of the trait in
the .td file trait lists to minimize the changes here.
Differential Revision: https://reviews.llvm.org/D90731
Previously we used setRegClass to set the class to rgpr, which may widen the
register domain if the result was already in a more constrained class (tcgpr
in the above PR).
Differential Revision: https://reviews.llvm.org/D91192
ScopBuilder distributes independent instructions between statements.
Only modeled (i.e., not synthesizable) instructions are represented.
However, some parts of the independence computation used non-modeled
instructions, which could lead to the re-introduction of non-modeled
instructions. In particular, required invariant loads could be added to the
instruction list, which then led to redundant MemoryAccesses for such a load.
This fixes llvm.org/PR48059.
The original bug was discovered in T75057860. The Clang front-end emits an AST that looks like this for a co_await expression:
```
ExprWithCleanups
`-CoawaitExpr
  |-MaterializeTemporaryExpr ... Awaiter
  ...
  |-CXXMemberCallExpr ... .await_ready
  ...
  |-CallExpr ... __builtin_coro_resume
  ...
  `-CXXMemberCallExpr ... .await_resume
  ...
```
ExprWithCleanups is responsible for cleaning up the temporaries generated in the wrapped expression (including calling their dtors).
In the above structure, for the __builtin_coro_resume part (which corresponds to the code for the suspend case in a co_await with symmetric transfer), the pseudocode looks like this:
```
__builtin_coro_resume(
    awaiter.await_suspend(
        from_address(
            __builtin_coro_frame())).address());
```
One of the temporaries generated as part of this code is the coroutine handle returned from the awaiter.await_suspend() call. The call returns the handle as a prvalue (it's a value returned on the fly). In order to call the address() method on it, it needs to be converted into an xvalue, hence a materialized temporary is created to hold it. This temporary will need to be cleaned up eventually. Now, since all cleanups happen at the end of the entire co_await expression, which is after the <coro.suspend> suspension point, the compiler will think that such a temporary needs to live across the suspension, and hence needs to be put on the coroutine frame, even though it's only used just to call the address() method.
Such a phenomenon not only unnecessarily increases the frame size, but can also lead to ASAN failures if the coroutine was already destroyed as part of the await_suspend() call: if the coroutine was already destroyed, the frame no longer exists, and one cannot store anything into it. But if the temporary object is considered to need to live on the frame, it will be stored into the frame after await_suspend() returns.
A fix was attempted in https://reviews.llvm.org/D87470. Unfortunately it is incorrect. The reason is that cleanups in Clang work linearly rather than nested: there is one current state indicating whether a cleanup is needed, and an ExprWithCleanups resets that state. This means that an ExprWithCleanups must be capable of cleaning up all temporaries created in the wrapped expression; otherwise there will be dangling temporaries cleaned up in the wrong place.
I eventually found a workaround (https://reviews.llvm.org/D89066) that doesn't break any existing tests while fixing the issue. But it targets the final co_await only. If we ever have a co_await that's not on the final awaiter and the frame gets destroyed after suspension, we are in trouble. Hence we need a proper fix.
This patch is the proper fix. It does the following things to fully resolve the issue (a concrete sketch of the scenario follows the list):
1. The AST has to be generated in an order that follows the nesting relationship. We should not generate the AST out of order, because then the code generator would incorrectly track the state of temporaries and when a cleanup is needed. The code in buildCoawaitCalls is reorganized so that we generate the AST for each co_await member call in order, along with its child AST.
2. The await_ready() call is wrapped with an ExprWithCleanups so that its temporaries get cleaned up as early as possible, to avoid them living across the suspension.
3. The await_suspend() call is wrapped with an ExprWithCleanups if it's not a symmetric transfer. In the case of a symmetric transfer, in order to maintain the musttail call contract, the ExprWithCleanups is wrapped before the resume call.
4. In the end, we mark again that a cleanup is needed, so that the entire CoawaitExpr gets wrapped with an ExprWithCleanups which will clean up the Awaiter object associated with the await expression.
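To make that scenario concrete, here is a hedged, self-contained sketch (a hypothetical awaiter, not code from this patch) of a symmetric-transfer await_suspend() whose returned handle is exactly the kind of temporary discussed above:
```
#include <experimental/coroutine>

struct Awaiter {
  std::experimental::coroutine_handle<> next;
  bool await_ready() noexcept { return false; }
  // The returned handle is a prvalue; the compiler materializes a
  // temporary just to call .address() on it for __builtin_coro_resume.
  // That temporary must not be spilled to the coroutine frame: the frame
  // may already be destroyed by the time await_suspend() returns.
  std::experimental::coroutine_handle<>
  await_suspend(std::experimental::coroutine_handle<>) noexcept {
    return next; // symmetric transfer to the next coroutine
  }
  void await_resume() noexcept {}
};
```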
Differential Revision: https://reviews.llvm.org/D90990
These are opsel opcodes with op_sel actually being ignored. As such,
op_sel_hi needs to be set to the default value of 1 even though these bits
are ignored. This is a compatibility change.
Differential Revision: https://reviews.llvm.org/D91202
Tracking local variables across suspend points is still somewhat incomplete.
Consider this coroutine snippet:
```
resumable foo() {
int x[10] = {};
int a = 3;
co_await std::experimental::suspend_always();
a++;
x[0] = 1;
a += 2;
x[1] = 2;
a += 3;
x[2] = 3;
}
```
lldb can't print `a` or `x` if they turn out to be allocas during
CoroSplit (which happens if you build this code with `-O0` prior to this
commit):
```
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000100003729 main-noprint`foo() at main-noprint.cpp:43:5
40 co_await std::experimental::suspend_always();
41 a++;
42 x[0] = 1;
-> 43 a += 2;
44 x[1] = 2;
45 a += 3;
46 x[2] = 3;
(lldb) p x
error: <user expression 21>:1:1: use of undeclared identifier 'x'
x
^
```
The generated IR contains an `llvm.dbg.declare` for `x` in its initialization
basic block. After CoroSplit, the `llvm.dbg.declare` might not dominate all of
`x`'s uses, and we lose debugging quality.
Add `llvm.dbg.value`s to all relevant basic blocks so that if later
transformations break the dominance, reliable debug info is already in
place. For instance, this BB:
```
await.ready:
...
%arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
...
%arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
...
%arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```
becomes:
```
await.ready:
...
call void @llvm.dbg.value(metadata [10 x i32]* %x.reload.addr, metadata !751, metadata !DIExpression()), !dbg !753
...
%arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
...
%arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
...
%arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```
Differential Revision: https://reviews.llvm.org/D90772
This is a prep step for widening induction variables in LoopFlatten if this is
possible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle, as both
passes were already close to each other.
Differential Revision: https://reviews.llvm.org/D90402
Just enough to consume some bitcode files and link them. There's more
to be done around the symbol resolution API and the LTO config, but I don't yet
understand what all the various LTO settings do...
Reviewed By: #lld-macho, compnerd, smeenai, MaskRay
Differential Revision: https://reviews.llvm.org/D90663
We should have maxprot == initprot for all non-i386 architectures, which
is what ld64 does.
Reviewed By: #lld-macho, compnerd
Differential Revision: https://reviews.llvm.org/D89420
Apple devtools use this to locate the dSYM files for a given
binary.
The UUID is computed based on an MD5 hash of the binary's contents. In order to
hash the contents, we must first write them, but LC_UUID itself must be part of
the written contents in order for all the offsets to be calculated correctly.
We resolve this circularity by first writing an LC_UUID with an all-zero
UUID, then updating the UUID with its real value later.
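A minimal sketch of this two-pass scheme (function and buffer names are illustrative, not lld's actual code; it assumes llvm::MD5's update/final API):
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/MD5.h"
#include <cstdint>
#include <cstring>

// `buf`/`size` is the fully laid-out output file; `uuidField` points at
// the 16-byte uuid field of the LC_UUID load command inside that buffer.
static void writeUuid(uint8_t *buf, size_t size, uint8_t *uuidField) {
  std::memset(uuidField, 0, 16); // first pass: zero UUID, offsets final
  llvm::MD5 hash;
  hash.update(llvm::ArrayRef<uint8_t>(buf, size));
  llvm::MD5::MD5Result result;
  hash.final(result);
  for (unsigned i = 0; i != 16; ++i)
    uuidField[i] = result[i]; // second pass: patch in the content hash
}
```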
I'm not sure there's a good way to test that the value of the UUID is
"as expected", so I've just checked that it's present.
Reviewed By: #lld-macho, compnerd, smeenai
Differential Revision: https://reviews.llvm.org/D89418
Stub dylibs differ from "real" dylibs in that they lack any content in
their sections. What they do have are export tries and symbol tables,
which means we can still link against them. I am unclear how to
properly create these stub dylibs; Xcode 11.3's `lipo` is able to create
stub dylibs, but those lack LC_ID_DYLIB load commands and are considered
invalid by most tooling. Newer versions of `lipo` aren't able to create
stub dylibs at all. However, recent SDKs in Xcode still come with valid
stub dylibs, so it still seems worthwhile to support them. The YAML in
this diff's test was generated by taking a non-stub dylib and editing
the appropriate fields.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D89012
In the C++11 standard, to become implicitly movable, the expression in a
return statement has to be a non-volatile automatic object. CWG1579 changed
the rule to require only that the expression be an automatic object. The
C++14 and C++17 standards kept this rule unchanged. The C++20 standard
changed the rule back to requiring a non-volatile automatic object.
This appears to be a typo in the intermediate standards, and VD should be non-volatile.
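An illustrative example (hypothetical type, not from the patch) of where the wording matters; the constructors are declared so that both can bind to a volatile object, making the difference observable:
```
struct W {
  W() = default;
  W(const volatile W &) {} // copy constructor usable on volatile objects
  W(volatile W &&) {}      // move constructor usable on volatile objects
};

W f() {
  volatile W w;
  // With the non-volatile requirement (C++11, C++20, and this patch),
  // `w` is not implicitly movable, so the copy constructor is selected.
  // Under the literal CWG1579/C++14/C++17 wording, the move constructor
  // would be selected instead.
  return w;
}

int main() { f(); }
```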
Differential Revision: https://reviews.llvm.org/D88295
This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't
run into problems of a loop pass deleting an (inner) loop.
Differential Revision: https://reviews.llvm.org/D90940
This only exposes the ability to round-trip a textual pipeline at the
moment.
To exercise it, we also bind libTransforms in a new Python extension. This
does not include any interesting bindings, but it includes all the
mechanisms to add separate native extensions and load them dynamically.
As such, passes in libTransforms are only registered after `import
mlir.transforms`.
To support this global registration, the TableGen backend is also
extended to expose the group registration for passes through the C API.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D90819
Add new paragraphs under the "Introducing New Components" section describing
the different levels of support we offer, to help introduce smaller sets of
changes without overwhelming new collaborators and potentially losing the
contribution.
Differential Revision: https://reviews.llvm.org/D91013
This patch turns VPWidenSelectRecipe into a VPValue and uses it
during VPlan construction and code generation instead of the plain IR
reference where possible.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D84682
The value_type is a const pointer, which makes the iterator
non-copy-assignable. Before the patch, normal usage like the example below was illegal:
```
auto It = lookupresult.begin();
...
It = lookupresult.end(); // the assignment is not allowed.
```
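As a standalone illustration of the underlying C++ rule (hypothetical structs, not the actual iterator): a const non-static data member deletes the implicitly-declared copy assignment operator, which is what made the `It = lookupresult.end()` line above ill-formed:
```
struct Ok {
  const int *Ptr; // pointer-to-const: the pointer itself is assignable
};

struct Broken {
  int *const Ptr; // const pointer member: copy assignment is deleted
};

int main() {
  Ok a{nullptr}, b{nullptr};
  a = b; // fine

  Broken c{nullptr}, d{nullptr};
  // c = d; // error: implicitly-deleted copy assignment operator
}
```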
Differential Revision: https://reviews.llvm.org/D91158
Copy the recent improvements from the FreeBSDRemote plugin, notably:
- moving event reporting setup into SetupTrace() helper
- adding more debug info into SIGTRAP handling
- handling user-generated (and unknown) SIGTRAP events
- adding missing error handling to the generic signal handler
- fixing attaching to processes
- switching watchpoint helpers to use llvm::Error
- minor style and formatting changes
This fixes a number of tests, mostly related to the attaching fixes.
Differential Revision: https://reviews.llvm.org/D91167
The macro HAVE_EHTABLE_SUPPORT is used by parts of ExecutionEngine to tell whether __register_frame/__deregister_frame are available to register the
FDEs for generated (JIT) code. It's currently set by a slowly growing set of macro tests in the respective headers, which is updated now and then when it fails to link on some platform or another because the symbols are missing (see for example https://bugs.llvm.org/show_bug.cgi?id=5715).
This change converts the macro into two config.h macros, HAVE_REGISTER_FRAME and HAVE_DEREGISTER_FRAME (like most of the other HAVE_* macros), and sets them based on whether CMake can actually find a definition for these symbols to link to at configuration time.
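A hedged sketch of how such config macros are typically consumed (the helper name is illustrative, not the exact ExecutionEngine code):
```
#include "llvm/Config/config.h" // defines HAVE_REGISTER_FRAME if found

#ifdef HAVE_REGISTER_FRAME
extern "C" void __register_frame(void *);
#endif

// Register an FDE for JIT-generated code if the platform supports it.
static void registerJITFrame(void *fde) {
#ifdef HAVE_REGISTER_FRAME
  __register_frame(fde);
#else
  (void)fde; // no frame registration support found at configure time
#endif
}
```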
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D87114
This introduces a new pseudo instruction, almost identical to a
t2DoLoopStart but taking 2 parameters - the original loop iteration
count needed for a low overhead loop, plus the VCTP element count needed
for a DLSTP instruction setting up a tail predicated loop. The idea is
that the instruction holds both values and the backend
ARMLowOverheadLoops pass can pick between the two, depending on whether
it creates a tail predicated loop or falls back to a low overhead loop.
To do that there needs to be something that converts a t2DoLoopStart to
a t2DoLoopStartTP, for which this patch repurposes the
MVEVPTOptimisationsPass as a "tail predication and vpt optimisation"
pass. The extra operand for the t2DoLoopStartTP is chosen based on the
operands of VCTP's in the loop, and the instruction is moved as late in
the block as possible to attempt to increase the likelihood of making
tail predicated loops.
Differential Revision: https://reviews.llvm.org/D90591
This was missing, as discovered by the SystemZ multistage bot:
http://lab.llvm.org:8011/#/builders/8, where wrong code resulted when this
extension was not performed.
Thanks to Ulrich Weigand and Roman Lebedev for the review.
Differential Revision: https://reviews.llvm.org/D90760