WebAssembly's shift instructions implicitly masks the shift count, so optimize
out redundant explicit masks of the shift count. For vector shifts, this
currently only works if the mask is applied before splatting the shift count,
but this should be addressed in a future commit. Resolves PR49655.
Differential Revision: https://reviews.llvm.org/D105600
Reland of 31859f896.
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Differential Revision: https://reviews.llvm.org/D104797
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D95425
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
DwarfDebug unconditionally assumes for all call instructions the 0th
operand is the callee operand, which seems to be true for other targets,
but not for WebAssembly. This adds `TargetInstrInfo::getCallOperand`
method whose default implementation returns `getOperand(0)` and makes
WebAssembly overrides it to use its own utility method to get the callee
operand.
This also fixes an existing bug in `WebAssembly::getCalleeOp`, which was
uncovered by this CL.
Reviewed By: dschuff, djtodoro
Differential Revision: https://reviews.llvm.org/D102978
`WebAssemblyDebugValueManager` does not currently handle
`DBG_VALUE_LIST`, which is a recent addition to LLVM. We tried to
nullify them within the constructor of `WebAssemblyDebugValueManager` in
D102589, but it made the class error-prone to use because it deletes
instructions within the constructor and thus invalidates existing
iterators within the BB, so the user of the class should take special
care not to use invalidated iterators. This actually caused a bug in
ExplicitLocals pass.
Instead of trying to fix ExplicitLocals pass to make the iterator usage
correct, which is possible but error-prone, this adds
NullifyDebugValueLists pass that nullifies all `DBG_VALUE_LIST`
instructions before we run WebAssembly specific passes in the backend.
We can remove this pass after we implement handlers for
`DBG_VALUE_LIST`s in `WebAssemblyDebugValueManager` and elsewhere.
Fixes https://github.com/emscripten-core/emscripten/issues/14255.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D102999
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.
This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.
This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.
Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.
Re-landed again after D102819 fixed PowerPC to correctly zero-extend all
of its atomics as it claimed to do, since the combination of that bug
and this optimisation caused buildbot regressions.
Reviewed By: RKSimon, atanasyan
Differential Revision: https://reviews.llvm.org/D101342
__table_base is know 64-bit, since in LLVM it represents a function pointer offset
__table_base32 is a copy in wasm32 for use in elem init expr, since no truncation may be used there.
New reloc R_WASM_TABLE_INDEX_REL_SLEB64 added
Differential Revision: https://reviews.llvm.org/D101784
We have been handling filters and landingpads incorrectly all along. We
pass clauses' (catches') types to `__cxa_find_matching_catch` in JS glue
code, which returns the thrown pointer and sets the selector using
`setTempRet0()`.
We apparently have been doing the same for filters' (exception specs')
types; we pass them to `__cxa_find_matching_catch` just the same way as
clauses. And `__cxa_find_matching_catch` treats all given types as
clauses. So it is a little surprising; maybe we intended to do something
from the JS side and didn't end up doing?
So anyway, I don't think supporting exception specs in Emscripten EH is
a priority, but this can actually cause incorrect results for normal
catches when functions are inlined and the inlined spec type has a
parent-child relationship with the catch's type.
---
The below is an example of a bug that can happen when inlining and class
hierarchy is mixed. If you are busy you can skip this part:
```
struct A {};
struct B : A {};
void bar() throw (B) { throw B(); }
void foo() {
try {
bar();
} catch (A &) {
fputs ("Expected result\n", stdout);
}
}
```
In the unoptimized code, `bar`'s landingpad will have a filter for `B`
and `foo`'s landingpad will have a clause for `A`. But when `bar` is
inlined into `foo`, `foo`'s landingpad has both a filter for `B` and a
clause for `A`, and it passes the both types to
`__cxa_find_matching_catch`:
```
__cxa_find_matching_catch(typeinfo for B, typeinfo for A)
```
`__cxa_find_matching_catch` thinks both are clauses, and looks at the
first type `B`, which belongs to a filter. And the thrown type is `B`,
so it thinks the first type `B` is caught. But this makes it return an
incorrect selector, because it is supposed to catch the exception using
the second type `A`, which is a parent of `B`. As a result, the `foo` in
the example program above does not print "Expected result" but just
throws the exception to the caller. (This wouldn't have happened if `A`
and `B` are completely disjoint types, such as `float` and `int`)
Fixes https://bugs.llvm.org/show_bug.cgi?id=50357.
Reviewed By: dschuff, kripken
Differential Revision: https://reviews.llvm.org/D102795
WebAssemblyDebugValueManager class currently does not handle
DBG_VALUE_LIST instructions correctly for two reasons, which are
explained in https://bugs.llvm.org/show_bug.cgi?id=50361.
This effectively nullifies DBG_VALUE_LISTs in
WebAssemblyDebugValueManager so that the info will appear as "optimized
out" in debuggers but still be at least correct in the meantime.
Reviewed By: dschuff, jmorse
Differential Revision: https://reviews.llvm.org/D102589
When a stackified variable has an associated `DBG_VALUE` instruction,
DebugFixup pass adds a `DBG_VALUE` instruction after the stackified
value's last use to clear the variable's debug range info. But when the
last use instruction is a terminator, it can cause a verification
failure (when run with `-verify-machineinstrs`) because there are no
instructions allowed after a terminator.
For example:
```
%myvar = ...
DBG_VALUE target-index(wasm-operand-stack), $noreg, !"myvar", ...
BR_IF 0, %myvar, ...
DBG_VALUE $noreg, $noreg, !"myvar", ...
```
In this test, `%myvar` is stackified, so the first `DBG_VALUE`
instruction's first operand has changed to `wasm-operand-stack` to
denote it. And an additional `DBG_VALUE` instruction is added after its
last use, `BR_IF`, to signal variable `myvar` is not in the operand
stack anymore. But because the `DBG_VALUE` instruction is added after
the `BR_IF`, a terminator, it fails MachineVerifier.
`DBG_VALUE` instructions are used in `DbgEntityHistoryCalculator` to
compute value ranges to emit DWARF info, and it turns out the
`DbgEntityHistoryCalculator` terminates ranges at the end of a BB, so we
don't need to emit `DBG_VALUE` after a terminator.
Fixes https://bugs.llvm.org/show_bug.cgi?id=50175.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D102309
In wasm64, the signatures of some library functions and global variables
defined in Emscripten change:
- `emscripten_longjmp`: `(i32, i32) -> ()` -> `(i64, i32) -> ()`
This changes because the first argument is the address of a memory
buffer. This in turn causes more changes below.
- `setThrew`: `(i32, i32) -> ()` -> `(i64, i32) -> ()`
`emscripten_longjmp` calls `setThrew` with the i64 buffer argument as
the first parameter.
- `__THREW__` (global var): `i32` to `i64`
`setThrew`'s first argument is set to this `__THREW__` variable, so it
should change to i64 as well.
- `testSetjmp`: `(i32, i32*, i32) -> (i32)` -> `(i64, i32*, i32) -> (i32)`
In the code transformation done in this pass, the value of `__THREW__`
is passed as the first parameter of `testSetjmp`.
This patch creates some helper functions to easily get types that become
different depending on the wasm32/wasm64, and uses them to change
various function signatures and code transformations. Also updates the
tests with WASM32/WASM64 check lines.
(Untested) Emscripten side patch: https://github.com/emscripten-core/emscripten/pull/14108
Reviewed By: aardappel
Differential Revision: https://reviews.llvm.org/D101985
We explicitly made it error out in D101403, out of a good intention that
the error message will make people less confusing. Turns out, we weren't
failing all cases of wasm EH + SjLj; only a few cases were failing and
our client was able to get around by fixing source code, but now we made
it fail for all cases, even the cases that previously succeeded fail,
which we didn't intend. This reverts that change.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D102364
As we have been missing support for WebAssembly globals on the IR level,
the lowering of WASM_SYMBOL_TYPE_GLOBAL to IR was incomplete. This
commit fleshes out the lowering support, lowering references to and
definitions of addrspace(1) values to correctly typed
WASM_SYMBOL_TYPE_GLOBAL symbols.
Depends on D101608.
Differential Revision: https://reviews.llvm.org/D101913
This patch adds support for WebAssembly globals in LLVM IR, representing
them as pointers to global values, in a non-default, non-integral
address space. Instruction selection legalizes loads and stores to
these pointers to new WebAssemblyISD nodes GLOBAL_GET and GLOBAL_SET.
Once the lowering creates the new nodes, tablegen pattern matches those
and converts them to Wasm global.get/set of the appropriate type.
Based on work by Paulo Matos in https://reviews.llvm.org/D95425.
Reviewed By: pmatos
Differential Revision: https://reviews.llvm.org/D101608
This change was originally landed in: 5000a1b4b9
It was reverted in: 061e071d8c
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker merging is only performed at `-O1` and above.
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)
Differential Revision: https://reviews.llvm.org/D97657
This change adds support for a new WASM_SEG_FLAG_STRINGS flag in
the object format which works in a similar fashion to SHF_STRINGS
in the ELF world.
Unlike the ELF linker this support is currently limited:
- No support for SHF_MERGE (non-string merging)
- Always do full tail merging ("lo" can be merged with "hello")
- Only support single byte strings (p2align 0)
Like the ELF linker merging is only performed at `-O1` and above.
This fixes part of https://bugs.llvm.org/show_bug.cgi?id=48828,
although crucially it doesn't not currently support debug sections
because they are not represented by data segments (they are custom
sections)
Differential Revision: https://reviews.llvm.org/D97657
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.
This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.
This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.
Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.
Reviewed By: RKSimon, atanasyan
Differential Revision: https://reviews.llvm.org/D101342
- Removes the mention of fastcomp, which is deprecated.
- Some functions in Emscripten have moved from JS glue code to
compiler-rt/emscripten_setjmp.c and
compiler-rt/emscripten_exception_builtins.c. This fixes comments about
that.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D101812
The WebAssembly SIMD intrinsics in wasm_simd128.h generally try not to require
any particular alignment for memory operations to be maximally flexible. For
builtin memory access functions and their corresponding LLVM IR intrinsics,
there's no way to set the expected alignment, so the best we can do is set the
alignment to 1 in the backend. This change means that the alignment hints in the
emitted code will no longer be incorrect when users use the intrinsics to access
unaligned data.
Differential Revision: https://reviews.llvm.org/D101850
This seems to have broken sanitizers, giving lots of
Assertion `NumBits <= MAX_INT_BITS && "bitwidth too large"' failed.
failures across multiple targets (currently X86 and PowerPC). Reverting
until I have a chance to reproduce and debug.
This reverts commit 6e876f9ded.
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.
This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.
This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.
Reviewed By: RKSimon, atanasyan
Differential Revision: https://reviews.llvm.org/D101342
We previously had an ISel pattern for i64x2.abs, but because the ISDNode was not
marked legal for v2i64, the instruction was not being selected.
Differential Revision: https://reviews.llvm.org/D101803
- Error out when both Emscripten EH and wasm EH are used together, i.e.,
both `-enable-emscripten-cxx-exceptions` and `-exception-model=wasm`
are given together. This will not happen if you use Emscripten, but
this can happen when you call `llc` manually with wrong set of
arguments.
- Currently we don't yet support using wasm EH with Emscripten SjLj.
Unlike `-enable-emscripten-cxx-exceptions` which is turned on only
when you use `emcc -s DISABLE_EXCEPTION_CATCHING=0`,
`-enable-emscripten-sjlj` is turned on by Emscripten by default. So we
error out only when it is turned on and `setjmp` or `longjmp` is
actually used.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D101403
Previously we used an i32 constant to store the saturation width, but i32 isn't
legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the
type legalizer.
This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes
it opaque to the type legalizer.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101262
Background:
CFGStackify's [[ 398f253400/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp (L1481-L1540) | fixEndsAtEndOfFunction ]] fixes block/loop/try's return
type when the end of function is unreachable and the function return
type is not void. So if a function returns i32 and `block`-`end` wraps the
whole function, i.e., the `block`'s `end` is the last instruction of the
function, the `block`'s return type should be i32 too:
```
block i32
...
end
end_function
```
If there are consecutive `end`s, this signature has to be propagate to
those blocks too, like:
```
block i32
...
block i32
...
end
end
end_function
```
This applies to `try`-`end` too:
```
try i32
...
catch
...
end
end_function
```
In case of `try`, we not only follow consecutive `end`s but also follow
`catch`, because for the type of the whole `try` to be i32, both `try`
and `catch` parts have to be i32:
```
try i32
...
block i32
...
end
catch
...
block i32
...
end
end
end_function
```
---
Previously we only handled consecutive `end`s or `end` before a `catch`.
But now we have `delegate`, which serves like `end` for
`try`-`delegate`. So we have to follow `delegate` too and mark its
corresponding `try` as i32 (the function's return type):
```
try i32
...
catch
...
try i32 ;; Here
...
delegate N
end
end_function
```
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D101036
This adds support for YAML serialization of `Params` and `Results`
fields in `WebAssemblyMachineFunctionInfo`. Types are printed as `MVT`'s
string representation. This is for writing MIR tests easier.
The tests added are testing simple parsing and printing of `params` /
`results` fields under `machineFunctionInfo`.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D101029
This CL
1. Creates Utils/ directory under lib/Target/WebAssembly
2. Moves existing WebAssemblyUtilities.cpp|h into the Utils/ directory
3. Creates Utils/WebAssemblyTypeUtilities.cpp|h and put type
declarataions and type conversion functions scattered in various
places into this single place.
It has been suggested several times that it is not easy to share utility
functions between subdirectories (AsmParser, DIsassembler, MCTargetDesc,
...). Sometimes we ended up [[ https://reviews.llvm.org/D92840#2478863 | duplicating ]] the same function because of
this.
There are already other targets doing this: AArch64, AMDGPU, and ARM
have Utils/ subdirectory under their target directory.
This extracts the utility functions into a single directory Utils/ and
make them sharable among all passes in WebAssembly/ and its
subdirectories. Also I believe gathering all type-related conversion
functionalities into a single place makes it more usable. (Actually I
was working on another CL that uses various type conversion functions
scattered in multiple places, which became the motivation for this CL.)
Reviewed By: dschuff, aardappel
Differential Revision: https://reviews.llvm.org/D100995
This is just a cleanup of the very high level stuff. I'm sure there is
more to update here but I'll leave that to others and/or a followup.
Differential Revision: https://reviews.llvm.org/D100888
af7925b4dd added a custom DAG combine for recognizing fp-to-ints of
extract_subvectors that could be lowered to f64x2.convert_low_i32x4_{s,u}
instructions. This commit extends the combines to recognize equivalent
extract_subvectors of fp-to-ints as well.
Differential Revision: https://reviews.llvm.org/D100790
We previously used splats instead of v128.const to materialize vector constants
because V8 did not support v128.const. Now that V8 supports v128.const, we can
use v128.const instead. Although this increases code size, it should also
increase performance (or at least require fewer engine-side optimizations), so
it is an appropriate change to make.
Differential Revision: https://reviews.llvm.org/D100716
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.
Differential Revision: https://reviews.llvm.org/D100596
Removes the builtins and intrinsics used to opt in to using these instructions
and replaces them with normal ISel patterns now that they are no longer
prototypes.
Differential Revision: https://reviews.llvm.org/D100402
Add a custom DAG combine and ISD opcode for detecting patterns like
(uint_to_fp (extract_subvector ...))
before the extract_subvector is expanded to ensure that they will ultimately
lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions
are no longer prototypes and can now be produced via standard IR, this commit
also removes the target intrinsics and builtins that had been used to prototype
the instructions.
Differential Revision: https://reviews.llvm.org/D100425
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.
For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.
Differential Revision: https://reviews.llvm.org/D100411
In the final SIMD spec, there is only a single v128.any_true instruction, rather
than one for each lane interpretation because the semantics do not depend on the
lane interpretation.
Differential Revision: https://reviews.llvm.org/D100241
When lowering a BUILD_VECTOR SDNode, we choose among various possible vector
creation instructions in an attempt to minimize the total number of instructions
used. We previously considered using swizzles, consts, and splats, and this
patch adds shuffles as well. A common pattern that now lowers to shuffles is
when two 64-bit vectors are concatenated. Previously, concatenations generally
lowered to sequences of extract_lane and replace_lane instructions when they
could have been a single shuffle.
Differential Revision: https://reviews.llvm.org/D100018
wasm64 was missing DAG ISEL patterns for external symbol based global.get, but simply adding these analogous to the existing 32-bit versions doesn't work.
This is because we are conflating the 32-bit global index with the pointer represented by the external symbol, which for wasm32 happened to work.
The simplest fix is to pretend we have a 64-bit global index. This sounds incorrect, but is immaterial since once this index is stored as a MachineOperand it becomes 64-bit anyway (and has been all along). As such, the EmitInstrWithCustomInserter based implementation I experimented with become a no-op and no further changes in the C++ code are required.
Differential Revision: https://reviews.llvm.org/D99904
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.
As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.
Differential Revision: https://reviews.llvm.org/D98294
A frequent pattern for floating point conditional branches use an xor
to invert the input for the branch. Instead we can fold away the xor
by swapping the branch target instead.
Differential Revision: https://reviews.llvm.org/D99171
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.
Differential Revision: https://reviews.llvm.org/D99623
Currently needsStackRealignment returns false if canRealignStack returns false.
This means that the behavior of needsStackRealignment does not correspond to
it's name and description; a function might need stack realignment, but if it
is not possible then this function returns false. Furthermore,
needsStackRealignment is not virtual and therefore some backends have made use
of canRealignStack to indicate whether a function needs stack realignment.
This patch attempts to clarify the situation by separating them and introducing
new names:
- shouldRealignStack - true if there is any reason the stack should be
realigned
- canRealignStack - true if we are still able to realign the stack (e.g. we
can still reserve/have reserved a frame pointer)
- hasStackRealignment = shouldRealignStack && canRealignStack (not target
customisable)
Targets can now override shouldRealignStack to indicate that stack realignment
is required.
This change will make it easier in a future change to handle the case where we
need to realign the stack but can't do so (for example when the register
allocator creates an aligned spill after the frame pointer has been
eliminated).
Differential Revision: https://reviews.llvm.org/D98716
Change-Id: Ib9a4d21728bf9d08a545b4365418d3ffe1af4d87
When I updated the SIMD opcodes in f5764a8654, I accidentally missed updating
i8x16.popcnt. This patch fixes the omission.
Differential Revision: https://reviews.llvm.org/D99536
This patch changes the interface to take a RegisterKind, to indicate
whether the register bitwidth of a scalar register, fixed-width vector
register, or scalable vector register must be returned.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D98874
This commit adds a full WasmTableType to MCSymbolWasm, differing from
the current situation (just an ElemType) in that it additionally records
a WasmLimits.
We add support for specifying the limits in .S files also, via the
following syntax variations:
.tabletype SYM, ELEMTYPE
.tabletype SYM, ELEMTYPE, MINSIZE
.tabletype SYM, ELEMTYPE, MINSIZE, MAXSIZE
Depends on D99186.
Differential Revision: https://reviews.llvm.org/D99191
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.
Depends on D98466.
Differential Revision: https://reviews.llvm.org/D98676
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.
Depends on D98457.
Differential Revision: https://reviews.llvm.org/D98466
Now that the WebAssembly SIMD specification is finalized and engines are
generally up-to-date, there is no need for a separate target feature for gating
SIMD instructions that engines have not implemented. With this change,
v128.const is now enabled by default with the simd128 target feature.
Differential Revision: https://reviews.llvm.org/D98457
This patch adds handling for DBG_VALUE_LIST in the MIR-passes (after
finalize-isel), excluding the debug liveness passes and DWARF emission. This
most significantly affects MachineSink, which now needs to consider all used
registers of a debug value when sinking, but for most passes this change is
simply replacing getDebugOperand(0) with an iteration over all debug operands.
Differential Revision: https://reviews.llvm.org/D92578
This `R_WASM_MEMORY_ADDR_SELFREL_I32` relocation represents an offset
between its relocating address and the symbol address. It's very similar
to `R_X86_64_PC32` but restricted to be used for only data segments.
```
S + A - P
```
A: Represents the addend used to compute the value of the relocatable
field.
P: Represents the place of the storage unit being relocated.
S: Represents the value of the symbol whose index resides in the
relocation entry.
Proposal: https://github.com/WebAssembly/tool-conventions/issues/162
Differential Revision: https://reviews.llvm.org/D96659
Rewrites test to use correct architecture triple; fixes incorrect
reference in SourceLevelDebugging doc; simplifies `spillReg` behaviour
so as to not be dependent on changes elsewhere in the patch stack.
This reverts commit d2000b45d0.
This is a case D97677 missed. When taking out remaining BBs that are
reachable from already-taken-out exceptions (because they are not
subexcptions but unwind destinations), I assumed the remaining BBs are
not EH pads, but they can be. For example,
```
try {
try {
throw 0;
} catch (int) { // (a)
}
} catch (int) { // (b)
}
try {
foo();
} catch (int) { // (c)
}
```
In this code, (b) is the unwind destination of (a) so its exception is
taken out of (a)'s exception, But even though the next try-catch is not
inside the first two-level try-catches, because the first try always
throws, its continuation BB is unreachable and the whole rest of the
function is dominated by EH pad (a), including EH pad (c). So after we
take out of (b)'s exception out of (a)'s, we also need to take out (c)'s
exception out of (a)'s, because (c) is reachable from (b).
This adds one more step before what we did for remaining BBs in D97677;
it traverses EH pads first to take subexceptions out of their incorrect
parent exception. It's the same thing as D97677, but because we can do
this before we add BBs to exceptions' sets, we don't need to fix sets
and only need to fix parent exception pointers.
Other changes are variable name changes (I changed `WE` -> `SrcWE`,
`UnwindWE` -> `DstWE` for clarity), some comment changes, and a drive-by
fix in a bug in a `LLVM_DEBUG` print statement.
Fixes https://github.com/emscripten-core/emscripten/issues/13588.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D97929
Background:
Wasm EH, while using Windows EH (catchpad/cleanuppad based) IR, uses
Itanium-based libraries and ABIs with some modifications.
`__clang_call_terminate` is a wrapper generated in Clang's Itanium C++
ABI implementation. It contains this code, in C-style pseudocode:
```
void __clang_call_terminate(void *exn) {
__cxa_begin_catch(exn);
std::terminate();
}
```
So this function is a wrapper to call `__cxa_begin_catch` on the
exception pointer before termination.
In Itanium ABI, this function is called when another exception is thrown
while processing an exception. The pointer for this second, violating
exception is passed as the argument of this `__clang_call_terminate`,
which calls `__cxa_begin_catch` with that pointer and calls
`std::terminate` to terminate the program.
The spec (https://libcxxabi.llvm.org/spec.html) for `__cxa_begin_catch`
says,
```
When the personality routine encounters a termination condition, it
will call __cxa_begin_catch() to mark the exception as handled and then
call terminate(), which shall not return to its caller.
```
In wasm EH's Clang implementation, this function is called from
cleanuppads that terminates the program, which we also call terminate
pads. Cleanuppads normally don't access the thrown exception and the
wasm backend converts them to `catch_all` blocks. But because we need
the exception pointer in this cleanuppad, we generate
`wasm.get.exception` intrinsic (which will eventually be lowered to
`catch` instruction) as we do in the catchpads. But because terminate
pads are cleanup pads and should run even when a foreign exception is
thrown, so what we have been doing is:
1. In `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()`, we make sure
terminate pads are in this simple shape:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
```
2. In `WebAssemblyHandleEHTerminatePads` pass at the end of the
pipeline, we attach a `catch_all` to terminate pads, so they will be in
this form:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
catch_all
call @std::terminate()
unreachable
```
In `catch_all` part, we don't have the exception pointer, so we call
`std::terminate()` directly. The reason we ran HandleEHTerminatePads at
the end of the pipeline, separate from LateEHPrepare, was it was
convenient to assume there was only a single `catch` part per `try`
during CFGSort and CFGStackify.
---
Problem:
While it thinks terminate pads could have been possibly split or calls
to `__clang_call_terminate` could have been duplicated,
`WebAssemblyLateEHPrepare::ensureSingleBBTermPads()` assumes terminate
pads contain no more than calls to `__clang_call_terminate` and
`unreachable` instruction. I assumed that because in LLVM very limited
forms of transformations are done to catchpads and cleanuppads to
maintain the scoping structure. But it turned out to be incorrect;
passes can merge cleanuppads into one, including terminate pads, as long
as the new code has a correct scoping structure. One pass that does this
I observed was `SimplifyCFG`, but there can be more. After this
transformation, a single cleanuppad can contain any number of other
instructions with the call to `__clang_call_terminate` and can span many
BBs. It wouldn't be practical to duplicate all these BBs within the
cleanuppad to generate the equivalent `catch_all` blocks, only with
calls to `__clang_call_terminate` replaced by calls to `std::terminate`.
Unless we do more complicated transformation to split those calls to
`__clang_call_terminate` into a separate cleanuppad, it is tricky to
solve.
---
Solution (?):
This CL just disables the generation and use of `__clang_call_terminate`
and calls `std::terminate()` directly in its place.
The possible downside of this approach can be, because the Itanium ABI
intended to "mark" the violating exception handled, we don't do that
anymore. What `__cxa_begin_catch` actually does is increment the
exception's handler count and decrement the uncaught exception count,
which in my opinion do not matter much given that we are about to
terminate the program anyway. Also it does not affect info like stack
traces that can be possibly shown to developers.
And while we use a variant of Itanium EH ABI, we can make some
deviations if we choose to; we are already different in that in the
current version of the EH spec we don't support two-phase unwinding. We
can possibly consider a more complicated transformation later to
reenable this, but I don't think that has high priority.
Changes in this CL contains:
- In Clang, we don't generate a call to `wasm.get.exception()` intrinsic
and `__clang_call_terminate` function in terminate pads anymore; we
simply generate calls to `std::terminate()`, which is the default
implementation of `CGCXXABI::emitTerminateForUnexpectedException`.
- Remove `WebAssembly::ensureSingleBBTermPads() function and
`WebAssemblyHandleEHTerminatePads` pass, because terminate pads are
already `catch_all` now (because they don't need the exception
pointer) and we don't need these transformations anymore.
- Change tests to use `std::terminate` directly. Also removes tests that
tested `LateEHPrepare::ensureSingleBBTermPads` and
`HandleEHTerminatePads` pass.
- Drive-by fix: Add some function attributes to EH intrinsic
declarations
Fixes https://github.com/emscripten-core/emscripten/issues/13582.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97834
This patch adds a new instruction that can represent variadic debug values,
DBG_VALUE_VAR. This patch alone covers the addition of the instruction and a set
of basic code changes in MachineInstr and a few adjacent areas, but does not
correctly handle variadic debug values outside of these areas, nor does it
generate them at any point.
The new instruction is similar to the existing DBG_VALUE instruction, with the
following differences: the operands are in a different order, any number of
values may be used in the instruction following the Variable and Expression
operands (these are referred to in code as “debug operands”) and are indexed
from 0 so that getDebugOperand(X) == getOperand(X+2), and the Expression in a
DBG_VALUE_VAR must use the DW_OP_LLVM_arg operator to pass arguments into the
expression.
The new DW_OP_LLVM_arg operator is only valid in expressions appearing in a
DBG_VALUE_VAR; it takes a single argument and pushes the debug operand at the
index given by the argument onto the Expression stack. For example the
sub-expression `DW_OP_LLVM_arg, 0` has the meaning “Push the debug operand at
index 0 onto the expression stack.”
Differential Revision: https://reviews.llvm.org/D82363
The WebAssembly text and binary formats have different operand orders
for the "type" and "table" fields of call_indirect (and
return_call_indirect). In LLVM we use the binary order for the MCInstr,
but when we produce or consume the text format we should use the text
order. For compilation units targetting WebAssembly 1.0 (without the
reference types feature), we omit the table operand entirely.
Differential Revision: https://reviews.llvm.org/D97761
This fixes two bugs in `WebAssemblyExceptionInfo` grouping, created by
D97247. These two bugs are not easy to split into two different CLs,
because tests that fail for one also tend to fail for the other.
- In D97247, when fixing `ExceptionInfo` grouping by taking out
the unwind destination' exception from the unwind src's exception, we
just iterated the BBs in the function order, but this was incorrect;
this changes it to dominator tree preorder. Please refer to the
comments in the code for the reason and an example.
- After this subexception-taking-out fix, there still can be remaining
BBs we have to take out. When Exception B is taken out of Exception A
(because EHPad B is the unwind destination of EHPad A), there can
still be BBs within Exception A that are reachable from Exception B,
which also should be taken out. Please refer to the comments in the
code for more detailed explanation on why this can happen. To make
this possible, this splits `WebAssemblyException::addBlock` into two
parts: adding to a set and adding to a vector. We need to iterate on
BBs within a `WebAssemblyException` to fix this, so we add BBs to sets
first. But we add BBs to vectors later after we fix all incorrectness
because deleting BBs from vectors is expensive. I considered removing
the vector from `WebAssemblyException`, but it was not easy because
this class has to maintain a similar interface with `MachineLoop` to
be wrapped into a single interface `SortRegion`, which is used in
CFGSort.
Other misc. drive-by fixes:
- Make `WebAssemblyExceptionInfo` do not even run when wasm EH is not
used or the function doesn't have any EH pads, not to waste time
- Add `LLVM_DEBUG` lines for easy debugging
- Fix `preds` comments in cfg-stackify-eh.ll
- Fix `__cxa_throw`'s signature in cfg-stackify-eh.ll
Fixes https://github.com/emscripten-core/emscripten/issues/13554.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97677
In `LateEHPrepare::addCatchAlls`, the current code tries to get the
iterator's debug info even when it is `MachineBasicBlock::end()`. This
fixes the bug by adding empty debug info instead in that case.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D97679
If the reference-types feature is enabled, call_indirect will explicitly
reference its corresponding function table via TABLE_NUMBER
relocations against a table symbol.
Also, as before, address-taken functions can also cause the function
table to be created, only with reference-types they additionally cause a
symbol table entry to be emitted.
Differential Revision: https://reviews.llvm.org/D90948
D97247 added the reverse mapping from unwind destination to their
source, but it had a critical bug; sources can be multiple, because
multiple BBs can have a single BB as their unwind destination.
This changes `WasmEHFuncInfo::getUnwindSrc` to `getUnwindSrcs` and makes
it return a vector rather than a single BB. It does not return the const
reference to the existing vector but creates a new vector because
`WasmEHFuncInfo` stores not `BasicBlock*` or `MachineBasicBlock*` but
`PointerUnion` of them. Also I hoped to unify those methods for
`BasicBlock` and `MachineBasicBlock` into one using templates to reduce
duplication, but failed because various usages require `BasicBlock*` to
be `const` but it's hard to make it `const` for `MachineBasicBlock`
usages.
Fixes https://github.com/emscripten-core/emscripten/issues/13514.
(More precisely, fixes
https://github.com/emscripten-core/emscripten/issues/13514#issuecomment-784708744)
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97583
Use `APInt` to convert a 32-bit or 64-bit immediate to an `APFloat` rather than
`bit_cast` to a `float` or `double` to avoid going through host floating-point and
potentially changing the bit pattern of NaNs.
Differential Revision: https://reviews.llvm.org/D97490
This is a case D97178 tried to solve but missed. D97178 could not handle
the case when
multiple consecutive delegates are generated:
- Before:
```
block
br (a)
try
catch
end_try
end_block
<- (a)
```
- After
```
block
br (a)
try
...
try
try
catch
end_try
<- (a)
delegate
delegate
end_block
<- (b)
```
(The `br` should point to (b) now)
D97178 assumed `end_block` exists two BBs later than `end_try`, because
it assumed the order as `end_try` BB -> `delegate` BB -> `end_block` BB.
But it turned out there can be multiple `delegate`s in between. This
patch changes the logic so we just search from `end_try` BB until we
find `end_block`.
Fixes https://github.com/emscripten-core/emscripten/issues/13515.
(More precisely, fixes
https://github.com/emscripten-core/emscripten/issues/13515#issuecomment-784711318.)
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97569
This CL is not big but contains changes that span multiple analyses and
passes. This description is very long because it tries to explain basics
on what each pass/analysis does and why we need this change on top of
that. Please feel free to skip parts that are not necessary for your
understanding.
---
`WasmEHFuncInfo` contains the mapping of <EH pad, the EH pad's next
unwind destination>. The value (unwind dest) here is where an exception
should end up when it is not caught by the key (EH pad). We record this
info in WasmEHPrepare to fix catch mismatches, because the CFG itself
does not have this info. A CFG only contains BBs and
predecessor-successor relationship between them, but in `WasmEHFuncInfo`
the unwind destination BB is not necessarily a successor or the key EH
pad BB. Their relationship can be intuitively explained by this C++ code
snippet:
```
try {
try {
foo();
} catch (int) { // EH pad
...
}
} catch (...) { // unwind destination
}
```
So when `foo()` throws, it goes to `catch (int)` first. But if it is not
caught by it, it ends up in the next unwind destination `catch (...)`.
This unwind destination is what you see in `catchswitch`'s
`unwind label %bb` part.
---
`WebAssemblyExceptionInfo` groups exceptions so that they can be sorted
continuously together in CFGSort, as we do for loops. What this analysis
does is very simple: it creates a single `WebAssemblyException` per EH
pad, and all BBs that are dominated by that EH pad are included in this
exception. We also identify subexception relationship in this way: if
EHPad A domiantes EHPad B, EHPad B's exception is a subexception of
EHPad A's exception.
This simple rule turns out to be incorrect in some cases. In
`WasmEHFuncInfo`, if EHPad A's unwind destination is EHPad B, it means
semantically EHPad B should not be included in EHPad A's exception,
because it does not make sense to rethrow/delegate to an inner scope.
This is what happened in CFGStackify as a result of this:
```
try
try
catch
... <- %dest_bb is among here!
end
delegate %dest_bb
```
So this patch adds a phase in `WebAssemblyExceptionInfo::recalculate` to
make sure excptions' unwind destinations are not subexceptions of
their unwind sources in `WasmEHFuncInfo`.
But this alone does not prevent `dest_bb` in the example above from
being sorted within the inner `catch`'s exception, even if its exception
is not a subexception of that `catch`'s exception anymore, because of
how CFGSort works, which will be explained below.
---
CFGSort places BBs within the same `SortRegion` (loop or exception)
continuously together so they can be demarcated with `loop`-`end_loop`
or `catch`-`end_try` in CFGStackify.
`SortRegion` is a wrapper for one of `MachineLoop` or
`WebAssemblyException`. `SortRegionInfo` already does some complicated
things because there discrepancies between those two data structures.
`WebAssemblyException` is what we control, and it is defined as an EH
pad as its header and BBs dominated by the header as its BBs (with a
newly added exception of unwind destinations explained in the previous
paragraph). But `MachineLoop` is an LLVM data structure and uses the
standard loop detection algorithm. So by the algorithm, BBs that are 1.
dominated by the loop header and 2. have a path back to its header.
Because of the second condition, many BBs that are dominated by the loop
header are not included in the loop. So BBs that contain `return` or
branches to outside of the loop are not technically included in
`MachineLoop`, but they can be sorted together with the loop with no
problem.
Maybe to relax the condition, in CFGSort, when we are in a `SortRegion`
we allow sorting of not only BBs that belong to the current innermost
region but also BBs that are by the current region header.
(This was written this way from the first version written by Dan, when
only loops existed.) But now, we have cases in exceptions when EHPad B
is the unwind destination for EHPad A, even if EHPad B is dominated by
EHPad A it should not be included in EHPad A's exception, and should not
be sorted within EHPad A.
One way to make things work, at least correctly, is change `dominates`
condition to `contains` condition for `SortRegion` when sorting BBs, but
this will change compilation results for existing non-EH code and I
can't be sure it will not degrade performance or code size. I think it
will degrade performance because it will force many BBs dominated by a
loop, which don't have the path back to the header, to be placed after
the loop and it will likely to create more branches and blocks.
So this does a little hacky check when adding BBs to `Preferred` list:
(`Preferred` list is a ready list. CFGSort maintains ready list in two
priority queues: `Preferred` and `Ready`. I'm not very sure why, but it
was written that way from the beginning. BBs are first added to
`Preferred` list and then some of them are pushed to `Ready` list, so
here we only need to guard condition for `Preferred` list.)
When adding a BB to `Preferred` list, we check if that BB is an unwind
destination of another BB. To do this, this adds the reverse mapping,
`UnwindDestToSrc`, and getter methods to `WasmEHFuncInfo`. And if the BB
is an unwind destination, it checks if the current stack of regions
(`Entries`) contains its source BB by traversing the stack backwards. If
we find its unwind source in there, we add the BB to its `Deferred`
list, to make sure that unwind destination BB is added to `Preferred`
list only after that region with the unwind source BB is sorted and
popped from the stack.
---
This does not contain a new test that crashes because of this bug, but
this fix changes the result for one of existing test case. This test
case didn't crash because it fortunately didn't contain `delegate` to
the incorrectly placed unwind destination BB.
Fixes https://github.com/emscripten-core/emscripten/issues/13514.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97247
This adds support for serialization of `WasmEHFuncInfo`, in the form of
<Source BB Number, Unwind destination BB number>. To make YAML mapping
work, we needed to make a copy of the existing `SrcToUnwindDest` map
within `yaml::WebAssemblyMachineFunctionInfo`.
It was hard to add EH MIR tests for CFGStackify because `WasmEHFuncInfo`
could not be read from test MIR files. This adds the serialization
support for that to make EH MIR tests easier.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D97174
This renames variable and method names in `WasmEHFuncInfo` class to be
simpler and clearer. For example, unwind destinations are EH pads by
definition so it doesn't necessarily need to be included in every method
name. Also I am planning to add the reverse mapping in a later CL,
something like `UnwindDestToSrc`, so this renaming will make meanings
clearer.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D97173
If the reference-types feature is enabled, call_indirect will explicitly
reference its corresponding function table via `TABLE_NUMBER`
relocations against a table symbol.
Also, as before, address-taken functions can also cause the function
table to be created, only with reference-types they additionally cause a
symbol table entry to be emitted.
We abuse the used-in-reloc flag on symbols to indicate which tables
should end up in the symbol table. We do this because unfortunately
older wasm-ld will carp if it see a table symbol.
Differential Revision: https://reviews.llvm.org/D90948
Usually `EH_LABEL`s are placed in
- Before an `invoke` (which becomes calls in the backend)
- After an `invoke`
- At the start of an EH pad
I don't know exactly why, but I noticed there are cases of multiple, not
a single, `EH_LABEL` instructions in the beginning of an EH pad. In that
case `global.set` instruction placed to restore `__stack_pointer` ended
up between two `EH_LABEL` instructions before `CATCH`. It should follow
after the `EH_LABEL`s and `CATCH`. This CL fixes that case.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D96970
Updating `EHPadStack` with respect to `TRY` and `CATCH` instructions
have to be done after checking all other conditions, not before. Because
we did this before checking other conditions, when we encounter `TRY`
and we want to record the current mismatching range, we already have
popped up the entry from `EHPadStack`, which we need to access to record
the range.
The `baz` call in the added test needs try-delegate because the previous
TRY marker placement for `quux` was placed before `baz`, because `baz`'s
return value was stackified in RegStackify. If this wasn't stackified
this try-delegate is not strictly necessary, but at the moment it is not
easy to identify cases like this. I plan to transfer `nounwind`
attributes from the LLVM IR to prevent cases like this. The call in the
test does not have `unwind` attribute in order to test this bug, but in
many cases of this pattern the previous call has `nounwind` attribute.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D96711
Previously we assumed `rethrow`'s argument was always 0, but it turned
out `rethrow` follows the same rule with `br` or `delegate`:
https://github.com/WebAssembly/exception-handling/pull/137https://github.com/WebAssembly/exception-handling/issues/146#issuecomment-777349038
Currently `rethrow`s generated by our backend always rethrow the
exception caught by the innermost enclosing catch, so this adds a
function to compute that and replaces `rethrow`'s argument with its
computed result.
This also renames `EHPadStack` in `InstPrinter` to `TryStack`, because
in CFGStackify we use `EHPadStack` to mean the range between
`catch`~`end`, while in `InstPrinter` we used it to mean the range
between `try`~`catch`, so choosing different names would look clearer.
Doesn't contain any functional changes in `InstPrinter`.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D96595
I previously assumed `delegate`'s immediate argument computation
followed a different rule than that of branches, but we agreed to make
it the same
(https://github.com/WebAssembly/exception-handling/issues/146). This
removes the need for a separate `DelegateStack` in both CFGStackify and
InstPrinter.
When computing the immediate argument, we use a different function for
`delegate` computation because in MIR `DELEGATE`'s instruction's
destination is the destination catch BB or delegate BB, and when it is a
catch BB, we need an additional step of getting its corresponding `end`
marker.
Reviewed By: tlively, dschuff
Differential Revision: https://reviews.llvm.org/D96525
Enable partial and runtime unrolling with a threshold of 30, which
was derived from a large number of kernels running on node and
wasmtime for amd64 and aarch64.
Unrolling is enabled by default at -O2 and -O3 and is disabled at
-Oz and -Os. Compiling with -Os is recommended if the wasm binary
size is the most important factor.
Differential Revision: https://reviews.llvm.org/D95125
Fixes TableGen parser errors reported by D95874 due to incompatible types being used on multiclass templates.
Differential Revision: https://reviews.llvm.org/D96205
This updates InstPrinter and AsmParser for `delegate` and `catch_all`
instructions. Both will reject programs with multiple `catch_all`s per a
single `try`. And InstPrinter uses `EHInstStack` to figure out whether
to print catch label comments: It does not print catch label comments
for second `catch` or `catch_all` in a `try`.
Reviewed By: aardappel
Differential Revision: https://reviews.llvm.org/D94051
Terminate pads, cleanup pads with `__clang_call_terminate` call, have
`catch` instruction in them because `__clang_call_terminate` takes an
exception pointer. But these terminate pads should be reached also in
case of foreign exception. So this pass attaches an additional
`catch_all` BB after every terminate pad BB, with a call to
`std::terminate`.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D94050
This fixes unwind destination mismatches caused by 'catch'es, which
occur when a foreign exception is not caught by the nearest `catch` and
the next outer `catch` is not the catch it should unwind to, or the next
unwind destination should be the caller instead. This kind of mismatches
didn't exist in the previous version of the spec, because in the
previous spec `catch` was effectively `catch_all`, catching all
exceptions.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D94049
This adds `delegate` instruction and use it to fix unwind destination
mismatches created by marker placement in CFGStackify.
There are two kinds of unwind destination mismatches:
- Mismatches caused by throwing instructions (here we call it "call
unwind mismatches", even though `throw` and `rethrow` can also cause
mismatches)
- Mismatches caused by `catch`es, in case a foreign exception is not
caught by the nearest `catch` and the next outer `catch` is not the
catch it should unwind to. This kind of mismatches didn't exist in the
previous version of the spec, because in the previous spec `catch` was
effectively `catch_all`, catching all exceptions.
This implements routines to fix the first kind of unwind mismatches,
which we call "call unwind mismatches". The second mismatch (catch
unwind mismatches) will be fixed in a later CL.
This also reenables all previously disabled tests in cfg-stackify-eh.ll
and updates FileCheck lines to match the new spec. Two tests were
deleted because they specifically tested the way we fixed unwind
mismatches before using `exnref`s and branches, which we don't do
anymore.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D94048
We used to require .functype immediately follows the label it sets the type of, but not all Clang output follows this rule.
Now we simply allow it on any symbol, but only assume its a function start for a defined symbol, which is simpler and more general.
Fixes (part of) https://bugs.llvm.org/show_bug.cgi?id=49036
Differential Revision: https://reviews.llvm.org/D96165
As mentioned in TODO comment, casting double to float causes NaNs to change bits.
To avoid the change, this patch adds support for single-floating-point immediate value on MachineCode.
Patch by Yuta Saito.
Differential Revision: https://reviews.llvm.org/D77384
Various *TargetStreamer.h need formatted_raw_ostream but rely on a
forward declaration of formatted_raw_ostream in MCStreamer.h. This
patch adds forward declarations right in *TargetStreamer.h.
While we are at it, this patch removes the one in MCStreamer.h, where
it is unnecessary.
This reverts commit 418df4a6ab.
This change broke emscripten tests, I believe because it started
generating 5-byte a wide table index in the call_indirect instruction.
Neither v8 nor wabt seem to be able to handle that. The spec
currently says that this is single 0x0 byte and:
"In future versions of WebAssembly, the zero byte occurring in the
encoding of the call_indirectcall_indirect instruction may be used to
index additional tables."
So we need to revisit this change. For backwards compat I guess
we need to guarantee that __indirect_function_table is always at
address zero. We could also consider making this a single-byte
relocation with and assert if have more than 127 tables (for now).
Differential Revision: https://reviews.llvm.org/D95005
This patch changes to make call_indirect explicitly refer to the
corresponding function table, residualizing TABLE_NUMBER relocs against
it.
With this change, wasm-ld now sees all references to tables, and can
link multiple tables.
Differential Revision: https://reviews.llvm.org/D90948
After placing markers, we removed some unnecessary branches, but it only
handled the simplest case. This makes more unnecessary branches to be
removed.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94047
Updating `ScopeTops` is something we frequently do in CFGStackify, so
this factors it out as a function. This also makes a few utility
functions templated so that they are not dependent on input vector
types and simplifies function parameters.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D94046
This ensures every single terminate pad is a single BB in the form of:
```
%exn = catch $__cpp_exception
call @__clang_call_terminate(%exn)
unreachable
```
This is a preparation for HandleEHTerminatePads pass, which will be
added in a later CL and will run after CFGStackify. That pass duplicates
terminate pads with a `catch_all` instruction, and duplicating it
becomes simpler if we can ensure every terminate pad is a single BB.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94045
This removes unreachable EH pads in LateEHPrepare. This is not for
optimization but for preparation for CFGStackify. In CFGStackify, we
determine where to place `try` marker by computing the nearest common
dominator of all predecessors of an EH pad, but when an EH pad does not
have a predecessor, it becomes tricky. We can insert an empty dummy BB
before the EH pad and place the `try` there, but removing unreachable EH
pads is simpler.
This moves an existing exception label test from eh-label.mir to
exception.mir and adds a new test there.
This also adds some comments to existing methods.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94044
- Updates InstPrinter to handle `catch_all`.
- Makes `rethrow` condition an early exit from the function to make the
rest simpler.
- Unify label and catch counters. They don't need to be counted
separately and this will help `delegate` instruction later.
- Removes `LastSeenEHInst` field. This was first introduced to handle
when there are more than one `catch` blocks per `try`, but this was
not implemented correctly and not being used at the moment anyway.
- Reenables all tests in cfg-stackify-eh.ll that don't deal with unwind
destination mismatches, which will be handled in a later CL.
Reviewed By: dschuff, tlively, aardappel
Differential Revision: https://reviews.llvm.org/D94043
This removes `exnref` type and `br_on_exn` instruction. This is
effectively NFC because most uses of these were already removed in the
previous CLs.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94041
This implements basic instructions for the new spec.
- Adds new versions of instructions: `catch`, `catch_all`, and `rethrow`
- Adds support for instruction selection for the new instructions
- `catch` needs a custom routine for the same reason `throw` needs one,
to encode `__cpp_exception` tag symbol.
- Updates `WebAssembly::isCatch` utility function to include `catch_all`
and Change code that compares an instruction's opcode with `catch` to
use that function.
- LateEHPrepare
- Previously in LateEHPrepare we added `catch` instruction to both
`catchpad`s (for user catches) and `cleanuppad`s (for destructors).
In the new version `catch` is generated from `llvm.catch` intrinsic
in instruction selection phase, so we only need to add `catch_all`
to the beginning of cleanup pads.
- `catch` is generated from instruction selection, but we need to
hoist the `catch` instruction to the beginning of every EH pad,
because `catch` can be in the middle of the EH pad or even in a
split BB from it after various code transformations.
- Removes `addExceptionExtraction` function, which was used to
generate `br_on_exn` before.
- CFGStackfiy: Deletes `fixUnwindMismatches` function. Running this
function on the new instruction causes crashes, and the new version
will be added in a later CL, whose contents will be completely
different. So deleting the whole function will make the diff easier to
read.
- Reenables all disabled tests in exception.ll and eh-lsda.ll and a
single basic test in cfg-stackify-eh.ll.
- Updates existing tests to use the new assembly format. And deletes
`br_on_exn` instructions from the tests and FileCheck lines.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94040
Clang generates `wasm.get.exception` and `wasm.get.ehselector`
intrinsics, which respectively return a caught exception value (a
pointer to some C++ exception struct) and a selector (an integer value
that tells which C++ `catch` clause the current exception matches, or
does not match any).
WasmEHPrepare is a pass that does some IR-level preparation before
instruction selection. Previously one of things we did in this pass was
to convert `wasm.get.exception` intrinsic calls to
`wasm.extract.exception` intrinsics. Their semantics were the same
except `wasm.extract.exception` did not have a token argument. We
maintained these two separate intrinsics with the same semantics because
instruction selection couldn't handle token arguments. This
`wasm.extract.exception` intrinsic was later converted to
`extract_exception` instruction in instruction selection, which was a
pseudo instruction to implement `br_on_exn`. Because `br_on_exn` pushed
an extracted value onto the value stack after the `end` instruction of a
`block`, but LLVM does not have a way of modeling that kind of behavior,
so this pseudo instruction was used to pull an extracted value out of
thin air, like this:
```
block $l0
...
br_on_exn $cpp_exception $l0
...
end
extract_exception ;; pushes values onto the stack
```
In the new spec, we don't need this pseudo instruction anymore because
`catch` itself returns a value and we don't have `br_on_exn` anymore. In
the spec `catch` returns multiple values (like `br_on_exn`), but here we
assume it only returns a single i32, which is sufficient to support C++.
So this renames `wasm.get.exception` intrinsic to `wasm.catch`. Because
this CL does not yet contain instruction selection for `wasm.catch`
intrinsic, all `RUN` lines in exception.ll, eh-lsda.ll, and
cfg-stackify-eh.ll, and a single `RUN` line in wasm-eh.cpp (which is an
end-to-end test from C++ source to assembly) fail. So this CL
temporarily disables those `RUN` lines, and for those test files without
any valid remaining `RUN` lines, adds a dummy `RUN` line to make them
pass. These tests will be reenabled in later CLs.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94039
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
foo();
} catch (int n) {
...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94038
A struct in C passed by value did not get debug information. Such values are currently
lowered to a Wasm local even in -O0 (not to an alloca like on other archs), which becomes
a Target Index operand (TI_LOCAL). The DWARF writing code was not emitting locations
in for TI's specifically if the location is a single range (not a list).
In addition, the ExplicitLocals pass which removes the ARGUMENT pseudo instructions did
not update the associated DBG_VALUEs, and couldn't even find these values since the code
assumed such instructions are adjacent, which is not the case here.
Also fixed asm printing of TIs needed by a test.
Differential Revision: https://reviews.llvm.org/D94140
Make the sequence of passes to select and rewrite instructions to
physical registers be a target callback. This is to prepare to allow
targets to split register allocation into multiple phases.
For wasm-ld table linking work to proceed, object files should indicate
if they use an indirect function table. In the future this will be done
by the usual symbols and relocations mechanism, but until that support
lands in the linker, the presence of an `__indirect_function_table` in
the object file's import section shows that the object file needs an
indirect function table.
Prior to https://reviews.llvm.org/D91637, this condition was met by all
object files residualizing an `__indirect_function_table` import.
Since https://reviews.llvm.org/D91637, the intention has been that only
those object files needing an indirect function table would have the
`__indirect_function_table` import. However, we missed the case of
object files which use the table via `call_indirect` but which
themselves do not declare any indirect functions.
This changeset makes it so that when we lower a call to `call_indirect`,
that we ensure that a `__indirect_function_table` symbol is present and
that it will be propagated to the linker.
A followup patch will revise this mechanism to make an explicit link
between `call_indirect` and its associated indirect function table; see
https://reviews.llvm.org/D90948.
Differential Revision: https://reviews.llvm.org/D92840
As proposed in https://github.com/WebAssembly/simd/pull/380. This commit makes
the new instructions available only via clang builtins and LLVM intrinsics to
make their use opt-in while they are still being evaluated for inclusion in the
SIMD proposal.
Depends on D93771.
Differential Revision: https://reviews.llvm.org/D93775
This commit is a follow-on to c2c2e9119e73, using the `Vec` records introduced
in that commit in the rest of the SIMD instruction definitions. Also removes
unnecessary types in output patterns.
Differential Revision: https://reviews.llvm.org/D93771
Introduce `Vec` records, each bundling all information related to a single SIMD
lane interpretation. This lets TableGen definitions take a single Vec parameter
from which they can extract information rather than taking multiple redundant
parameters. This commit refactors all of the SIMD load and store instruction
definitions to use the new `Vec`s. Subsequent commits will similarly refactor
additional instruction definitions.
Differential Revision: https://reviews.llvm.org/D93660
This CL changes the asm syntax for section flags, making them more like ELF
(previously "passive" was the only option). Now we also allow "G" to designate
COMDAT group sections. In these sections we set the appropriate comdat flag on
function symbols, and also avoid auto-creating a new section for them.
This also adds asm-based tests for the changes D92691 to go along with
the direct-to-object tests.
Differential Revision: https://reviews.llvm.org/D92952
This is a reland of rG4564553b8d8a with a fix to the lit pipeline in
llvm/test/MC/WebAssembly/comdat.ll
This CL changes the asm syntax for section flags, making them more like ELF
(previously "passive" was the only option). Now we also allow "G" to designate
COMDAT group sections. In these sections we set the appropriate comdat flag on
function symbols, and also avoid auto-creating a new section for them.
This also adds asm-based tests for the changes D92691 to go along with
the direct-to-object tests.
Differential Revision: https://reviews.llvm.org/D92952
The main this this test does is to add the `IsNotPIC` predicate to the
all the atomic instructions pattern that directly refer to
`tglobaladdr`.
This is because in PIC mode we need to generate separate instruction
sequence (either a direct global.get, or __memory_base + offset) for
accessing global addresses.
As part of this change I noticed that many of the `Requires` attributes
added to the instruction in `WebAssemblyInstrAtomics.td` were being
honored. This is because the wrapped in a `let Predicates =
[HasAtomics]` block and it seems that that outer wrapping overrides any
`Requires` on defs within it. As a workaround I removed the outer
`let` and added `HasAtomics` to all the inner `Requires`. I believe
that all the instrucitons that don't have `Requires` explicit bottom out
in `ATOMIC_I` and `ATOMIC_NRI` which have `HasAtomics` so this should
not remove this predicate from any patterns (at least that is the idea).
The alternative to this approach looks like implementing something
like `PredicateControl` in `Mips.td` where we can split the predicates
into groups so they don't clobber each other.
Differential Revision: https://reviews.llvm.org/D92744
This adds missing `select` instruction support and block return type
support for reference types. Also refactors WebAssemblyInstrRef.td and
rearranges tests in reference-types.s. Tests don't include `exnref`
types, because we currently don't support `exnref` for `ref.null` and
the type will be removed soon anyway.
Reviewed By: tlively, sbc100, wingo
Differential Revision: https://reviews.llvm.org/D92359
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).
Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.
Reviewed By: Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D90039
I'm not why it was added to DAGToDAG oringally but it seems
to make sense alongside the non-TLS version: LowerGlobalAddress
Differential Revision: https://reviews.llvm.org/D91432
These relocations represent offsets from the __tls_base symbol.
Previously we were just using normal MEMORY_ADDR relocations and relying
on the linker to select a segment-offset rather and absolute value in
Symbol::getVirtualAddress(). Using an explicit relocation type allows
allow us to clearly distinguish absolute from relative relocations based
on the relocation information alone.
One place this is useful is being able to reject absolute relocation in
the PIC case, but still accept TLS relocations.
Differential Revision: https://reviews.llvm.org/D91276
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
When machine instructions are in the form of
```
%0 = CONST_I32 @str
%1 = ADD_I32 %stack.0, %0
%2 = LOAD 0, 0, %1
```
In the `ADD_I32` instruction, it is possible to fold it if `%0` is a
`CONST_I32` from an immediate number. But in this case it is a global
address, so we shouldn't do that. But we haven't checked if the operand
of `ADD` is an immediate so far. This fixes the problem. (The case
applies the same for `ADD_I64` and `CONST_I64` instructions.)
Fixes https://bugs.llvm.org/show_bug.cgi?id=47944.
Patch by Julien Jorge (jjorge@quarkslab.com)
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D90577
This patch adds a new "heap type" operand kind to the WebAssembly MC
layer, used by ref.null. Currently the possible values are "extern" and
"func"; when typed function references come, though, this operand may be
a type index.
Note that the "heap type" production is still known as "refedtype" in
the draft proposal; changing its name in the spec is
ongoing (https://github.com/WebAssembly/reference-types/issues/123).
The register form of ref.null is still untested.
Differential Revision: https://reviews.llvm.org/D90608
Also added general wasm64 DWARF test
Also added asserts for unsupported reloc combinations that triggered this bug.
Differential Revision: https://reviews.llvm.org/D90503
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.
Differential Revision: https://reviews.llvm.org/D90253
Implementation of instructions table.get, table.set, table.grow,
table.size, table.fill, table.copy.
Missing instructions are table.init and elem.drop as they deal with
element sections which are not yet implemented.
Added more tests to tables.s
Differential Revision: https://reviews.llvm.org/D89797
Prototype the newly proposed load_lane instructions, as specified in
https://github.com/WebAssembly/simd/pull/350. Since these instructions are not
available to origin trial users on Chrome stable, make them opt-in by only
selecting them from intrinsics rather than normal ISel patterns. Since we only
need rough prototypes to measure performance right now, this commit does not
implement all the load and store patterns that would be necessary to make full
use of the offset immediate. However, the full suite of offset tests is included
to make it easy to track improvements in the future.
Since these are the first instructions to have a memarg immediate as well as an
additional immediate, the disassembler needed some additional hacks to be able
to parse them correctly. Making that code more principled is left as future
work.
Differential Revision: https://reviews.llvm.org/D89366
Adds more testing in basic-assembly.s and a new test tables.s.
Adds support to yaml reading and writing of tables as well.
Differential Revision: https://reviews.llvm.org/D88815
This reverts commit 432e4e56d3, which reverted 542523a61a. Two issues from
the original commit have been fixed. First, MSVC does not like when std::array
is initialized with only single braces, so this commit switches to using the
more portable double braces. Second, there was a subtle endianness bug that
prevented the original commit from working correctly on big-endian machines,
which has been fixed by switching to using endianness-agnostic bit twiddling
instead of type punning.
Differential Revision: https://reviews.llvm.org/D88773
In LowerEmscriptenEHSjLj, `longjmp` used to be replaced with
`emscripten_longjmp_jmpbuf(jmp_buf*, i32)`, which will eventually be
lowered to `emscripten_longjmp(i32, i32)`. The reason we used two
different names was because they had different signatures in the IR
pass.
D88697 fixed this by only using `emscripten_longjmp(i32, i32)` and
adding a `ptrtoint` cast to its first argument, so
```
longjmp(buf, 0)
```
becomes
```
emscripten_longjmp((i32)buf, 0)
```
But this assumed all uses of `longjmp` was a direct call to it, which
was not the case. This patch handles indirect uses of `longjmp` by
replacing
```
longjmp
```
with
```
(i32(*)(jmp_buf*, i32))emscripten_longjmp
```
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D89032
Renaming for some Emscripten EH functions has so far been done in
wasm-emscripten-finalize tool in Binaryen. But recently we decided to
make a compilation/linking path that does not rely on
wasm-emscripten-finalize for modifications, so here we move that
functionality to LLVM.
Invoke wrappers are generated in LowerEmscriptenEHSjLj pass, but final
wasm types are not available in the IR pass, we need to rename them at
the end of the pipeline.
This patch also removes uses of `emscripten_longjmp_jmpbuf` in
LowerEmscriptenEHSjLj pass, replacing that with `emscripten_longjmp`.
`emscripten_longjmp_jmpbuf` is lowered to `emscripten_longjmp`, but
previously we generated calls to `emscripten_longjmp_jmpbuf` in
LowerEmscriptenEHSjLj pass because it takes `jmp_buf*` instead of `i32`.
But we were able use `ptrtoint` to make it use `emscripten_longjmp`
directly here.
Addresses:
https://github.com/WebAssembly/binaryen/issues/3043https://github.com/WebAssembly/binaryen/issues/3081
Companions:
https://github.com/WebAssembly/binaryen/pull/3191https://github.com/emscripten-core/emscripten/pull/12399
Reviewed By: dschuff, tlively, sbc100
Differential Revision: https://reviews.llvm.org/D88697
v128.const was recently implemented in V8, but until it rolls into Chrome
stable, we can't enable it in the WebAssembly backend without breaking origin
trial users. So far we have been lowering build_vectors that would otherwise
have been lowered to v128.const to splats followed by sequences of replace_lane
instructions to initialize each lane individually. That produces large and
inefficient code, so this patch introduces new logic to lower integer vector
constants to a single i64x2.splat where possible, with at most a single
i64x2.replace_lane following it if necessary.
Adapted from a patch authored by @omnisip.
Differential Revision: https://reviews.llvm.org/D88591
1c5a3c4d38 updated the variables inserted by Emscripten SjLj lowering to be
thread-local, depending on the CoalesceFeaturesAndStripAtomics pass to downgrade
them to normal globals if the target features did not support TLS. However, this
had the unintended side effect of preventing all non-TLS-supporting objects from
being linked into modules with shared memory, because stripping TLS marks an
object as thread-unsafe. This patch fixes the problem by only making the SjLj
lowering variables thread-local if the target machine supports TLS so that it
never introduces new usage of TLS that will be stripped. Since SjLj lowering
works on Modules instead of Functions, this required that the
WebAssemblyTargetMachine have its feature string updated to reflect the
coalesced features collected from all the functions so that a
WebAssemblySubtarget can be created without using any particular function.
Differential Revision: https://reviews.llvm.org/D88323
Emscripten's longjump and exception mechanism depends on two global variables,
`__THREW__` and `__threwValue`, which are changed to be defined as thread-local
in https://github.com/emscripten-core/emscripten/pull/12056. This patch updates
the corresponding code in the WebAssembly backend to properly declare these
globals as thread-local as well.
Differential Revision: https://reviews.llvm.org/D88262
Also renamed the fields to follow style guidelines.
Accessors help with readability - weight mutation, in particular,
is easier to follow this way.
Differential Revision: https://reviews.llvm.org/D87725
The versions that take 'unsigned' will be removed in the future.
I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D87592
This adds and optional ", immutable" to the end of a `.globaltype`
declaration. I would have prefered to match the `.wat` syntax
where immutable is the default and `mut` is the signifier for
mutable globals. Sadly changing the default would break backwards
compat with existing assembly in the wild so I think its best
to stick with this approach.
Differential Revision: https://reviews.llvm.org/D87515
Currently, using llvm-objdump to disassemble a function containing
unreachable will trigger an assertion while decoding the opcode, since both
unreachable and debug_unreachable have the same encoding. To avoid this, set
unreachable as the canonical decoding.
Differential Revision: https://reviews.llvm.org/D87431
When the function return type is non-void and `end` instructions are at
the very end of a function, CFGStackify's `fixEndsAtEndOfFunction`
function fixes the corresponding block/loop/try's type to match the
function's return type. This is applied to consecutive `end` markers at
the end of a function. For example, when the function return type is
`i32`,
```
block i32 ;; return type is fixed to i32
...
loop i32 ;; return type is fixed to i32
...
end_loop
end_block
end_function
```
But try-catch is a little different, because it consists of two parts:
a try part and a catch part, and both parts' return type should satisfy
the function's return type. Which means,
```
try i32 ;; return type is fixed to i32
...
block i32 ;; this should be changed i32 too!
...
end_block
catch
...
end_try
end_function
```
As you can see in this example, it is not sufficient to only `end`
instructions at the end of a function; in case of `try`, we should
check instructions before `catch`es, in case their corresponding `try`'s
type has been fixed.
This changes `fixEndsAtEndOfFunction`'s algorithm to use a worklist
that contains a reverse iterator, each of which is a starting point for
a new backward `end` instruction search.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47413.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D87207
Fixes PR47375, in which an assertion was triggering because
WebAssemblyTargetLowering::isVectorLoadExtDesirable was improperly
assuming the use of simple value types.
Differential Revision: https://reviews.llvm.org/D87110
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.
So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.
This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
Differential Revision: https://reviews.llvm.org/D86744
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.
This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.
One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.
I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.
Differential Revision: https://reviews.llvm.org/D85165
Allow inlining only when the Callee has a subset of the Caller's
features. In principle, we should be able to inline regardless of any
features because WebAssembly supports features at module granularity,
not function granularity, but without this restriction it would be
possible for a module to "forget" about features if all the functions
that used them were inlined.
Requested in PR46812.
Differential Revision: https://reviews.llvm.org/D85494
Rather than just saying that some feature is missing, report the exact
features to make the error message more useful and actionable.
Differential Revision: https://reviews.llvm.org/D85795
The officially specified abbreviation for WebAssembly is Wasm and the
spec explicitly calls out WASM as being an incorrect spelling. This
patch fixes a few comments and error messages to use the
spec-compliant abbreviation.
Differential Revision: https://reviews.llvm.org/D85764
Fixes PR47040, in which an assertion was improperly triggered during
FastISel's address computation. The issue was that an `Address` set to
be relative to the FrameIndex with offset zero was incorrectly
considered to have an unset base. When the left hand side of an add
set the Address to be 0 off the FrameIndex, the right side would not
detect that the Address base had already been set and could try to set
the Address to be relative to a register instead, triggering an
assertion.
This patch fixes the issue by explicitly tracking whether an `Address`
has been set rather than interpreting an offset of zero to mean the
`Address` has not been set.
Differential Revision: https://reviews.llvm.org/D85581
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.
This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.
Differential Revision: https://reviews.llvm.org/D84820
LLVM selection dag assumes "switch" indices are pointer sized, which causes problems for our 32-bit br_table. The new function ensures 32-bit operands don't get unnecessarily extended, and 64-bit operands get truncated.
Note that the changes to the existing test test exactly that: the addition of -NEXT in 2 places ensures no extension is inserted (which the test previously ignored) and that the wrap is present (previously omitted in wasm64 mode).
Differential Revision: https://reviews.llvm.org/D84705
When it was first created, CFGSort only made sure BBs in each
`MachineLoop` are sorted together. After we added exception support,
CFGSort now also sorts BBs in each `WebAssemblyException`, which
represents a `catch` block, together, and
`Region` class was introduced to be a thin wrapper for both
`MachineLoop` and `WebAssemblyException`.
But how we compute those loops and exceptions is different.
`MachineLoopInfo` is constructed using the standard loop computation
algorithm in LLVM; the definition of loop is "a set of BBs that are
dominated by a loop header and have a path back to the loop header". So
even if some BBs are semantically contained by a loop in the original
program, or in other words dominated by a loop header, if they don't
have a path back to the loop header, they are not considered a part of
the loop. For example, if a BB is dominated by a loop header but
contains `call abort()` or `rethrow`, it wouldn't have a path back to
the header, so it is not included in the loop.
But `WebAssemblyException` is wasm-specific data structure, and its
algorithm is simple: a `WebAssemblyException` consists of an EH pad and
all BBs dominated by the EH pad. So this scenario is possible: (This is
also the situation in the newly added test in cfg-stackify-eh.ll)
```
Loop L: header, A, ehpad, latch
Exception E: ehpad, latch, B
```
(B contains `abort()`, so it does not have a path back to the loop
header, so it is not included in L.)
And it is sorted in this order:
```
header
A
ehpad
latch
B
```
And when CFGStackify places `end_loop` or `end_try` markers, it
previously used `WebAssembly::getBottom()`, which returns the latest BB
in the sorted order, and placed the marker there. So in this case the
marker placements will be like this:
```
loop
header
try
A
catch
ehpad
latch
end_loop <-- misplaced!
B
end_try
```
in which nesting between the loop and the exception is not correct.
`end_loop` marker has to be placed after `B`, and also after `end_try`.
Maybe the fundamental way to solve this problem is to come up with our
own algorithm for computing loop region too, in which we include all BBs
dominated by a loop header in a loop. But this takes a lot more effort.
The only thing we need to fix is actually, `getBottom()`. If we make it
return the right BB, which means in case of a loop, the latest BB of the
loop itself and all exceptions contained in there, we are good.
This renames `Region` and `RegionInfo` to `SortRegion` and
`SortRegionInfo` and extracts them into their own file. And add
`getBottom` to `SortRegionInfo` class, from which it can access
`WebAssemblyExceptionInfo`, so that it can compute a correct bottom
block for loops.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D84724
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
Rather than expanding truncating stores so that vectors are stored one
lane at a time, lower them to a sequence of instructions using
narrowing operations instead, when possible. Since the narrowing
operations have saturating semantics, but truncating stores require
truncation, mask the stored value to manually truncate it before
narrowing. Also, since narrowing is a binary operation, pass in the
original vector as the unused second argument.
Differential Revision: https://reviews.llvm.org/D84377
Accounting for the fact that Wasm function indices are 32-bit, but in wasm64 we want uniform 64-bit pointers.
Includes reloc types for 64-bit table indices.
Differential Revision: https://reviews.llvm.org/D83729
Although the SIMD spec proposal does not specifically include a
select instruction, the select instruction in MVP WebAssembly is
polymorphic over the selected types, so it is able to work on v128
values when they are enabled. This patch introduces a new variant of
the select instruction for each legal vector type. Additional ISel
patterns are adapted from the SELECT_I32 and SELECT_I64 patterns.
Depends on D83736.
Differential Revision: https://reviews.llvm.org/D83737
We were previously expanding vselect and matching on the expansion to
generate bitselects, but in some cases the expansion would be further
combined and a bitselect would not get generated. This patch improves
codegen in those cases by legalizing vselect and lowering it to
v128.bitselect. The old pattern that matches the expansion is still
useful for lowering IR that already uses the expansion rather than a
select operation.
Differential Revision: https://reviews.llvm.org/D83734
In BUILD_VECTOR lowering, we used to generally prefer using splats
over v128.const instructions because v128.const has a very large
encoding. However, in d5b7a4e2e8 we switched to preferring consts
because they are expected to be more efficient in engines. This patch
updates the ISel patterns to match this current preference.
Differential Revision: https://reviews.llvm.org/D83581
This patch builds on 0d7286a652 by simplifying the code for detecting
splat values and adding new tests demonstrating the lowering of
splatted absolute value shift amounts, which are common in code
generated by Halide. The lowering is very bad right now, but
subsequent patches will improve it considerably. The tests will be
useful for evaluating the improvements in those patches.
Reviewed By: aheejin
Differential Revision: https://reviews.llvm.org/D83493
Since WebAssembly's vector shift instructions take a scalar shift
amount rather than a vector shift amount, we have to check in ISel
that the vector shift amount is a splat. Previously, we were checking
explicitly for splat BUILD_VECTOR nodes, but this change uses the
standard utilities for detecting splat values that can handle more
complex splat patterns. Since the C++ ISel lowering is now more
general than the ISel patterns, this change also simplifies shift
lowering by using the C++ lowering for all SIMD shifts rather than
mixing C++ and normal pattern-based lowering.
This change improves ISel for shifts to the point that the
simd-shift-unroll.ll regression test no longer tests the code path it
was originally meant to test. The bug corresponding to that regression
test is no longer reproducible with its original reported reproducer,
so rather than try to fix the regression test, this change just
removes it.
Differential Revision: https://reviews.llvm.org/D83278
This covers both the existing memory functions as well as the new bulk memory proposal.
Added new test files since changes where also required in the inputs.
Also removes unused init/drop intrinsics rather than trying to make them work for 64-bit.
Differential Revision: https://reviews.llvm.org/D82821
OSS-Fuzz and the Emscripten test suite uncovered some edge cases in
which the range check instruction seemed to be an (i32.const 0) or
other unexpected instruction, triggering an assertion. Unfortunately
the reproducers are rather complicated, so they don't make good unit
tests. This commit removes the bad assertion and conservatively
optimizes range checks only when the range check instruction is
i32.gt_u.
Differential Revision: https://reviews.llvm.org/D83169
Summary:
Since the br_table instruction takes an i32, switches over i64s (and
larger integers) must use the i32.wrap_i64 instruction to truncate the
table index. This truncation makes numbers just over 2^32
indistinguishable from small numbers, so it was a miscompilation to
omit the range check preceding these br_tables. This change fixes the
problem by skipping the "fixing" of the br_table when the range check
is an i64 instruction.
Fixes PR46447.
Reviewers: aheejin, dschuff, kripken
Reviewed By: kripken
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D83017
Soon it will be disallowed to depend on MachineFunction state in the
constructor. This was only being used to get the MachineRegisterInfo
for an assert, which I'm not sure is necessarily worth it. I would
think any missing defs would be caught by the verifier later instead.
Following on from this RFC[0] from a while back, this is the first patch towards
implementing variadic debug values.
This patch specifically adds a set of functions to MachineInstr for performing
operations specific to debug values, and replacing uses of the more general
functions where appropriate. The most prevalent of these is replacing
getOperand(0) with getDebugOperand(0) for debug-value-specific code, as the
operands corresponding to values will no longer be at index 0, but index 2 and
upwards: getDebugOperand(x) == getOperand(x+2). Similar replacements have been
added for the other operands, along with some helper functions to replace
oft-repeated code and operate on a variable number of value operands.
[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html<Paste>
Differential Revision: https://reviews.llvm.org/D81852
When created in RegStackify pass, `TEE` has two destinations, where
op0 is stackified and op1 is not. But it is possible that
op0 becomes unstackified in `fixUnwindMismatches` function in
CFGStackify pass when a nested try-catch-end is introduced, violating
the invariant of `TEE`s destinations.
In this case we convert the `TEE` into two `COPY`s, which will
eventually be resolved in ExplicitLocals.
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D81851
Summary:
This commit fixes a bug in the FixBrTables pass in which an
unconditional branch from the switch header block to the jump table
block was not removed before the blocks were combined. The result was
an invalid CFG in the MachineFunction. This commit also switches from
using bespoke branch analysis and deletion code to using the standard
utilities for the same.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81909
This adds 4 new reloc types.
A lot of code that previously assumed any memory or offset values could be contained in a uint32_t (and often truncated results from functions returning 64-bit values) have been upgraded to uint64_t. This is not comprehensive: it is only the values that come in contact with the new relocation values and their dependents.
A new tablegen mapping was added to automatically upgrade loads/stores in the assembler, which otherwise has no way to select for these instructions (since they are indentical other than for the offset immediate). It follows a similar technique to https://reviews.llvm.org/D53307
Differential Revision: https://reviews.llvm.org/D81704
Context: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md
This is just a first step, adding the new instruction variants while keeping the existing 32-bit functionality working.
Some of the basic load/store tests have new wasm64 versions that show that the basics of the target are working.
Further features need implementation, but these will be added in followups to keep things reviewable.
Differential Revision: https://reviews.llvm.org/D80769
Summary:
This commit slightly modifies the MCDisassembler, and llvm-objdump to
allow targets to also decode entire symbols.
WebAssembly uses the onSymbolStart hook it to decode preludes.
WebAssembly partially disassembles the symbol in its target specific
way; and then falls back to the normal flow of llvm-objdump.
AMDGPU needs it to decode kernel descriptors entirely, and move to the
next symbol.
This commit is to split the above task into 2.
- Changes to llvm-objdump and MC-layer without breaking WebAssembly code
[ this commit ]
- AMDGPU's implementation of onSymbolStart that decodes kernel
descriptors. [ https://reviews.llvm.org/D80713 ]
Reviewers: scott.linder, t-tye, sunfish, arsenm, jhenderson, MaskRay, aardappel
Reviewed By: scott.linder, jhenderson, aardappel
Subscribers: bcain, dschuff, wdng, tpr, sbc100, jgravelle-google, hiraditya, aheejin, MaskRay, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80512
Summary:
After their range checks were removed in 7f50c15be5, br_tables
started being duplicated into their predecessors by tail
folding. Unfortunately, when the br_tables were in loops this
transformation introduced bad irreducible control flow which was later
expanded into even more br_tables. This commit abuses the
`isNotDuplicable` property to prevent this irreducible control flow
from being introduced. This change saves a few dozen bytes of code
size and has a negligible affect on performance for most of the large
Emscripten benchmarks, but can improve performance significantly on
microbenchmarks of switches in loops.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81628
Summary:
As specified in https://github.com/WebAssembly/simd/pull/232. These
instructions are implemented as LLVM intrinsics for now rather than
normal ISel patterns to make these instructions opt-in. Once the
instructions are merged to the spec proposal, the intrinsics will be
replaced with proper ISel patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81222
Previously, it tried to infer the correct destination block from the
successor list, but this is a rather tricky propspect, given the
existence of successors that occur mid-block, such as invoke, and
potentially in the future, callbr/INLINEASM_BR. (INLINEASM_BR, in
particular would be problematic, because its successor blocks are not
distinct from "normal" successors, as EHPads are.)
Instead, require the caller to pass in the expected fallthrough
successor explicitly. In most callers, the correct block is
immediately clear. But, in MachineBlockPlacement, we do need to record
the original ordering, before starting to reorder blocks.
Unfortunately, the goal of decoupling the behavior of end-of-block
jumps from the successor list has not been fully accomplished in this
patch, as there is currently no other way to determine whether a block
is intended to fall-through, or end as unreachable. Further work is
needed there.
Differential Revision: https://reviews.llvm.org/D79605
Summary:
Unlike normal traps, debug traps are allowed to return and can have
additional instructions in the same basic block. Without explicit
backend support for debug traps, they are lowered in ISel as normal
traps. Since normal traps are lowered in the WebAssembly backend to
the UNREACHABLE instruction, which is a terminator, using debug traps
could lead to invalid MBBs when there are additional instructions
after the trap. This patch fixes the issue by lowering debug traps to
a new version of the UNREACHABLE instruction, DEBUG_UNREACHABLE, that
is not a terminator.
An alternative approach would have been to make UNREACHABLE not a
terminator, but that breaks a large number of tests. In particular, it
would require removing the traps inserted after noreturn calls to
@llvm.wasm.throw because otherwise the terminator throw would be
followed by a non-terminator UNREACHABLE and we would be back to
having invalid MBBs. Overall the approach in this patch seems simpler.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81055
Summary:
The code previously assumed that the index of a vector extract was
constant, but this was not always true. This patch fixes the problem
by bailing out of the lowering if the index is nonconstant and also
replaces `static_cast`s in the lowering function with `cast`s because
the latter contain type-checking asserts that would make similar
issues easier to find and debug.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81025
This reverts commit 755a895915.
Although I was not able to reproduce any test failures locally,
aheejin was able to reproduce them and found a fix, applied here.
Summary:
Jump tables for most targets cannot handle out of range indices by
themselves, so LLVM emits range checks to guard the jump
tables. WebAssembly, on the other hand, implements jump tables using
the br_table instruction, which takes a default branch target as an
operand, making the range checks redundant. This patch introduces a
new MachineFunction pass in the WebAssembly backend to find and
eliminate the redundant range checks.
Reviewers: aheejin, dschuff
Subscribers: mgorny, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80863
Summary:
`getMatchingEHPad()` in LateEHPrepare is a function to find the nearest
EH pad that dominates the given instruction. This intends to be
lightweight so it does not use full WebAssemblyException scope analysis
or dominator analysis. It simply does backward BFS to its predecessors
and stops at the first EH pad each search path encounters. All search
should end up at the same EH pad, and if not, it returns null.
But it didn't take into account that when there are inner scopes within
the current scope, some path in BFS can hit an inner EH pad first. For
example, in the given diagram, `Inst` belongs to the outer scope and
`getMathingEHPad()` should return 'EHPad 1', but some search path can go
into the inner scope and end up with 'EHPad 2'. The search will return
null because different paths end up with different EH pads.
```
--- EHPad 1 ---
| - EHPad 2 - |
| | | |
| ----------- |
| Inst |
---------------
```
So far this was OK because we haven't tested a case in which a given
instruction is far from its EH pad. Also, this bug does not happen when
the inner EH scope is a cleanup scope, because a cleanup scope ends with
a `cleanupret` whose successor is an EH pad, so the search encounters
that EH pad first before going into the child scope. But this can happen
when the child scope is a catch scope that ends with `catchret`. So this
patch, when doing backward BFS, does not search predecessors that ends
with `catchret`. Because `catchret`s are replaced with `br`s during this
pass, this records BBs that have `catchret`s in the beginning, before
doing any other transformations.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80571
Summary:
One of the things `removeUnnecessaryInstrs()` in CFGStackify does is to
remove an unnecessary unconditinal branch before an EH pad. When there
is an unconditional branch right before a catch instruction and it
branches to the end of `end_try` marker, we don't need the branch,
because it there is no exception, the control flow transfers to
that point anyway.
```
bb0:
try
...
br bb2 <- Not necessary
bb1:
catch
...
bb2:
end
```
This applies when we have a conditional branch followed by an
unconditional one, in which case we should only remove the unconditional
branch. For example:
```
bb0:
try
...
br_if someplace_else
br bb2 <- Not necessary
bb1:
catch
...
bb2:
end
```
But `TargetInstrInfo::removeBranch` we used removed all existing
branches when there are multiple ones. This patch fixes it by only
deleting the last (= unconditional) branch manually.
Also fixes some `preds` comments in the test file.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80572
Replace with forward declaration and move dependency down to source files that actually need it.
Both TargetLowering.h and TargetMachine.h are 2 of the most expensive headers (top 10) in the ClangBuildAnalyzer report when building llc.
Summary:
The code previously assumed the source of the bitcast in the combined
pattern was a vector type, but this is not always true. This patch
adds a check to avoid an assertion failure in that case.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80164
Summary:
This reflects changes in the spec proposal made since basic arithmetic
was first implemented.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80174
Summary:
This new custom DAG combine fixes a codegen issue with the
wasm_simd128.h intrinsics. Clang lowers the
return (v128_t)(__f32x4){__a, __a, __a, __a};
body of f32x4_splat to a splat shuffle of a bitcasted vector, as seen
in the new simd-shuffle-bitcast.ll test. The bitcast interfered with
the target-independent DAG combine that combines splat shuffles into
BUILD_VECTOR nodes, so this patch introduces a new custom DAG combine
to hoist the bitcast out of the shuffle, allowing the
target-independent combine to work as intended.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80021
Summary:
Move instructions that have recently been implemented in V8 from the
`unimplemented-simd128` target feature to the `simd128` target
feature. The updated instructions match the update at
https://github.com/WebAssembly/simd/pull/223.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79973
BuildMI requires this debug loc to be from the same sub program as the variable metadata passed in.
Differential Revision: https://reviews.llvm.org/D80019
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79742
Summary:
Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D66983
Summary:
The WebAssembly backend automatically lowers atomic operations and TLS
to nonatomic operations and non-TLS data when either are present and
the atomics or bulk-memory features are not present, respectively. The
resulting object is no longer thread-safe, so the linker has to be
told not to allow it to be linked into a module with shared
memory. This was previously done by disallowing the 'atomics' feature,
which prevented any objct with its atomic operations or TLS removed
from being linked with any object containing atomics or TLS, and
therefore preventing it from being linked into a module with shared
memory since shared memory requires atomics.
However, as of https://github.com/WebAssembly/threads/issues/144, the
validation rules are relaxed to allow atomic operations to validate
with unshared memories, which makes it perfectly safe to link an
object with stripped atomics and TLS with another object that still
contains TLS and atomics as long as the resulting module has an
unshared memory. To allow this kind of link, this patch disallows a
pseudo-feature 'shared-mem' rather than 'atomics' to communicate to
the linker that the object is not thread-safe. This means that the
'atomics' feature is available to accurately reflect whether or not an
object has atomics enabled.
As a drive-by tweak, this change also requires that bulk-memory be
enabled in addition to atomics in order to use shared memory. This is
because initializing shared memories requires bulk-memory operations.
Reviewers: aheejin, sbc100
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79542
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.
RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
Summary:
This fixes a few things that are connected. It is very hard to provide
an independent test case for each of those fixes, because they are
interconnected and sometimes one masks another. The provided test case
triggers some of those bugs below but not all.
---
1. Background:
`placeBlockMarker` takes a BB, and if the BB is a destination of some
branch, it places `end_block` marker there, and computes the nearest
common dominator of all predecessors (what we call 'header') and places
a `block` marker there.
When we first place markers, we traverse BBs from top to bottom. For
example, when there are 5 BBs A, B, C, D, and E and B, D, and E are
branch destinations, if mark the BB given to `placeBlockMarker` with `*`
and draw a rectangle representing the border of `block` and `end_block`
markers, the process is going to look like
```
-------
----- |-----|
--- |---| ||---||
|A| ||A|| |||A|||
--- --> |---| --> ||---||
*B | B | || B ||
C | C | || C ||
D ----- |-----|
E *D | D |
E -------
*E
```
which means when we first place markers, we go from inner to outer
scopes. So when we place a `block` marker, if the header already
contains other `block` or `try` marker, it has to belong to an inner
scope, so the existing `block`/`try` markers should go _after_ the new
marker. This was the assumption we had.
But after placing all markers we run `fixUnwindMismatches` function.
There we do some control flow transformation and create some branches,
and we call `placeBlockMarker` again to place `block`/`end_block`
markers for those newly created branches. We can't assume that we are
traversing branch destination BBs from top to bottom now because we are
basically inserting some new markers in the middle of existing markers.
Fix:
In `placeBlockMarker`, we don't have the assumption that the BB given is
in the order of top to bottom, and when placing `block` markers,
calculates whether existing `block` or `try` markers are inner or
outer scopes with respect to the current scope.
---
2. Background:
In `fixUnwindMismatches`, when there is a call whose correct unwind
destination mismatches the current destination after initially placing
`try` markers, we wrap that with a new nested `try`/`catch`/`end` and
jump to the correct handler within the new `catch`. The correct handler
code is split as a separate BB from its original EH pad so it can be
branched to. Here's an example:
- Before
```
mbb:
call @foo <- Unwind destination mismatch!
wrong-ehpad:
catch
...
cont:
end_try
...
correct-ehpad:
catch
[handler code]
```
- After
```
mbb:
try (new)
call @foo
nested-ehpad: (new)
catch (new)
local.set n / drop (new)
br %handleri (new)
nested-end: (new)
end_try (new)
wrong-ehpad:
catch
...
cont:
end_try
...
correct-ehpad:
catch
local.set n / drop (new)
handler: (new)
end_try
[handler code]
```
Note that after this transformation, it is possible there are no calls
to actually unwind to `correct-ehpad` here. `call @foo` now
branches to `handler`, and there can be no other calls to unwind to
`correct-ehpad`. In this case `correct-ehpad` does not have any
predecessors anymore.
This can cause a bug in `placeBlockMarker`, because we may need to place
`end_block` marker in `handler`, and `placeBlockMarker` computes the
nearest common dominator of all predecessors. If one of `handler`'s
predecessor (here `correct-ehpad`) does not have any predecessors, i.e.,
no way of reaching it, we cannot correctly compute the common dominator
of predecessors of `handler`, and end up placing no `block`/`end`
markers. This bug actually sometimes masks the bug 1.
Fix:
When we have an EH pad that does not have any predecessors after this
transformation, deletes all its successors, so that its successors don't
have any dangling predecessors.
---
3. Background:
Actually the `handler` BB in the example shown in bug 2 doesn't need
`end_block` marker, despite it being a new branch destination, because
it already has `end_try` marker which can serve the same purpose. I just
put that example there for an illustration purpose. There is a case we
actually need to place `end_block` marker: when the branch dest is the
appendix BB. The appendix BB is created when there is a call that is
supposed to unwind to the caller ends up unwinding to a wrong EH pad. In
this case we also wrap the call with a nested `try`/`catch`/`end`,
create an 'appendix' BB at the very end of the function, and branch to
that BB, where we rethrow the exception to the caller.
Fix:
When we don't actually need to place block markers, we don't.
---
4. In case we fall through to the continuation BB after the catch block,
after extracting handler code in `fixUnwindMismatches` (refer to bug 2
for an example), we now have to add a branch to it to bypass the
handler.
- Before
```
try
...
(falls through to 'cont')
catch
handler body
end
<-- cont
```
- After
```
try
...
br %cont (new)
catch
end
handler body
<-- cont
```
The problem is, we haven't been placing a new `end_block` marker in the
`cont` BB in this case. We should, and this fixes it. But it is hard to
provide a test case that triggers this bug, because the current
compilation pipeline from .ll to .s does not generate this kind of code;
we always have a `br` after `invoke`. But code without `br` is still
valid, and we can have that kind of code if we have some pipeline
changes or optimizations later. Even mir test cases cannot trigger this
part for now, because we don't encode auxiliary EH-related data
structures (such as `WasmEHFuncInfo`) in mir now. Those functionalities
can be added later, but I don't think we should block this fix on that.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79324
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79224
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882