Our existing combine allows to merge the shuffle of 2 similar 64-bit wide 'horizontal ops' (HADD/PACK/etc.) if the shuffle was a UNPCK/MOVSD.
This patch generalizes this to decode any target shuffle mask that can be widened to a 128-bit repeating v2*64 mask, which helps us catch PBLENDW/PBLENDD cases.
If we're packing from 128-bits to 64-bits then we don't need the RHS argument. This helps with register allocation, especially as we avoid repeating a use of the input value.
The ForOp::build ensures that there is a block terminator which is great for
the default use case when there are no iter_args and loop.for returns no
results. In non-zero results case we always need to call replaceOpWithNewOp
which is not the nicest thing in the world. We can stop inserting YieldOp when
iter_args is non-empty. IfOp::build already behaves similarly.
Summary: This revision adds support for marking the last region as variadic in the ODS region list with the VariadicRegion directive.
Differential Revision: https://reviews.llvm.org/D77455
Summary:
This patch is to teach `llvm-objdump` dump dynamic symbols (`-T` and `--dynamic-syms`). Currently, this patch is not fully compatible with `gnu-objdump`, but I would like to continue working on this in next few patches. It has two issues.
1. Some symbols shouldn't be marked as global(g). (`-t/--syms` has same issue as well) (Fixed by D75659)
2. `gnu-objdump` can dump version information and *dynamically* insert before symbol name field.
`objdump -T a.out` gives:
```
DYNAMIC SYMBOL TABLE:
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 printf
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 __libc_start_main
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 GLIBC_2.2.5 __cxa_finalize
```
`llvm-objdump -T a.out` gives:
```
DYNAMIC SYMBOL TABLE:
0000000000000000 w D *UND* 0000000000000000 _ITM_deregisterTMCloneTable
0000000000000000 g DF *UND* 0000000000000000 printf
0000000000000000 g DF *UND* 0000000000000000 __libc_start_main
0000000000000000 w D *UND* 0000000000000000 __gmon_start__
0000000000000000 w D *UND* 0000000000000000 _ITM_registerTMCloneTable
0000000000000000 w DF *UND* 0000000000000000 __cxa_finalize
```
Reviewers: jhenderson, grimar, MaskRay, espindola
Reviewed By: jhenderson, grimar
Subscribers: emaste, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75756
`isKnownReachable` had only interface (always returns true).
Changed it to call `isPotentiallyReachable`.
This change enables deductions of other Abstract Attributes depending on
AAReachability to use reachability information obtained from CFG, and it
can make them stronger.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D76210
This commit was made to settle [[ https://github.com/llvm/llvm-project/issues/175 | this issue on GitHub ]].
I added analysis getters for LoopInfo, DominatorTree, and
PostDominatorTree. And I added a test to show an improvement of the
deduction of `dereferenceable` attribute.
Reviewed By: jdoerfert, uenoku
Differential Revision: https://reviews.llvm.org/D76378
Query AAValueSimplify on pointers in memory accessing instructions to take
advantage of the constant propagation (or any other value simplification) of such values.
Both test formats are equivalent, so this *should* not be a problem.
However, I'm taking advantage of the week-end to test this and see if
there are any failures. If so, it should be fine to revert this until
the failures have been addressed.
For the time being, it is still possible to use the old format by passing
`--param=use_old_format=True` when running Lit.
Summary:
Allow users to simultaneously enable address and undefined behavior
sanitizers, in the same manner that LLVM's 'HandleLLVMOptions.cmake'
allows.
Prior to this patch, `cmake -DLLVM_USE_SANITIZER="Address;Undefined"`
would succeed and the build would build most of the LLVM project with
`-fsanitize=address,undefined`, but a warning would be printed by
libcxx's CMake, and the build would use neither sanitizer. This
patch results in no warning being printed, and both sanitizers are used
in building libcxx.
Reviewers: jroelofs, EricWF, ldionne, #libc!
Subscribers: mgorny, dexonsmith, llvm-commits, libcxx-commits
Tags: #libc
Differential Revision: https://reviews.llvm.org/D77466
finalizeSynthetic(in.symTab) calls sortSymTabSymbols() to order local
symbols before non-local symbols.
The newly added tests ensure that thunk symbols are added before
finalizeSynthetic(in.symTab), otherwise .symtab would be out of order.
After 49d00824bb, VPWidenRecipe only stores a single instruction.
tryToWiden can simply return the widen recipe, like other helpers in
VPRecipeBuilder.
This adds a minimal out-of-tree dialect template which can be used to start work on a standalone dialect implementation without having to integrate it in the main LLVM tree.
It mostly sets up the directory structure and provides CMakeLists.txt files to build a dialect library, an opt-like tool to operate on that dialect as well as tests. It could be expanded in the future to add examples of more user-defined operations, types, attributes, generated enums, transforms, etc. and linked to a tutorial.
Differential Revision: https://reviews.llvm.org/D77133
Summary:
D77423 started using a dominator tree in WasmEHPrepare, but we deleted
BBs in `prepareThrows` before we used the domtree in `prepareEHPads`,
and those CFG changes were not reflected in the domtree. This uses
`DomTreeUpdater` to make sure we update the domtree every time we delete
BBs from the CFG. This fixes ubsan/msan/expensive_check errors caught in
LLVM buildbots.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77465
eraseInstFromFunction() adds the operands of the erased instructions,
as those might now be dead as well. However, this is limited to
instructions with less than 8 operands.
This check doesn't make a lot of sense to me. As the instruction
gets removed afterwards, I don't see a potential for anything
overly pathological happening here (as we can only add those
operands to the worklist once). The impact on CTMark is in
the noise. We also have the same code in instruction sinking
and don't limit the operand count there.
Differential Revision: https://reviews.llvm.org/D77325
Summary:
It seems we need a different matcher for binary operator
in a template context.
Fixes this issue:
https://bugs.llvm.org/show_bug.cgi?id=44499
Reviewers: aaron.ballman, alexfh, hokein, njames93
Reviewed By: aaron.ballman
Subscribers: xazax.hun, cfe-commits
Tags: #clang, #clang-tools-extra
Differential Revision: https://reviews.llvm.org/D76990
Summary:
When we insert a call to the personality function wrapper
(`_Unwind_CallPersonality`) for a catch pad, we store some necessary
info in `__wasm_lpad_context` struct and pass it. One of the info is the
LSDA address for the function. For this, we insert a call to
`wasm.lsda()`, which will be lowered down to the address of LSDA, and
store it in a field in `__wasm_lpad_context`.
There are exceptions to this personality call insertion: catchpads for
`catch (...)` and cleanuppads (for destructors) don't need personality
function calls, because we don't need to figure out whether the current
exception should be caught or not. (They always should.)
There was a little optimization to `wasm.lsda()` call insertion. Because
the LSDA address is the same throughout a function, we don't need to
insert a store of `wasm.lsda()` return value in every catchpad. For
example:
```
try {
foo();
} catch (int) {
// wasm.lsda() call and a store are inserted here, like, in
// pseudocode,
// %lsda = wasm.lsda();
// store %lsda to a field in __wasm_lpad_context
try {
foo();
} catch (int) {
// We don't need to insert the wasm.lsda() and store again, because
// to arrive here, we have already stored the LSDA address to
// __wasm_lpad_context in the outer catch.
}
}
```
So the previous algorithm checked if the current catch has a parent EH
pad, we didn't insert a call to `wasm.lsda()` and its store.
But this was incorrect, because what if the outer catch is `catch (...)`
or a cleanuppad?
```
try {
foo();
} catch (...) {
// wasm.lsda() call and a store are NOT inserted here
try {
foo();
} catch (int) {
// We need wasm.lsda() here!
}
}
```
In this case we need to insert `wasm.lsda()` in the inner catchpad,
because the outer catchpad does not have one.
To minimize the number of inserted `wasm.lsda()` calls and stores, we
need a way to figure out whether we have encountered `wasm.lsda()` call
in any of EH pads that dominates the current EH pad. To figure that
out, we now visit EH pads in BFS order in the dominator tree so that we
visit parent BBs first before visiting its child BBs in the domtree.
We keep a set named `ExecutedLSDA`, which basically means "Do we have
`wasm.lsda()` either in the current EH pad or any of its parent EH
pads in the dominator tree?". This is to prevent scanning the domtree up
to the root in the worst case every time we examine an EH pad: each EH
pad only needs to examine its immediate parent EH pad.
- If any of its parent EH pads in the domtree has `wasm.lsda()`, this
means we don't need `wasm.lsda()` in the current EH pad. We also insert
the current EH pad in `ExecutedLSDA` set.
- If none of its parent EH pad has `wasm.lsda()`
- If the current EH pad is a `catch (...)` or a cleanuppad, done.
- If the current EH pad is neither a `catch (...)` nor a cleanuppad,
add `wasm.lsda()` and the store in the current EH pad, and add the
current EH pad to `ExecutedLSDA` set.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77423
Similar to the lowerV16I8Shuffle implementation, for binary compaction v8i16 shuffles we can avoid the PUNPCKLDQ(PSHUFB,PSHUFB) pattern on SSE41+ targets by using PACKUSDW and PBLENDW. Before SSE41 we would need to use PACKSSDW but that requires sign extension that seems to destroy any gains, even on targets without PSHUFB.
This is a bigger gain on AMD than Intel targets but should never be a regression, and avoiding the shuffle mask load(s) is always useful.
Noticed in codegen while dealing with PR31443.
Since D73835 we no longer need to define the whole IRBuilder
implementation in the header. This patch moves some of the larger
methods out of line, into the C++ file.
Differential Revision: https://reviews.llvm.org/D77332
Summary:
Note: This revision is very similar to D62296.
In D75756, we need `getDynamicSymbolIterators()` to skip first NULL symbol in `.dynsym`. And I believe it might be worth pointing this out in a separate patch to gather you experts' opinions.
I have checked that current code base will not be affected by this change.
```
dynamic_symbol_begin()
|- dynamic_symbol_end(): Ok
`- getDynamicSymbolIterators()
|- addDynamicElfSymbols(): llvm/tools/llvm-objdump/llvm-objdump.cpp, Line 934
| Ok, NULL symbol will be omitted by Line 945-947
| StringRef Name = unwrapOrError(Symbol.getName(), Obj->getName());
| if (Name.empty()) continue;
|- dumpSymbolNameFromObject(): llvm/tools/llvm-nm/llvm-nm.cpp, Line 1192
| There's no test for dumping dynamic debugging symbol. This patch helps improve llvm-nm behavior. (we should add test for this later)
`- computeSymbolSizes(): llvm/lib/Object/SymbolSize.cpp, Line 52
|- OProfileJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/OProfileJIT/OProfileJITEventListener.cpp, Line 92
| Ok, NULL symbol will be omitted by Line 94-95
| if (!Sym.getType() || *Sym.getType() != SF_Function) continue;
|- IntelJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/IntelJITEvents/IntelJITEventListener.cpp, Line 98
| Ok, NULL symbol will be omitted by Line 124-126 (same as previous one)
|- PerfJITEventListener::notifyObjectLoaded(): llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp, Line 244
| Ok, NULL symbol will be omitted by Line 254-256, (same as previous one)
|- SymbolizableObjectFile::create(): llvm/lib/DebugInfo/Symbolize/SymbolizableObjectFile.cpp, Line 73
| Ok, NULL symbol will be omitted by Line 75
| res->addSymbol()
| In addSymbol(), Line 167-168
| if (!Sec || (Obj && Obj->section_end() == *Sec)) return std::error_code();
|- dumpCXXData(): llvm/tools/llvm-cxxdump/llvm-cxxdump.cpp, Line 189
| Ok, NULL symbol will be omitted by Line 199-202
| object::section_iterator SecI = *SecIOrErr;
| // Skip external symbols.
| if (SecI == Obj->section_end())
| continue;
`- printLineInfoForInput(): llvm/tools/llvm-rtdyld/llvm-rtdyld.cpp, Line 418
Ok, NULL symbol will be omitted by Line 430-477
if (Type == object::SymbolRef::ST_Function) {
...
}
```
Reviewers: grimar, jhenderson, MaskRay
Reviewed By: jhenderson, MaskRay
Subscribers: rupprecht, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76081
Forward declare DemandedBits in IVDescriptors, and move include
into the cpp file. Also drop the include from LoopUtils, which
does not need it at all.