It is possible to merge reuse and reorder shuffles and reduce the total
cost of the vectorization tree/number of final instructions.
Differential Revision: https://reviews.llvm.org/D94992
Some elementwise operations are not scalarizable, vectorizable, or tensorizable.
Split `ElementwiseMappable` trait into the following, more precise traits.
- `Elementwise`
- `Scalarizable`
- `Vectorizable`
- `Tensorizable`
This allows for reuse of `Elementwise` in dialects like HLO.
Differential Revision: https://reviews.llvm.org/D97674
This prepares codegen for a change that will remove the identical
folds from IR because they are not poison-safe. See
D93065 / D97360
for details.
We already generically support scalar types, and there are various
target-specific transforms that overlap the vector folds. For example,
x86 recognizes the and patterns, but not or. We can end up with 1
extra instruction there, but I think that is still preferred over the
blendv alternative that loads a constant vector.
If this is not optimal, then it should be fixed with a later transform
(this change is not expected to result in any regressions because
InstCombine currently does the same thing).
Removing custom code and supporting undefs in constant-pattern-matching
can be follow-up changes.
Differential Revision: https://reviews.llvm.org/D97730
lli aims to provide both, RuntimeDyld and JITLink, as the dynamic linkers/loaders for it's JIT implementations. And they both offer debugging via the GDB JIT interface, which builds on the two well-known symbol names `__jit_debug_descriptor` and `__jit_debug_register_code`. As these symbols must be unique accross the linked executable, we can only define them in one of the libraries and make the other depend on it. OrcTargetProcess is a minimal stub for embedding a JIT client in remote executors. For the moment it seems reasonable to have the definition there and let ExecutionEngine depend on it, until we find a better solution.
This is the second commit for the reviewed patch.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97339
Add a new ObjectLinkingLayer plugin `DebugObjectManagerPlugin` and infrastructure to handle creation of `DebugObject`s as well as their registration in OrcTargetProcess. The current implementation only covers ELF on x86-64, but the infrastructure is not limited to that.
The journey starts with a new `LinkGraph` / `JITLinkContext` pair being created for a `MaterializationResponsibility` in ORC's `ObjectLinkingLayer`. It sends a `notifyMaterializing()` notification, which is forwarded to all registered plugins. The `DebugObjectManagerPlugin` aims to create a `DebugObject` form the provided target triple and object buffer. (Future implementations might create `DebugObject`s from a `LinkGraph` in other ways.) On success it will track it as the pending `DebugObject` for the `MaterializationResponsibility`.
This patch only implements the `ELFDebugObject` for `x86-64` targets. It follows the RuntimeDyld approach for debug object setup: it captures a copy of the input object, parses all section headers and prepares to patch their load-address fields with their final addresses in target memory. It instructs the plugin to report the section load-addresses once they are available. The plugin overrides `modifyPassConfig()` and installs a JITLink post-allocation pass to capture them.
Once JITLink emitted the finalized executable, the plugin emits and registers the `DebugObject`. For emission it requests a new `JITLinkMemoryManager::Allocation` with a single read-only segment, copies the object with patched section load-addresses over to working memory and triggers finalization to target memory. For registration, it notifies the `DebugObjectRegistrar` provided in the constructor and stores the previously pending`DebugObject` as registered for the corresponding MaterializationResponsibility.
The `DebugObjectRegistrar` registers the `DebugObject` with the target process. `llvm-jitlink` uses the `TPCDebugObjectRegistrar`, which calls `llvm_orc_registerJITLoaderGDBWrapper()` in the target process via `TargetProcessControl` to emit a `jit_code_entry` compatible with the GDB JIT interface [1]. So far the implementation only supports registration and no removal. It appears to me that it wouldn't raise any new design questions, so I left this as an addition for the near future.
[1] https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97335
The argument value determines the dynamic linker to use (`default`, `rtdyld` or `jitlink`). The JITLink implementation only supports in-process JITing for now. This is the first commit for the reviewed patch.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97339
CodeCompletionContext::Kind has 36 Kinds. The completion model currently
only handles categorical features of 32 cardinality.
Changing the datatype to uint64_t will solve the problem.
This reverts commit 438b5bb05a.
ccb4124a41 fixed translating -gz=zlib to --compress-debug-sections for
linker invocation for several ToolChains, but omitted FreeBSD.
Differential Revision: https://reviews.llvm.org/D97752
GDB remote protocol does not specify length of g packet for register read. It depends on remote to include all or exclude certain registers from g packet. In case a register or set of registers is not included as part of g packet then we should fall back to p packet for reading all registers excluded from g packet by remote. This patch adds support for above feature and adds a test-case for the same.
Reviewed By: labath
Differential Revision: https://reviews.llvm.org/D97498
The default expansion of CONCAT_VECTORS goes through the stack. This
patch avoids that penalty by custom-lowering CONCAT_VECTORS to a series
of INSERT_SUBVECTOR nodes. Futher optimizations are possible, but this
is a good start.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D97692
In AArch32 ARM, the PC reads two instructions ahead of the currently
executiing instruction. This evaluates to 8 in ARM state and 4 in
Thumb state. Branch instructions on AArch32 compensate for this by
subtracting the PC bias from the addend. For a branch to symbol this
will result in an addend of -8 in ARM state and -4 in Thumb state.
The existing ARM Target::inBranchRange function accounted for this
implict addend within the function meaning that if the addend were
to be taken into account by the caller then it would be double
counted. This complicates the interface for all Targets as callers
wanting to account for addends had to account for the ARM PC-bias.
In certain situations such as:
https://github.com/ClangBuiltLinux/linux/issues/1305
the PC-bias compensation code didn't match up. In particular
normalizeExistingThunk() didn't put the PC-bias back in as Arm
thunks did not store the addend.
The simplest fix for the problem is to add the PC bias in
normalizeExistingThunk when restoring the addend. However I think
it is worth refactoring the Arm inBranchRange implementation so
that fewer calls to getPCBias are needed for other Targets. I
wasn't able to remove getPCBias completely but hopefully the
Relocations.cpp code is simpler now.
In principle a test could be written to replicate the linux kernel
build failure but I wasn't able to reproduce with a small example
that I could build up from scratch.
Fixes https://github.com/ClangBuiltLinux/linux/issues/1305
Differential Revision: https://reviews.llvm.org/D97550
This patch continues detensorizing implementation by detensoring
internal control flow in functions.
In order to detensorize functions, all the non-entry block's arguments
are detensored and branches between such blocks are properly updated to
reflect the detensored types as well. Function entry block (signature)
is left intact.
This continues work towards handling github/google/iree#1159.
Reviewed By: silvas
Differential Revision: https://reviews.llvm.org/D97148
When lli runs the below IR, it emits in-memory debug objects and registers them with the GDB JIT interface. The tests dump and check the registered information. IR has limited ability to produce complex output in a portable way. Instead the tests rely on built-in functions implemented in lli. They use a new command line flag `-generate=function-name` to instruct the ORC JIT to expose the built-in function with the given name to the JITed program.
`debug-descriptor-elf-minimal.ll` calls `__dump_jit_debug_descriptor()` to reflect the list of debug entries issued for itself after emitting the main module. The output is textual and can be checked straight away.
`debug-objects-elf-minimal.ll` calls `__dump_jit_debug_objects()`, which instructs lli to walk through the list of debug entries and append the encountered in-memory objects to the program output. We feed this output into llvm-dwarfdump to parse the DWARF in each file and dump their structures.
We can do the same for JITLink once D97335 has landed.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97694
clang-tools-extra/clangd/benchmarks/CompletionModel/DecisionForestBenchmark.cpp fails to compile since `"CompletionModel.h"` is auto-generated from clang-tools-extra/clangd/quality/model/features.json, which was changed in https://reviews.llvm.org/D94697 to remove `setFilterLength` and `setIsForbidden`, rename `setFileProximityDistance` and `setSymbolScopeDistance`, and add `setNumNameInContext` and `setFractionNameInContext`. This patch removes calls to the two removed functions, updates calls to the two renamed functions, and adds calls to the two new functions. (`20` is an arbitrary choice for the `setNumNameInContext` argument.) It also changes the `FlipCoin` argument from float to double to silence lossy conversion warnings.
Note: I don't use this tool but encountered the build errors and took a shot at fixing them. Please holler if there's another recommended solution. Thanks!
Reviewed By: usaxena95
Differential Revision: https://reviews.llvm.org/D97620
This makes code completion use a Decision Forest based ranking algorithm to rank
completion candidates. [Esitmated 6% accuracy boost]. This was
previously hidden behind the flag --ranking-model=decision_forest. This
patch makes it the default ranking algorithm.
Note: this is a generic model, not specialized for any particular
project. clangd does not collect or upload data to train code completion.
Also treat Keywords separately as they are not recorded by the training set generator.
Differential Revision: https://reviews.llvm.org/D96353
Add dexter tests using the optnone attribute in various scenarios. Our users
have found optnone useful when debugging optimised code. We have these tests
downstream (and one upstream already: D89873) and we would like to contribute
them if there is any interest.
The tests are fairly self explanatory. Testing optnone with:
* optnone-fastmath.cpp: floats and -ffast-math,
* optnone-simple-functions: simple functions and integer arithmetic,
* optnone-struct-and-methods: a struct with methods,
* optnone-vectors-and-functions: templates and integer vector arithmetic.
optnone-vectors-and-functions contains two FIXMEs. The first problem is that
lldb seems to struggle with evaluating expressions with the templates used
here (example below). Perhaps this is PR42920?
(lldb) p TypeTraits<int __attribute__((ext_vector_type(4)))>::NumElements
error: <user expression 0>:1:1: no template named 'TypeTraits'
TypeTraits<int __attribute__((ext_vector_type(4)))>::NumElements
^
The second is that while lldb cannot evaluate the following expression, gdb
can, but it reports that the variable has been optimzed away. It does this when
compiling at O0 too. llvm-dwarfdump shows that MysteryNumber does have a
location. I don't know whether the DIE is bad or if both debuggers just don't
support it.
TypeTraits<int __attribute__((ext_vector_type(4)))>::MysteryNumber
DW_TAG_variable
DW_AT_specification (0x0000006b "MysteryNumber")
DW_AT_location (DW_OP_addr 0x601028)
DW_AT_linkage_name ("_ZN10TypeTraitsIDv4_iE13MysteryNumberE")
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D97668
Just a pure method renaming.
It is a preparation step for replacing "memory space as raw integer"
with more generic "memory space as attribute", which will be done in
separate commit.
The `MemRefType::getMemorySpace` method will return `Attribute` and
become the main API, while `getMemorySpaceAsInt` will be declared as
deprecated and will be replaced in all in-tree dialects (also in separate
commits).
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D97476