The index type does not have a bitwidth, and hence the size of the corresponding allocations cannot be computed. Instead, the promotion pass now has an explicit option to specify the size of the index type.
Differential Revision: https://reviews.llvm.org/D91360
This exposes a hook to configure legality of operations such that only
`scf.parallel` operations that have mapping attributes are marked as
illegal. Consequently, the transformation can now also be applied to
mixed forms.
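A minimal sketch (mine, not from the patch) of what such a legality hook looks like with MLIR's dialect conversion framework; the "mapping" attribute name is an assumption here:
```
// Only scf.parallel ops that carry a mapping attribute are illegal and
// must be converted; unmapped loops in mixed forms stay legal.
ConversionTarget target(*context);
target.addDynamicallyLegalOp<scf::ParallelOp>([](scf::ParallelOp op) {
  return !op.getOperation()->hasAttr("mapping");
});
```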
Differential Revision: https://reviews.llvm.org/D91340
A recent refactoring removed the need to interleave verifier passes, opting to verify during the normal execution of passes instead. As such, the old verify pass is no longer necessary and can be removed.
Differential Revision: https://reviews.llvm.org/D91212
This revision adds support in the parser/printer for "deferrable" aliases, i.e. those that can be resolved after printing has finished. This allows for printing aliases for operation locations after the module instead of before, i.e. this is now supported:
```
"foo.op"() : () -> () loc(#loc)
#loc = loc("some_location")
```
Differential Revision: https://reviews.llvm.org/D91227
On x86_64, when you hit a __builtin_debugtrap instruction, you
can continue past this in the debugger. This patch has debugserver
recognize the specific instruction used for __builtin_debugtrap
and advance the pc past it, so that the user can continue execution
once they've hit one of these.
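For reference, a minimal example of the builtin in question (example mine, not from the patch):
```
#include <cstdio>

int main() {
  printf("before trap\n");
  __builtin_debugtrap(); // debugserver now advances the pc past this
  printf("after trap\n"); // reached after "continue" in the debugger
  return 0;
}
```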
In the patch discussion, we were in agreement that it would be better
to have this knowledge up in lldb instead of depending on each
stub rewriting the pc behind the debugger's back, but that's a
larger scale change for another day.
<rdar://problem/65521634>
Differential revision: https://reviews.llvm.org/D91238
The new option `-fproc-stat-info=<file>` can be used to generate a report
about the memory usage and execution time of each stage of compilation.
Documentation for this option can be found in `UserManual.rst`. The
option can be used in parallel builds.
Differential Revision: https://reviews.llvm.org/D78903
This removes the need to have an explicit `cast<>`, given that we always know the value is an instance of the interface.
Differential Revision: https://reviews.llvm.org/D91304
Some users have native C++ data types that correspond to floating point types stored within a DenseElementsAttr for which there is no builtin C++ data type (e.g. bfloat16/half/etc.). This revision allows such users to use those native types directly, and removes the need to go through APFloat when the much faster native-value path is available.
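A hedged sketch of the intended use; the `bfloat16` type, its constructor, and `process` are stand-ins for a user's own definitions, and the trait hookup that registers the type with the attribute is elided:
```
// Assumes `bfloat16` is a user-defined 16-bit float type registered as
// a valid native type for BF16 elements.
SmallVector<bfloat16, 8> data(8, bfloat16(1.0f));
auto type = RankedTensorType::get({8}, builder.getBF16Type());
auto attr = DenseElementsAttr::get(type, llvm::makeArrayRef(data));
for (bfloat16 v : attr.getValues<bfloat16>()) // native path, no APFloat
  process(v);
```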
Differential Revision: https://reviews.llvm.org/D91402
This patch aims to improve support for out-of-process JITing using OrcV2. It
introduces two new class templates, OrcRPCTargetProcessControlBase and
OrcRPCTPCServer, which together implement the TargetProcessControl API by
forwarding operations to an execution process via an Orc-RPC Endpoint. These
utilities are used to implement out-of-process JITing from llvm-jitlink to
a new llvm-jitlink-executor tool.
This patch also breaks the OrcJIT library into three parts:
-- OrcTargetProcess: Contains code needed by the JIT execution process.
-- OrcShared: Contains code needed by the JIT execution and compiler
processes.
-- OrcJIT: Everything else.
This break-up allows JIT executor processes to link against OrcTargetProcess
and OrcShared only, without having to link in all of OrcJIT. Clients executing
JIT'd code in-process should start linking against OrcTargetProcess as well as
OrcJIT.
In the near future these changes will enable:
-- Removal of the OrcRemoteTargetClient/OrcRemoteTargetServer class templates
which provided similar functionality in OrcV1.
-- Restoration of Chapter 5 of the Building-A-JIT tutorial series, which will
serve as a simple usage example for these APIs.
-- Implementation of lazy, cross-target compilation in lli's -jit-kind=orc-lazy
mode.
This option was in a rather convoluted place, causing global parameters
to be set in awkward and undesirable ways to try to account for it
indirectly. Add tests for the -disable-debug-info option and ensure we
don't print unintended markers from unintended places.
Reviewed By: dstenb
Differential Revision: https://reviews.llvm.org/D91083
This was a mistake introduced in D91294. I'm not sure how to
exercise this with the existing code, but I hit it while trying
some follow up experiments.
We can't store garbage in the unused bits. It is possible that something like a zextload from i1/i2/i4 is created to read the memory. Those zextloads would be legalized assuming the extra bits are 0.
I'm not sure that the code in lowerStore is executed for the v1i1/v2i1/v4i1 case. It looks like the DAG combine in combineStore may have converted them to v8i1 first. And I think we're missing some cases to avoid going to the stack in the first place. But I don't have time to investigate those things at the moment so I wanted to focus on the correctness issue.
Should fix PR48147.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D91294
If we cannot prove that the check is trivially true, but can prove that it either
fails on the 1st iteration or never fails, we can replace it with a first-iteration check.
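As an illustration (example mine, not from the patch), consider a check that is monotone in the induction variable:
```
#include <cassert>

void writeZeros(unsigned *a, unsigned n, unsigned lower) {
  for (unsigned i = 0; i < n; ++i) {
    // With i counting up, `i >= lower` can only fail when i == 0; if it
    // holds on the 1st iteration, it holds on every later one.
    assert(i >= lower && "range check");
    a[i] = 0;
  }
  // After the transform (conceptually), the check runs once:
  //   if (n != 0) assert(0u >= lower && "range check");
}
```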
Differential Revision: https://reviews.llvm.org/D88527
Reviewed By: skatkov
07f1047f41 changed the CMake detection to use find_package(Python3 ...)
but didn't update the lit configuration to use the expected Python3_EXECUTABLE
CMake variable to point at the interpreter path.
This resulted in an empty path on macOS.
Currently the affinity format string has an initial (default) value. When users set
the format via OMP_AFFINITY_FORMAT, it overwrites the format string. However,
when copying the format, the trailing null is missing. As a result, if the user's
format string is shorter than the default value, the remaining part of the default
value still takes effect. This bug was not exposed because the test case doesn't
check the end of the string; it only checks whether the given output "contains" the
check string.
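A minimal sketch of the bug class (code mine, not the runtime's):
```
#include <cstdio>
#include <cstring>

int main() {
  // Stands in for the built-in default affinity format string.
  char fmt[64] = "default-format-%P-%i";
  const char *user = "%i"; // shorter user-provided OMP_AFFINITY_FORMAT
  // Bug pattern: copying the characters but not the trailing null
  // leaves the tail of the longer default string in place.
  memcpy(fmt, user, strlen(user));
  printf("%s\n", fmt); // prints "%ifault-format-%P-%i", not "%i"
  return 0;
}
```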
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D91309
- If an aggregate argument is indirectly accessed within kernels, direct
passing results in an unpromotable `alloca`, which degrades performance
significantly. The InferAddrSpace pass is enhanced in
[D91121](https://reviews.llvm.org/D91121) to assume that generic
pointers loaded from constant memory can be regarded as global ones.
This mitigates the need for coercion on aggregate arguments.
Differential Revision: https://reviews.llvm.org/D89980
There might be some demanded/known bits way to generalize this,
but I'm not seeing it right now.
This came up as a regression when I was looking at a different
demanded bits improvement.
https://rise4fun.com/Alive/5fl
```
Name: general
Pre: ((-1 << countTrailingZeros(C1)) & C2) == 0
%a1 = add i8 %x, C1
%a2 = and i8 %x, C2
%r = sub i8 %a1, %a2
=>
%r = and i8 %a1, ~C2

Name: test 1
%a1 = add i8 %x, 192
%a2 = and i8 %x, 10
%r = sub i8 %a1, %a2
=>
%r = and i8 %a1, -11

Name: test 2
%a1 = add i8 %x, -108
%a2 = and i8 %x, 3
%r = sub i8 %a1, %a2
=>
%r = and i8 %a1, -4
```
- Move isSupportedMemRefType() to ConvertToLLVMPatterns and check if the
memref element type is supported there.
Differential Revision: https://reviews.llvm.org/D91374
Display null pointers as `nullptr`, `nil` and `NULL` for C++,
Objective-C/Objective-C++ and C respectively. The original motivation
for this patch was to display a null std::string pointer as nullptr
instead of "", but the fix seemed generic enough to be done for all
summary providers.
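For instance (illustration mine, based on the motivation above):
```
#include <string>

int main() {
  std::string *s = nullptr;
  (void)s;
  // Before: the summary for s showed "".
  // Now: a C++ frame displays the pointer as nullptr (nil for
  // Objective-C/Objective-C++ frames, NULL for C frames).
  return 0;
}
```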
Differential revision: https://reviews.llvm.org/D77153
The previous code defined it as allocating a new memref for its result.
However, this is not how the dialect conversion framework treats it:
the framework does the equivalent of inserting and folding it away
internally (even independent of any canonicalization patterns that we
have defined).
The semantics as they were previously written were also very
constraining: Nontrivial analysis is needed to prove that the new
allocation isn't needed for correctness (e.g. to avoid aliasing).
By removing those semantics, we avoid losing that information.
Differential Revision: https://reviews.llvm.org/D91382
We lower them to a std.global_memref (uniqued by constant value) + a
std.get_global_memref to produce the corresponding memref value.
This allows removing Linalg's somewhat hacky lowering of tensor
constants, now that std properly supports this.
Differential Revision: https://reviews.llvm.org/D91306
The bufferization of subtensor_insert was incorrect in the presence of
a tensor argument with multiple uses: it was writing into a converted
memref operand, but there is no guarantee that the converted memref for
that operand is safe to write into. In this case, the same converted
memref is written to in-place by the subtensor_insert bufferization,
violating the tensor-level semantics.
I left some comments in a TODO about ways forward on this. I will be
working actively on this problem in the coming days.
Differential Revision: https://reviews.llvm.org/D91371
Select the following:
- G_SELECT cc, 0, 1 -> CSINC zreg, zreg, cc
- G_SELECT cc, 0, -1 -> CSINV zreg, zreg, cc
- G_SELECT cc, 1, f -> CSINC f, zreg, inv_cc
- G_SELECT cc, -1, f -> CSINV f, zreg, inv_cc
- G_SELECT cc, t, 1 -> CSINC t, zreg, cc
- G_SELECT cc, t, -1 -> CSINV t, zreg, cc
(IR example: https://godbolt.org/z/YfPna9)
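For reading the table above, the relevant instruction semantics are (reference sketch mine):
```
#include <cstdint>

// CSINC d, a, b, cc  :  d = cc ? a : b + 1
// CSINV d, a, b, cc  :  d = cc ? a : ~b
// With b tied to the zero register (zreg), b + 1 == 1 and ~b == -1,
// which is why G_SELECT with constant 1/-1 operands maps onto these.
uint32_t csinc(bool cc, uint32_t a, uint32_t b) { return cc ? a : b + 1; }
uint32_t csinv(bool cc, uint32_t a, uint32_t b) { return cc ? a : ~b; }
```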
These correspond to a bunch of the AArch64csel patterns in AArch64InstrInfo.td.
Unfortunately, it doesn't seem like we can import patterns that use NZCV like
those ones do. E.g.
```
def : Pat<(AArch64csel GPR32:$tval, (i32 1), (i32 imm:$cc), NZCV),
          (CSINCWr GPR32:$tval, WZR, (i32 imm:$cc))>;
```
So we have to manually select these for now.
This replaces `selectSelectOpc` with an `emitSelect` function, which performs
these optimizations.
Differential Revision: https://reviews.llvm.org/D90701
When parsing DWARF and laying out bit-fields, we don't properly take into account that, when they are in a union, they all have a zero offset.
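For example (illustration mine):
```
// In a union, every bit-field starts at bit offset 0 and overlaps the
// others, unlike struct bit-fields, whose offsets accumulate.
union U {
  unsigned a : 3; // bit offset 0
  unsigned b : 5; // also bit offset 0, overlapping a
};
```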
Differential Revision: https://reviews.llvm.org/D91118