This patch adds the capability to perform constraint redundancy checks for `FlatAffineConstraints` using `Simplex`, via a new member function `FlatAffineConstraints::removeRedundantConstraints`. The pre-existing redundancy detection algorithm runs a full rational emptiness check for each inequality separately for checking redundancy. Leveraging the existing `Simplex` infrastructure, in this patch we have an algorithm for redundancy checks that can check each constraint by performing pivots on the tableau, which provides an alternative to running Fourier-Motzkin elimination for each constraint separately.
Differential Revision: https://reviews.llvm.org/D84935
The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no
information about a loop body BB cost. In some cases, e.g. for simple loop
```
for(int i=0; i<32; ++i){
D = Arr2[i*8 + C1];
Arr1[i*64 + C2] += C3 * D;
Arr1[i*64 + C2 + 2048] += C4 * D;
}
```
current default parameter value is not enough to run deeper cost analyze so the
loop is not completely unrolled.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D86248
It isn't very wise to pass an assembly file to the compiler and tell it to compile as a C file and hope that the compiler recognizes it as assembly instead.
Instead enable the ASM language and mark the files as being ASM.
[525/634] Building C object lib/tsan/CMakeFiles/clang_rt.tsan-aarch64.dir/rtl/tsan_rtl_aarch64.S.o
FAILED: lib/tsan/CMakeFiles/clang_rt.tsan-aarch64.dir/rtl/tsan_rtl_aarch64.S.o
/opt/tooling/drive/host/bin/clang --target=aarch64-linux-gnu -I/opt/tooling/drive/llvm/compiler-rt/lib/tsan/.. -isystem /opt/tooling/drive/toolchain/opt/drive/toolchain/include -x c -Wall -Wno-unused-parameter -fno-lto -fPIC -fno-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fno-stack-protector -fno-sanitize=safe-stack -fvisibility=hidden -fno-lto -O3 -gline-tables-only -Wno-gnu -Wno-variadic-macros -Wno-c99-extensions -Wno-non-virtual-dtor -fPIE -fno-rtti -Wframe-larger-than=530 -Wglobal-constructors --sysroot=. -MD -MT lib/tsan/CMakeFiles/clang_rt.tsan-aarch64.dir/rtl/tsan_rtl_aarch64.S.o -MF lib/tsan/CMakeFiles/clang_rt.tsan-aarch64.dir/rtl/tsan_rtl_aarch64.S.o.d -o lib/tsan/CMakeFiles/clang_rt.tsan-aarch64.dir/rtl/tsan_rtl_aarch64.S.o -c /opt/tooling/drive/llvm/compiler-rt/lib/tsan/rtl/tsan_rtl_aarch64.S
/opt/tooling/drive/llvm/compiler-rt/lib/tsan/rtl/tsan_rtl_aarch64.S:29:1: error: expected identifier or '('
.section .text
^
1 error generated.
Fixed Clang not being passed as the assembly compiler for compiler-rt runtime build.
Patch By: tambre
Differential Revision: https://reviews.llvm.org/D85706
Use the stack to save and restore the link register when there is no
available register to do it.
Differential Revision: https://reviews.llvm.org/D76069
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.
This was caught using the check introduced by D80916.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86085
Comparison against null is a common pattern that usually is followed by
error handling code and the likes. We now use AANonNull to simplify
these comparisons optimistically in order to make more code dead early
on.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D86145
`AADereferenceable::getAssumedDereferenceableBytes()` is actually
deducing `dereferenceable_or_null`. We should not use that information
to deduce `nonnull`, since it doesn't imply `nonnull`.
This patch adds support for constrained scalar fp to int operations on
PowerPC. Besides, this fixes the FP exception bit of quad-precision
convert & truncate instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81537
This commit introduced a non-trivial compile time regression that needs
to be addressed: https://reviews.llvm.org/D70365#2227627
Given that it is unclear how long that will take, I'll revert it for
now.
This reverts commit eedf18fc1f.
This commits breaks certain OpenMP codes (on power) because it expanded
the Attributor scope without telling the Attributor about the SCC
extend. See: https://reviews.llvm.org/D85544#2227611
This reverts commit b0b32e6490.
Target memory manager is introduced in this patch which aims to manage target
memory such that they will not be freed immediately when they are not used
because the overhead of memory allocation and free is very large. For CUDA
device, cuMemFree even blocks the context switch on device which affects
concurrent kernel execution.
The memory manager can be taken as a memory pool. It divides the pool into
multiple buckets according to the size such that memory allocation/free
distributed to different buckets will not affect each other.
In this version, we use the exact-equality policy to find a free buffer. This
is an open question: will best-fit work better here? IMO, best-fit is not good
for target memory management because computation on GPU usually requires GBs of
data. Best-fit might lead to a serious waste. For example, there is a free
buffer of size 1960MB, and now we need a buffer of size 1200MB. If best-fit,
the free buffer will be returned, leading to a 760MB waste.
The allocation will happen when there is no free memory left, and the memory
free on device will take place in the following two cases:
1. The program ends. Obviously. However, there is a little problem that plugin
library is destroyed before the memory manager is destroyed, leading to a fact
that the call to target plugin will not succeed.
2. Device is out of memory when we request a new memory. The manager will walk
through all free buffers from the bucket with largest base size, pick up one
buffer, free it, and try to allocate immediately. If it succeeds, it will
return right away rather than freeing all buffers in free list.
Update:
A threshold (8KB by default) is set such that users could control what size of memory
will be managed by the manager. It can also be configured by an environment variable
`LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD`.
Reviewed By: jdoerfert, ye-luo, JonChesterfield
Differential Revision: https://reviews.llvm.org/D81054
- Rename AMDGPU SCC DWARF register to STATUS since the scalar
condition code is a bit within the STATUS register.
- Correct bit size of the VCC_64 register to 64 which is the size in
wave64 mode.
Differential Revision: https://reviews.llvm.org/D86259
We don't need a std::string for a literal string, we can use a
StringRef.
The addition of StringRefs produces a Twine that we can just call
str() without converting to a SmallString ourselves. Twine will
do that internally.
- This utility to merge a block anywhere into another one can help inline single
block regions into other blocks.
- Modified patterns test to use the new function.
Differential Revision: https://reviews.llvm.org/D86251
This adds parsing and codegen support for tune in target attribute.
I've implemented this so that arch in the target attribute implicitly disables tune from the command line. I'm not sure what gcc does here. But since -march implies -mtune. I assume 'arch' in the target attribute implies tune in the target attribute.
Differential Revision: https://reviews.llvm.org/D86187
for array bounds, not "integer constant" rules.
For an array bound of class type, this causes us to perform an implicit
conversion to size_t, instead of looking for a unique conversion to
integral or unscoped enumeration type. This affects which cases are
valid when a class has multiple implicit conversion functions to
different types.
I move the triple (de)composition logic into the builder in e5d08fcbac
but this test is relying on Make to construct the set the ARCH,
ARCH_CFLAGS and SDKROOT based on the given TRIPLE. This patch updates
the test to pass these variables directly.
Differential revision: https://reviews.llvm.org/D86244
This patch adds the infrastructure to have language specific REPL init
files. It's the foundation work to a following patch that will introduce
Swift REPL init file.
When lldb is launched with the `--repl` option, it will look for a REPL
init file in the home directory and source it. This overrides the
default `~/.lldbinit`, which content might make the REPL behave
unexpectedly. If the REPL init file doesn't exists, lldb will fall back
to the default init file.
rdar://65836048
Differential Revision: https://reviews.llvm.org/D86242
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
The behavior of the CrossOver mutator has changed with
bb54bcf849. This seems to affect the
value-profile-load test on Darwin. This patch provides a wider margin for
determining success of the value-profile-load test, by testing the targeted
functionality (i.e., GEP index value profile) more directly and faster. To this
end, LoadTest.cpp now uses a narrower condition (Size != 8) for initial pruning
of inputs, effectively preventing libFuzzer from generating inputs longer than
necessary and spending time on mutating such long inputs in the corpus - a
functionality not meant to be tested by this specific test.
Previously, on x86/Linux, it required 6,597,751 execs with -use_value_profile=1
and 19,605,575 execs with -use_value_profile=0 to hit the crash. With this
patch, the test passes with 174,493 execs, providing a wider margin from the
given trials of 10,000,000. Note that, without the value profile (i.e.,
-use_value_profile=0), the test wouldn't pass as it still requires 19,605,575
execs to hit the crash.
Differential Revision: https://reviews.llvm.org/D86247
InitializeInterceptors() calls dlsym(), which calls calloc(). Depending
on the allocator implementation, calloc() may invoke mmap(), which
results in a segfault since REAL(mmap) is still being resolved.
We fix this by doing a direct syscall if interceptors haven't been fully
resolved yet.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D86168
-force-attribute adds an attribute to function via command-line.
However, there was no counter-part to remove an attribute. This patch
adds -force-remove-attribute that removes an attribute from function.
Differential Revision: https://reviews.llvm.org/D85586
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191
But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.
Differential Revision: https://reviews.llvm.org/D86113
Add the unsigned complements to the existing FPToSI and SIToFP operations in the
standard dialect, with one-to-one lowerings to the corresponding LLVM operations.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D85557
Some code bases out there pass -mtune=generic to clang. This would have
been ignored prior to D85384. Now it results in an error
because "generic" isn't recognized by isValidCPUName.
And if we let it go through to the backend as a tune
setting it would get the tune flags closer to i386 rather
than a modern CPU.
I plan to change what tune=generic does in the backend in
a future patch. And allow this in the frontend.
But this should be a quick fix for the error some users
are seeing.
If the function is not marked exlicitly as declare target and it calls
function(s), marked as declare target device_type(host), these host-only
functions should not be dignosed as used in device mode, if the caller
function is not used in device mode too.
Differential Revision: https://reviews.llvm.org/D86164
PDL presents a high level abstraction for the rewrite pattern infrastructure available in MLIR. This abstraction allows for representing patterns transforming MLIR, as MLIR. This allows for applying all of the benefits that the general MLIR infrastructure provides, to the infrastructure itself. This means that pattern matching can be more easily verified for correctness, targeted by frontends, and optimized.
PDL abstracts over various different aspects of patterns and core MLIR data structures. Patterns are specified via a `pdl.pattern` operation. These operations contain a region body for the "matcher" code, and terminate with a `pdl.rewrite` that either dispatches to an external rewriter or contains a region for the rewrite specified via `pdl`. The types of values in `pdl` are handle types to MLIR C++ types, with `!pdl.attribute`, `!pdl.operation`, and `!pdl.type` directly mapping to `mlir::Attribute`, `mlir::Operation*`, and `mlir::Value` respectively.
An example pattern is shown below:
```mlir
// pdl.pattern contains metadata similarly to a `RewritePattern`.
pdl.pattern : benefit(1) {
// External input operand values are specified via `pdl.input` operations.
// Result types are constrainted via `pdl.type` operations.
%resultType = pdl.type
%inputOperand = pdl.input
%root, %results = pdl.operation "foo.op"(%inputOperand) -> %resultType
pdl.rewrite(%root) {
pdl.replace %root with (%inputOperand)
}
}
```
This is a culmination of the work originally discussed here: https://groups.google.com/a/tensorflow.org/g/mlir/c/j_bn74ByxlQ
Differential Revision: https://reviews.llvm.org/D84578
This patch was reverted in 7c182663a8 due to some failures
observed on PCC based machines. Failures were due to Endianness issue and
long double representation issues.
Patch is revised to address Endianness issue. Furthermore, support
for emission of `DW_OP_implicit_value` for `long double` has been removed
(since it was unclean at the moment). Planning to handle this in
a clean way soon!
For more context, please refer to following review link.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
This patch contains the following changes:
1. Renamed the function `DeviceTy::data_exchange` to `DeviceTy::dataExchange`;
2. Changed the second argument `DeviceTy DstDev` to `DeviceTy &DstDev`;
3. Renamed the last argument.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D86238
TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16
and X17, as it's the last matching class. This in turn gets passed to
AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable.
It seems sensible to handle this case, so copies from X16 and X17 work.
Copying from X17 is used in inline assembly in libunwind for pointer
authentication.
Differential Revision: https://reviews.llvm.org/D85720
llvm is missing support for DW_OP_implicit_value operation.
DW_OP_implicit_value op is indispensable for cases such as
optimized out long double variables.
For intro refer: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions
Consider the following example:
```
int main() {
long double ld = 3.14;
printf("dummy\n");
ld *= ld;
return 0;
}
```
when compiled with tunk `clang` as
`clang test.c -g -O1` produces following location description
of variable `ld`:
```
DW_AT_location (0x00000000:
[0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value)
DW_AT_name ("ld")
```
Here one may notice that this representation is incorrect(DWARF4
stack could only hold integers(and only up to the size of address)).
Here the variable size itself is `128` bit.
GDB and LLDB confirms this:
```
(gdb) p ld
$1 = <invalid float value>
(lldb) frame variable ld
(long double) ld = <extracting data from value failed>
```
GCC represents/uses DW_OP_implicit_value in these sort of situations.
Based on the discussion with Jakub Jelinek regarding GCC's motivation
for using this, I concluded that DW_OP_implicit_value is most appropriate
in this case.
Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html
GDB seems happy after this patch:(LLDB doesn't have support
for DW_OP_implicit_value)
```
(gdb) p ld
p ld
$1 = 3.14000000000000012434
```
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
Print which load command we were looking for when the sanity check
fails:
AssertionError: 0 != 1 : wrong number of load commands for
LC_VERSION_MIN_MACOSX
D81345 appears to accidentally disables vectorization when explicitly
enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to
the vectorization with versioning-for-unit-stride for PGSO.
Differential Revision: https://reviews.llvm.org/D85784