Limit elimination of stores at the end of a function to MemoryDefs with
a single underlying object, to save compile time.
In practice, the case with multiple underlying objects seems not very
important in practice. For -O3 -flto on MultiSource/SPEC2000/SPEC2006
this results in a total of 2 more stores being eliminated.
We can always re-visit that in the future.
Similar to the existing transform - peek through a select
to match a value and its negation.
https://alive2.llvm.org/ce/z/MXi5KG
define i8 @src(i1 %b, i8 %x) {
%0:
%neg = sub i8 0, %x
%sel = select i1 %b, i8 %x, i8 %neg
%abs = abs i8 %sel, 1
ret i8 %abs
}
=>
define i8 @tgt(i1 %b, i8 %x) {
%0:
%abs = abs i8 %x, 1
ret i8 %abs
}
Transformation seems to be correct!
This relands D84013 but with a test that relies on less shell features to
hopefully make the test pass on Fuchsia (where the test from the previous patch
version strangely failed with a plain "Exit code 1").
Original summary:
D81347 changes the ASTFileSignature to be an array of 20 uint8_t instead of 5 uint32_t.
However, it didn't update the code in ObjectFilePCHContainerOperations that creates
the dwoID in the module from the ASTFileSignature (`Buffer->Signature` being the
array subclass that is now `std::array<uint8_t, 20>` instead of `std::array<uint32_t, 5>`).
```
uint64_t Signature = [..] (uint64_t)Buffer->Signature[1] << 32 | Buffer->Signature[0]
```
This code works with the old ASTFileSignature (where two uint32_t are enough to
fill the uint64_t), but after the patch this only took two bytes from the ASTFileSignature
and only partly filled the Signature uint64_t.
This caused that the dwoID in the module ref and the dwoID in the actual module no
longer match (which in turns causes that LLDB keeps warning about the dwoID's not
matching when debugging -gmodules-compiled binaries).
This patch just unifies the logic for turning the ASTFileSignature into an uint64_t which
makes the dwoID match again (and should prevent issues like that in the future).
Reviewed By: aprantl, dang
Differential Revision: https://reviews.llvm.org/D84013
This is a fixup of commit 0819a6416f (D77152) which could
result in miscompiles. The miscompile could only happen for targets
where isOperationLegalOrCustom could return different values for
FSHL and FSHR.
The commit mentioned above added logic in expandFunnelShift to
convert between FSHL and FSHR by swapping direction of the
funnel shift. However, that transform is only legal if we know
that the shift count (modulo bitwidth) isn't zero.
Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a
rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if
Z modulo bitwidth, could be zero.
```
$ ./alive-tv /tmp/test.ll
----------------------------------------
define i32 @src(i32 %x, i32 %y, i32 %z) {
%0:
%t0 = fshl i32 %x, i32 %y, i32 %z
ret i32 %t0
}
=>
define i32 @tgt(i32 %x, i32 %y, i32 %z) {
%0:
%t0 = sub i32 32, %z
%t1 = fshr i32 %x, i32 %y, i32 %t0
ret i32 %t1
}
Transformation doesn't verify!
ERROR: Value mismatch
Example:
i32 %x = #x00000000 (0)
i32 %y = #x00000400 (1024)
i32 %z = #x00000000 (0)
Source:
i32 %t0 = #x00000000 (0)
Target:
i32 %t0 = #x00000020 (32)
i32 %t1 = #x00000400 (1024)
Source value: #x00000000 (0)
Target value: #x00000400 (1024)
```
It could be possible to add back the transform, given that logic
is added to check that (Z % BW) can't be zero. Since there were
no test cases proving that such a transform actually would be useful
I decided to simply remove the faulty code in this patch.
Reviewed By: foad, lebedev.ri
Differential Revision: https://reviews.llvm.org/D86430
* Generic mlir.ir.Attribute class.
* First standard attribute (mlir.ir.StringAttr), following the same pattern as generic vs standard types.
* NamedAttribute class.
Differential Revision: https://reviews.llvm.org/D86250
D70867 introduced support for expanding most ppc_fp128 operations. But
sitofp/uitofp is missing. This patch adds that after D81669.
Reviewed By: uweigand
Differntial Revision: https://reviews.llvm.org/D81918
We may meet Invalid CTR loop crash when there's constrained ops inside.
This patch adds constrained FP intrinsics to the list so that CTR loop
verification doesn't complain about it.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81924
This patch makes these operations legal, and add necessary codegen
patterns.
There's still some issue similar to D77033 for conversion from v1i128
type. But normal type tests synced in vector-constrained-fp-intrinsic
are passed successfully.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D83654
If not overridden, AddClangSystemIncludeArgs's implementation is empty, so by
default, no system include args are added to the Clang driver. This means that
invoking Clang without the frontend must include a manual -I/usr/include flag,
which is inconsistent behavior. Therefore, override and implement this method
to match. Some boilerplate is also borrowed for handling of the other driver
flags.
While we are here, also override and enable HasNativeLLVMSupport.
Patch by: 3405691582 (dana koch)
Differential Revision: https://reviews.llvm.org/D86412
This patch fix the usage of the wait-argument in a clause and add several tests and fix the unparsing of
the wait-argument.
Reviewed By: sscalpone
Differential Revision: https://reviews.llvm.org/D86325
Removing terminators will result in invalid IR, making further
reductions pointless. I do not think there is any valid use case where
we actually want to create invalid IR as part of a reduction.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D86210
The following program miscompiles because rL216012 added static
relocation model support but not for PIC.
```
// clang -fpic -mcmodel=large -O0 a.cc
double foo() { return 42.0; }
```
This patch adds PIC support.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86024
The pattern matching does not account for truncating stores,
so it is unlikely to work at later stages. So we are likely
wasting compile-time with no hope of improvement by running
this later.
The example demonstrates how to use a module summary index file produced for ThinLTO to:
* find the module that defines the main entry point
* find all extra modules that are required for the build
A LIT test runs the example as part of the LLVM test suite [1] and shows how to create a module summary index file.
The code also provides two Error types that can be useful when working with ThinLTO summaries.
[1] if LLVM_BUILD_EXAMPLES=ON and platform is not Windows
Differential Revision: https://reviews.llvm.org/D85974
As suggested by @rsmith on PR47267, by replacing the builtin_memcpy bitcast pattern with builtin_bit_cast we can use _castf32_u32, _castu32_f32, _castf64_u64 and _castu64_f64 inside constant expresssions (constexpr). Although __builtin_bit_cast was added for c++20 it works on all clang c/c++ modes.
Differential Revision: https://reviews.llvm.org/D86398
This will allow out-of-tree translation to register the dialects they expect
to see in their input, on the model of getDependentDialects() for passes.
Differential Revision: https://reviews.llvm.org/D86409
Currently, this function is present in the dynsym table of
libunwind.so (on ELF targets). Make the function static instead.
In the previous release (LLVM 10.x), this function was instead a lambda
function inside LocalAddressSpace::findUnwindSections, and because
LocalAddressSpace was marked with _LIBUNWIND_HIDDEN, the lambda
function was also a hidden symbol.
Differential Revision: https://reviews.llvm.org/D86372
gcc errors on this, but I'm nervous that since -mtune has been
ignored by clang for so long that there may be code bases out
there that pass 32-bit cpus to clang.
so that the user does not have to pipe the output to `jq` or `python -m json.tool`.
This change makes testing more convenient because `-NEXT` patterns can be used.
The "prettify by default" is a good tradeoff to make. The output size increases a bit.
Differential Revision: https://reviews.llvm.org/D86318
This should be NFC in terms of output because the endian
check further down would bail out too, but we are wasting
time by waiting to that point to give up. If we generalize
that function to deal with more than i8 types, we should
not have to deal with the degenerate case.