This is an unusual canonicalization because we create an extra instruction,
but it's likely better for analysis and codegen (similar reasoning as D133399).
InstCombine::Negator may create this kind of multiply from negate and shift,
but this should not conflict because of the narrow negation.
I don't know how to create a fully general proof for this kind of transform in
Alive2, but here's an example with bitwidths similar to one of the regression
tests:
https://alive2.llvm.org/ce/z/J3jTjR
Differential Revision: https://reviews.llvm.org/D133667
Much like f16 and f32, we shouldn't try to shrink bf16 to smaller fp
constant. The code may not be optimal, but this allows us to legalize
bf16 constants under Arm without errors.
The previous resolve only creates the host associated varaibles for
common block members, but does not replace the original objects with
the new created ones. Fix it and also compute the sizes and offsets
for the host common block members if they are host associated.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D127214
The real(10) is supported on x86_64. On aarch64, the value of
selected_real_kind(16) should be 16 rather than 10 since real(10)
is not supported on x86_64. Previously, the real type support check
is not target dependent. Support it now through the target triple
information.
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D134021
Instead of using `reverse_iterator`, share the optimization between the 4 algorithms. The key observation here that `memmove` applies to both `copy` and `move` identically, and to their `_backward` versions very similarly. All algorithms now follow the same pattern along the lines of:
```
if constexpr (can_memmove<InIter, OutIter>) {
memmove(first, last, out);
} else {
naive_implementation(first, last, out);
}
```
A follow-up will delete `unconstrained_reverse_iterator`.
This patch removes duplication and divergence between `std::copy`, `std::move` and `std::move_backward`. It also improves testing:
- the test for whether the optimization is used only applied to `std::copy` and, more importantly, was essentially a no-op because it would still pass if the optimization was not used;
- there were no tests to make sure the optimization is not used when the effect would be visible.
Differential Revision: https://reviews.llvm.org/D130695
The current generation is unsafe as it is evaluated during verify
invocation rather than during verifySymbolUses. Remove until this is
safely generated.
Differential Revision: https://reviews.llvm.org/D134558
One of the sources is the same size as the destination so that source
doesn't have an overlap with the destination register. By using the _TIED
form we avoid an early clobber contraint for that source.
This matches what was already done for instrinsics. ConvertToThreeAddress
will fix it if it can't stay tied.
`LLVM_DISTRIBUTION_COMPONENTS` now influences the llvm binary in the
normal cmake output directory when it is set. This allows for
distribution targets to only include tools they want in the llvm
binary. It must be done this way because only one target can be
associated with a specific output name.
Differential Revision: https://reviews.llvm.org/D131310
Add LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection. We can move other global variables into ctx without
indirection concern. In the long term we may consider passing Ctx
as a parameter to various functions and eliminate global state as
much as possible and then remove `Ctx::reset`.
`config` has 1000+ uses so we try to avoid changing `config->foo`. Define a
wrapper with LLVM_LIBRARY_VISIBILITY to remove unneeded GOT and unique_ptr
indirection.
My x86-64 lld executable is 11+KiB smaller.
Profiles show that `DWARFUnit::ExtractUnitDIENoDwoIfNeeded` is both high firing (tens of thousands of calls) and fast running (15 µs mean).
Timers like this are noise and load for profiling systems, and can be removed.
rdar://100326595
Differential Revision: https://reviews.llvm.org/D134920
Profiles show that `SymbolFileDWARF::FindFunctions` is both high firing (many thousands of calls) and fast running (35 µs mean).
Timers like this are noise and load for profiling systems, and can be removed.
rdar://100326595
Differential Revision: https://reviews.llvm.org/D134922
This ensures PreservedCFGCheckerAnalysis is always added, independent of
whether opt was built with assertions enabled or not.
This fixes a few buildbot failures for bots that don't have assertions
enabled.