This patch adds a specialized implementation of getIntrinsicInstrCost
and add initial cost-modeling for min/max vector intrinsics.
AArch64 NEON support umin/smin/umax/smax for vectors
<8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>.
Notably, it does not support vectors with i64 elements.
This change by itself should have very little impact on codegen, but in
follow-up patches I plan to teach the vectorizers to consider using
those intrinsics on platforms where it is profitable, e.g. because there
is no general 'select'-like instruction.
The current cost returned should be better for throughput, latency and size.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D89953
Split the current NetBSD watchpoint implementation for x86 into Utility,
and revamp it to improve readability. This code is meant to be used
as a common class for all x86 watchpoint implementation, particularly
these on FreeBSD and Linux.
The code uses global watchpoint enable bits, as required by the NetBSD
kernel. If it ever becomes necessary for any platform to use local
enable bits instead, this can be trivially abstracted out.
The code also postpones clearing DR6 until a new different watchpoint
is being set in place of the old one. This is necessary since LLDB
repeatedly reenables watchpoints on all threads, by clearing
and restoring them. When DR6 is cleared as a part of that, then pending
events on other threads can no longer be associated with watchpoints
correctly.
Differential Revision: https://reviews.llvm.org/D89874
This patch adjusts _when_ something happens in LiveDebugValues /
InstrRefBasedLDV, to make it more amenable to dealing with DBG_INSTR_REF
instructions. There's no functional change.
In the current InstrRefBasedLDV implementation, we collect the machine
value-number transfer function for blocks at the same time as the
variable-value transfer function. After solving machine value numbers, the
variable-value transfer function is updated so that DBG_VALUEs of live-in
registers have the correct value. The same would need to be done for
DBG_INSTR_REFs, to connect instruction-references with machine value
numbers.
Rather than writing more code for that, this patch separates the two: we
collect the (machine-value-number) transfer function and solve for
machine value numbers, then step through the MachineInstrs again collecting
the variable value transfer function. This simplifies things for the new
few patches.
Differential Revision: https://reviews.llvm.org/D85760
Added optimization pass to convert heap-based allocs to stack-based allocas in
buffer placement. Added the corresponding test file.
Differential Revision: https://reviews.llvm.org/D89688
Before this change, we would run `maxIterations` if the first iteration changed the op.
After this change, we exit the loop as soon as an iteration hasn't changed the op.
Assuming that we have reached a fixed point when an iteration doesn't change the op, this doesn't affect correctness.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D89981
This patch copies @vsk's fix to instcombine from D85555 over to mem2reg. The
motivation and rationale are exactly the same: When mem2reg removes an alloca,
it erases the dbg.{addr,declare} instructions which refer to the alloca. It
would be better to instead remove all debug intrinsics which describe the
contents of the dead alloca, namely all dbg.value(<dead alloca>, ...,
DW_OP_deref)'s.
As far as I can tell, prior to D80264 these `dbg.value+deref`s would have been
silently dropped instead of being made `undef`, so we're just returning to
previous behaviour with these patches.
Testing:
`llvm-lit llvm/test` and `ninja check-clang` gave no unexpected failures. Added
3 tests, each of which covers a dbg.value deletion path in mem2reg:
mem2reg-promote-alloca-1.ll
mem2reg-promote-alloca-2.ll
mem2reg-promote-alloca-3.ll
The first is based on the dexter test inlining.c from D89543. This patch also
improves the debugging experience for loop.c from D89543, which suffers
similarly after arg promotion instead of inlining.
These are all inspired by existing test coverage we have in an internal
testsuite.
Reviewed by: grimar, MaskRay
Differential Revision: https://reviews.llvm.org/D89775
Use isKnownXY comparators when one of the operands can be with
scalable vectors or getFixedSize() for all the other cases.
This patch also does bug fixes for getPrimitiveSizeInBits by using
getFixedSize() near the places with the TypeSize comparison.
Differential Revision: https://reviews.llvm.org/D89703
When switching the register debug operands to $noreg in
setupDebugValueUndef() also clear the sub-register indices for virtual
registers. This is done when marking DBG_VALUEs undef in other cases,
e.g. in LiveDebugVariables. I have not found any cases where leaving the
sub-register index causes any issues, and the indices would eventually
get dropped when LiveDebugVariables reinserted the undef DBG_VALUEs
after register scheduling, but if nothing else it looked a bit weird in
printouts to have sub-register indices on $noreg, and I don't think the
sub-register index holds any meaningful information at that point.
I have not been able to find any source-level reproducer for this with
an upstream target, so I have just added an instrumented machine-sink
test.
Reviewed By: djtodoro, jmorse
Differential Revision: https://reviews.llvm.org/D89941
This diff refactors the code which determines the tool type based on
how llvm-objcopy is invoked (objcopy vs strip vs bitcode-strip vs install-name-tool).
NFC.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D89713
We want to have a caching version of symbolic BE exit count
rather than recompute it every time we need it.
Differential Revision: https://reviews.llvm.org/D89954
Reviewed By: nikic, efriedma
The runtimes build was lying to the various runtimes builds by setting
XXX_STANDALONE_BUILD=ON when they are really not being built standalone.
Only COMPILER_RT_STANDALONE_BUILD appears to be necessary, but setting it
for the other runtimes actually breaks everything.
Differential Revision: https://reviews.llvm.org/D90005
The UtilityFunction ctor was dropping the text argument. Probably for
that reason ClangUtilityFunction was setting the parent's member
directly instead of deferring to the parent ctor. Also change the
signatures to take strings which are std::moved in place.
Now there are two main classes in Value hierarchy, which support metadata,
these are Instruction and GlobalObject. They implement different APIs for
metadata manipulation, which however overlap. This change moves metadata
manipulation code into Value, so descendant classes can use this code for
their operations on metadata.
No functional changes intended.
Differential Revision: https://reviews.llvm.org/D67626
The devirtualization wrapper misses cases where if it wraps a pass
manager, an individual pass may devirtualize an indirect call created by
a previous pass. For example, inlining may create a new indirect call
which is devirtualized by instcombine. Currently the devirtualization
wrapper will not see that because it only checks cgscc edges at the very
beginning and end of the pass (manager) it wraps.
This fixes some tests testing this exact behavior in the legacy PM.
This piggybacks off of updateCGAndAnalysisManagerForPass()'s detection
of promoted ref to call edges.
This supercedes one of the previous mechanisms to detect
devirtualization by keeping track of potentially promoted call
instructions via WeakTrackingVHs.
There is one more existing way of detecting devirtualization, by
checking if the number of indirect calls has decreased and the number of
direct calls has increased in a function. It handles cases where calls
to functions without definitions are promoted, and some tests rely on
that. LazyCallGraph doesn't track edges to functions without
definitions so this part can't be removed in this change.
check-llvm and check-clang with -abort-on-max-devirt-iterations-reached
on by default doesn't show any failures outside of tests specifically
testing it so it doesn't needlessly rerun passes more than necessary.
(The NPM -O2/3 pipeline run the inliner/function simplification pipeline
under a devirtualization repeater pass up to 4 times by default).
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89587
`SourceManager::getFileEntryRefForID`'s remaining callers just want the
filename component, which is coming from the `FileInfo`. Replace the API
with `getNonBuiltinFilenameForID`, which also removes another use of
`FileEntryRef::FileEntryRef` outside of `FileManager`.
Both callers are collecting file dependencies, and one of them relied on
this API to filter out built-ins (as exposed by
clang/test/ClangScanDeps/modules-full.cpp). It seems nice to continue
providing that service.
Differential Revision: https://reviews.llvm.org/D89508
LD64 emits string tables which start with a space and a zero byte.
This diff adjusts StringTableBuilder for linked Mach-O binaries to match LD64's behavior.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D89561
An alwaysinline function may not get inlined in inliner-wrapper due to
the inlining order.
Previously for the following, the inliner would first inline @a() into @b(),
```
define void @a() {
entry:
call void @b()
ret void
}
define void @b() alwaysinline {
entry:
br label %for.cond
for.cond:
call void @a()
br label %for.cond
}
```
making @b() recursive and unable to be inlined into @a(), ending at
```
define void @a() {
entry:
call void @b()
ret void
}
define void @b() alwaysinline {
entry:
br label %for.cond
for.cond:
call void @b()
br label %for.cond
}
```
Running always-inliner first makes sure that we respect alwaysinline in more cases.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46945.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D86988
`SourceManager::isMainFile` does not use the filename, so it doesn't
need the full `FileEntryRef`; in fact, it's misleading to take the name
because that makes it look relevant. Simplify the API, and in the
process remove some calls to `FileEntryRef::FileEntryRef` in the unit
tests (which were blocking making that private to `SourceManager`).
Differential Revision: https://reviews.llvm.org/D89507
Add helpers `getSLocEntryOrNull`, which handles the `Invalid` logic
around `getSLocEntry`, and `getSLocEntryForFile`, which also checks for
`SLocEntry::isFile`, and use them to reduce repeated code.
Differential Revision: https://reviews.llvm.org/D89503
`size_t` has different width on 32- and 64-bit architecture, but the
computation to floor to power of two assumed it is 64-bit, which can cause an
integer overflow. In this patch, architecture detection is added so that the
operation for 64-bit `size_t`. Thank Luke for reporting the issue.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D89878
In 5d796645, we stopped looking at the LIBCXXABI_LIBCXX_INCLUDES variable,
which broke users of the Standalone build. This patch reinstates that
variable, however it must point to the *installed* path of the libc++
headers, not the libc++ headers in the source tree (which has always
been the case, but wasn't enforced before).
If LIBCXXABI_LIBCXX_INCLUDES points to the libc++ headers in the source
tree, the `__config_site` header will fail to be found.
to their parent classes.
SampleProfileReaderExtBinary/SampleProfileWriterExtBinary specify the typical
section layout currently used by SampleFDO. Currently a lot of section
reader/writer stay in the two classes. However, as we expect to have more
types of SampleFDO profiles, we hope those new types of profiles can share
the common sections while configuring their own sections easily with minimal
change. That is why I move some common stuff from
SampleProfileReaderExtBinary/SampleProfileWriterExtBinary to
SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase so new
profiles class inheriting from the base class can reuse them.
Differential Revision: https://reviews.llvm.org/D89524
Move the code which adjusts the immediate/predicate on a G_ICMP to
AArch64PostLegalizerLowering.
This
- Reduces the number of places we need to test for optimized compares in the
selector. We know that the compare should have been simplified by the time it
hits the selector, so we can avoid testing this in selects, brconds, etc.
- Allows us to potentially fold more compares (previously, this optimization
was only done after calling `tryFoldCompare`, this may allow us to hit some more
TST cases)
- Simplifies the selection code in `emitIntegerCompare` significantly; we can
just use an emitSUBS function.
- Allows us to avoid checking that the predicate has been updated after
`emitIntegerCompare`.
Also add a utility header file for things that may be useful in the selector
and various combiners. No need for an implementation file at this point, since
it's just one constexpr function for now. I've run into a couple cases where
having one of these would be handy, so might as well add it here. There are
a couple functions in the selector that can probably be factored out into
here.
Differential Revision: https://reviews.llvm.org/D89823
For a diagnostic `A refers to B` where B refers to a bitcode file, if the
symbol gets optimized out, the user may see `A refers to <internal>`; if the
symbol is retained, the user may see `A refers to lto.tmp`.
Save the reference InputFile * in the DenseMap so that the original filename is
available in reportBackrefs().
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector. (E.g. matching for G_ZIP and
other shuffle vector pseudos)
It still makes sense to select these instructions at -O0.
Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.
This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.
Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.
And also add a 'r' which somehow got dropped from a bunch of function names.
https://reviews.llvm.org/D89820
This functionality is commonly needed in clang tidy checks (based on
transformer) that only print warnings, without suggesting any edits. The no-op
edit allows the user to associate a diagnostic message with a source location.
Differential Revision: https://reviews.llvm.org/D89961
Some early errors during the ASTUnit creation were not transferred to the `FailedParseDiagnostic` so when the code in `LoadFromCommandLine` swaps its content with the content of `StoredDiagnostics` they cannot be retrieved by the user in any way.
Reviewed By: andrewrk, dblaikie
Differential Revision: https://reviews.llvm.org/D78658
Virtual sections do not contribute to the final output size.
This diff fixes the corresponding calculations in the method MachOWriter::totalSize.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D89661
This is a redo of D89908, which triggered some `-Werror=conversion`
errors with GCC due to assignments to the 31-bit variable.
This CL adds to the original one a 31-bit mask variable that is used
at every assignment to silence the warning.
Differential Revision: https://reviews.llvm.org/D89984
Per asbirlea's comment, assert that only instructions, constants
and arguments are passed to this API. Simplify returning true
would not be correct for special Value subclasses like MemoryAccess.
Visited phi blocks only need to be added for the duration of the
recursive alias queries, they should not leak into following code.
Once again, while this also improves analysis precision, this is
mainly intended to clarify the applicability scope of VisitedPhiBBs.