Create a new document that explain both stages of the process in a single
place, merge and deduplicate the content from the two previous documents. Also
extend the documentation to account for the recent changes in pass structure
due to standard dialect splitting and translation being more flexible.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D109605
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.
Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.
The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).
LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch
Differential Revision: https://reviews.llvm.org/D109310
This check should ensure we don't reproduce the problem fixed by
02df443d28
More accurately, it checks every llvm::Any::TypeId symbol in libLLVM-x.so and
make sure they have weak linkage and are not local to the library, which would
lead to duplicate definition if another weak version of the symbol is defined in
another linked library.
Differential Revision: https://reviews.llvm.org/D109252
Invalid frame addresses exist in call stack samples due to bad unwinding. This could happen to frame-pointer-based unwinding and the callee functions that do not have the frame pointer chain set up. It isn't common when the program is built with the frame pointer omission disabled, but can still happen with third-party static libs built with frame pointer omitted.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D109638
This reverts commit 81f8ad1769.
This seems to break the shared libs build
(linaro-flang-aarch64-sharedlibs bot) with:
undefined reference to `Fortran::semantics::IsCoarray(Fortran::semantics::Symbol const&)
(from tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/tools.cpp.o)
When linking lib/libFortranEvaluate.so.14git
This seems in-line with the intent and how we build tools around it.
Update the description for the flag accordingly.
Also use an injected thread pool in MLIROptMain, now we will create
threads up-front and reuse them across split buffers.
Differential Revision: https://reviews.llvm.org/D109802
Both copy/alloc ops are using memref dialect after this change.
Reviewed By: silvas, mehdi_amini
Differential Revision: https://reviews.llvm.org/D109480
The fmul is a canonicalizing operation, and fneg is not so this would
break denormals that need flushing and also would not quiet signaling
nans. Fold to fsub instead, which is also canonicalizing.
Pseudo probe instrumentation was missing from O0 build. It is needed in cases where some source files are built in O0 while the others are built in optimize mode.
Reviewed By: wenlei, wlei, wmi
Differential Revision: https://reviews.llvm.org/D109531
Rather than depending on the hex dump from obj2yaml. Now the test shows the
expected function body in a human readable format.
Differential Revision: https://reviews.llvm.org/D109730
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.
There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.
The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.
Added 'this_image()' to the list of functions that are evaluated as intrinsic.
Added IsCoarray functions to determine if an expression is a coarray (corank > 1).
Added save attribute to coarray variables in test file, this_image.f90.
reviewers: klausler, PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D108059
Laying more foundation for full template name rebuilding - more complex
type printing benefits from an object to carry some state rather than
passing it around as parameters to every function.
This fixes a violation of the wrap flag rules introduced in c4048d8f. As noted in the original review, the NUW is legal to infer from the structure of the replacee, but a) there's no test coverage, and b) this should be done generically for all multiplies.
Differential Revision: https://reviews.llvm.org/D109782
This is an attempt to define what the current semantics are closest too. Unfortunately, the current implementation does appear to be inconsistent with all semantic variants we've considered. This semantics is the one which seems to be closest to the spirit of the code, and that matched several long time contributors mental models of how the code "should work".
https://bugs.llvm.org/show_bug.cgi?id=51817 tracks the list of currently known violations of these rules. A series of follow up patches will be addressing each now that we've defined them to be bugs.
Differential Revision: https://reviews.llvm.org/D109553
Visibility options currently have limited support on AIX and may cause warnings or errors
depending on the build compiler used.
Reviewed By: ZarkoCA
Differential Revision: https://reviews.llvm.org/D108467
This reverts commit b7b4ebbcfa.
Reason: This breaks several code-size tests in Emscripten test suite
because this exports `emscripten_longjmp` for programs that didn't do it
before.
Use GCNHazardRecognizer in postra sched.
Updated tests for the new schedules.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D109536
Change-Id: Ia86ba2ae168f12fb34b4d8efdab491f84d936cde
This matters for example for the iPhoneSimulator14.0.sdk, which has
a System/Library/Frameworks/UIKit.framework/UIKit that has
LC_BUILD_VERSION with minos of 14.0, so linking against that file
will produce warnings like:
.../iPhoneSimulator14.0.sdk/System/Library/Frameworks/UIKit.framework/UIKit
has version 14.0.0, which is newer than target minimum of 12.0.0
when targeting x86_64-apple-ios12.0-simulator. That doens't happen when
linking against UIKit.tbd instead, obviously.
Linking with RC_TRACE_DYLIB_SEARCHING=1 shows that ld64 also searches
the tbd file first, and we already get that right for non-framework
dylibs.
Fixes crbug.com/1249456.
Differential Revision: https://reviews.llvm.org/D109768
Perf script can sometimes give disordered LBR samples like below.
```
b022500
32de0044
3386e1d1
7f118e05720c
7f118df2d81f
0x2a0b9622/0x2a0b9610/P/-/-/1 0x2a0b79ff/0x2a0b9618/P/-/-/2 0x2a0b7a4a/0x2a0b79e8/P/-/-/1 0x2a0b7a33/0x2a0b7a46/P/-/-/1 0x2a0b7a42/0x2a0b7a23/P/-/-/1 0x2a0b7a21/0x2a0b7a37/P/-/-/2 0x2a0b79e6/0x2a0b7a07/P/-/-/1 0x2a0b79d4/0x2a0b79dc/P/-/-/2 0x2a0b7a03/0x2a0b79aa/P/-/-/1 0x2a0b79a8/0x2a0b7a00/P/-/-/234 0x2a0b9613/0x2a0b7930/P/-/-/1 0x2a0b9622/0x2a0b9610/P/-/-/1 0x2a0b79ff/0x2a0b9618/P/-/-/2 0x2a0b7a4a/0x2aWarning:
Processed 10263226 events and lost 1 chunks!
```
Note that the last LBR record `0x2a0b7a4a/0x2aWarning:` . Currently llvm-profgen does not detect that and as a result an uninitialized branch target value will be used. The uninitialized value can cause creepy instruction ranges created which which in turn will result in a completely wrong profile. An example is like
```
.... @ _ZN5folly13loadUnalignedIsEET_PKv]:18446744073709551615:18446744073709551615
1: 18446744073709551615
!CFGChecksum: 4294967295
!Attributes: 0
```
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D109637
Instead of discovering the sink-to block for each operand in the main
loop, the sink-to block can instead be directly queued with the
operands.
This simplifies processing in the main loop and is a NFC change split
off from D104254 as suggested there.
This change puts the functionality in commit
c5792aa90f behind a flag that is off by
default. The original commit is not in Apple's Clang fork (and blocks
are an Apple extension in the first place), and there is one known
issue that needs to be addressed before it can be enabled safely.
Differential Revision: https://reviews.llvm.org/D108243
This reverts commit 76dc8ac36d.
Restore the change. The test had an incorrect negative from testing.
The test is expected to trigger a failure as mentioned in the review
comments. This corrects the test and should resolve the failure.
Ignore dbg instructions when collecting stack slot markers. This is
to make sure the coloring is invariant regarding presence of dbg
instructions (even in cases when the dbg instructions might be
badly placed in the input).
Differential Revision: https://reviews.llvm.org/D109758
Having DBG_VALUE instructions referencing a stack slot while being outside
of the LIFETIME_START/LIFETIME_END markers for that stack slot is perhaps
not always ideal (from a debugging perspective), but it might happen during
codegen that we end up with such situations (e.g. positioning of the
DBG_VALUE instruction for a SDDbgOperand::FRAMEIX at ISel is a bit sloppy
in that context).
This patch adds a test case showing that StackColoring currently isn't
debug invariant, and that the position of DBG_VALUE instructions referencing
the stack slots might impact the decision making regarding stack slot reuse.
Differential Revision: https://reviews.llvm.org/D109757
Follow-up to D109708: Using lib_dirs means this will work with ancient gn binaries.
Change the toolchain definitions to make lib_dirs have the right effect, and
pull out lib_switch of each of the tools while here.
This means we now do pass /LIBPATH: to link.exe, but since we invoke it directly
and not through clang-cl, this doesn't actually require D109624. And since this
is built in to GN, we don't need a config to push the flag to dependents.
This is arguably a bit more idiomatic, and it doesn't require folks to update
their GN binaries. No effective behavior change.
Differential Revision: https://reviews.llvm.org/D109763