A seemingly innocuous Linux kernel change [0] seemingly blew up our
compile times by over 3x, as reported by @nathanchance in [1].
The code in question uses a doubly nested macro containing GNU C
statement expressions that are then passed to typeof(), which is then
used in a very important macro for atomic variable access throughout
most of the kernel. The inner most macro, is passed a GNU C statement
expression. In this case, we have macro arguments that are GNU C
statement expressions, which can contain a significant number of tokens.
The upstream kernel patch caused significant build time regressions for
both Clang and GCC. Since then, some of the nesting has been removed via
@melver, which helps gain back most of the lost compilation time. [2]
Profiles collected [3] from compilations of the slowest TU for us in the
kernel show:
* 51.4% time spent in clang::TokenLexer::updateLocForMacroArgTokens
* 48.7% time spent in clang::SourceManager::getFileIDLocal
* 35.5% time spent in clang::SourceManager::isOffsetInFileID
(mostly calls from the former through to the latter).
So it seems we have a pathological case for which properly tracking the
SourceLocation of macro arguments is significantly harming build
performance. This stands out in referenced flame graph.
In fact, this case was identified previously as being problematic in
commit 3339c568c4 ("[Lex] Speed up updateConsecutiveMacroArgTokens (NFC)")
Looking at the above call chain, there's 3 things we can do to speed up
this case.
1. TokenLexer::updateConsecutiveMacroArgTokens() calls
SourceManager::isWrittenInSameFile() which calls
SourceManager::getFileID(), which is both very hot and very expensive
to call. SourceManger has a one entry cache, member LastFileIDLookup.
If that isn't the FileID for a give source location offset, we fall
back to a linear probe, and then to a binary search for the FileID.
These fallbacks update the one entry cache, but noticeably they do
not for the case of macro expansions!
For the slowest TU to compile in the Linux kernel, it seems that we
miss about 78.67% of the 68 million queries we make to getFileIDLocal
that we could have had cache hits for, had we saved the macro
expansion source location's FileID in the one entry cache. [4]
I tried adding a separate cache item for macro expansions, and to
check that before the linear then binary search fallbacks, but did
not find it faster than simply allowing macro expansions into the one
item cache. This alone nets us back a lot of the performance loss.
That said, this is a modification of caching logic, which is playing
with a double edged sword. While it significantly improves the
pathological case, its hard to say that there's not an equal but
opposite pathological case that isn't regressed by this change.
Though non-pathological cases of builds of the Linux kernel before
[0] are only slightly improved (<1%) and builds of LLVM itself don't
change due to this patch.
Should future travelers find this change to significantly harm their
build times, I encourage them to feel empowered to revert this
change.
2. SourceManager::getFileIDLocal has a FIXME hinting that the call to
SourceManager::isOffsetInFileID could be made much faster since
isOffsetInFileID is generic in the sense that it tries to handle the
more generic case of "local" (as opposed to "loaded") files, though
the caller has already determined the file to be local. This patch
implements a new method that specialized for use when the caller
already knows the file is local, then use that in
TokenLexer::updateLocForMacroArgTokens. This should be less
controversial than 1, and is likely an across the board win. It's
much less significant for the pathological case, but still a
measurable win once we have fallen to the final case of binary
search. D82497
3. A bunch of methods in SourceManager take a default argument.
SourceManager::getLocalSLocEntry doesn't do anything with this
argument, yet many callers of getLocalSLocEntry setup, pass, then
check this argument. This is wasted work. D82498
With this patch applied, the above profile [5] for the same pathological
input looks like:
* 25.1% time spent in clang::TokenLexer::updateLocForMacroArgTokens
* 17.2% time spent in clang::SourceManager::getFileIDLocal
and clang::SourceManager::isOffsetInFileID is no longer called, and thus
falls out of the profile.
There may be further improvements to the general problem of "what
interval contains one number out of millions" than the current use of a
one item cache, followed by linear probing, followed by binary
searching. We might even be able to do something smarter in
TokenLexer::updateLocForMacroArgTokens.
[0] cdd28ad2d8
[1] https://github.com/ClangBuiltLinux/linux/issues/1032
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=locking/kcsan&id=a5dead405f6be1fb80555bdcb77c406bf133fdc8
[3] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633712667
[4] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633741923
[5] https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-634932736
Reviewed By: kadircet
Differential Revision: https://reviews.llvm.org/D80681
This is to correct bugs.llvm.org/show_bug.cgi?id=46444
ffinite-math-only is a gcc option. That is the correct spelling.
File modified is clang/docs/UsersManual.rst
I'm not sure if this is a regression from D81448 + D81643,
which moved at least the code cast from elsewhere,
or somehow no one triggered that before.
But now we can reach it with a non-instruction..
It is not straight-forward to write cost-model tests for constantexprs,
`-cost-model -analyze -cost-kind=` does not appear to look at them,
or maybe i'm doing it wrong.
I've encountered that via a SimplifyCFG crash,
so reduced (currently-crashing) test is added.
There are likely other instances.
For now, simply restore previous status quo of
not crashing and returning TTI::TCC_Basic.
We've decided to move away from that by requiring that libc++ is built
as part of the monorepo a while ago. This commit removes code pertaining
to that unsupported use case and produces a clear error when the user
violates that.
In fact, building outside of the monorepo will still work as long as
LLVM_PATH is pointing to the root of the LLVM project, although that
is not officially supported.
Summary:
by removing casts from unsigned to int that which may be implementation
defined according to C++14 (and thus trip up the XL compiler on AIX) by
just using unsigned comparisons/masks and refactor out the range
constants to cleanup things a bit while we are at it.
Reviewers: hubert.reinterpretcast, arsenm
Reviewed By: hubert.reinterpretcast
Subscribers: wdng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82197
Port the remaining tests which only require mechanical changes and delete
test_any.sh.
* Delete old RUN lines
* Replace:
EXEC: ${F18} ... | ${FileCheck} ...
with
RUN: %f18 .. | FileCheck ...
* Prepend RUN line with not when it is expected to fail
Also reinstate a de-activated EXEC line and port it in the same way.
Differential Revision: https://reviews.llvm.org/D82168
Rationale:
In general, passing "fastmath" from MLIR to LLVM backend is not supported, and even just providing such a feature for experimentation is under debate. However, passing fine-grained fastmath related attributes on individual operations is generally accepted. This CL introduces an option to instruct the vector-to-llvm lowering phase to annotate floating-point reductions with the "reassociate" fastmath attribute, which allows the LLVM backend to use SIMD implementations for such constructs. Oher lowering passes can start using this mechanism right away in cases where reassociation is allowed.
Benefit:
For some microbenchmarks on x86-avx2, speedups over 20 were observed for longer vector (due to cleaner, spill-free and SIMD exploiting code).
Usage:
mlir-opt --convert-vector-to-llvm="reassociate-fp-reductions"
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D82624
This patch adds LLVM intrinsics for the dcbt (Data Cache Block Touch),
dcbtst (Data Cache Block Touch for Store) and isync (Instruction
Synchronize) instructions.
The intrinsic for dcbt and dcbst in this patch are named llvm.ppc.dcbt.with.hint
and llvm.ppc.dcbtst.with.hint respectively as there already exists an intrinsic
for llvm.ppc.dcbt and llvm.ppc.dcbtst. However, the original variants of the
intrinsics do not accept the TH immediate field, whereas these variants do.
Differential Revision: https://reviews.llvm.org/D79633
In order for these files to build properly, this patch rolls up a number of changes that have been made to various files that have been upstreamed.
Implementations for the interfaces included in Bridge.h and IntrinsicCall.h will be included in a future diff.
Differential revision: https://reviews.llvm.org/D82608
These tests checked for stdout and stderr in the same pipe, which does not
come out in a guaranteed order. test_any.sh's FileCheck accepts CHECK lines in
any order while FileCheck checks must match in order.
Hand port these to pipe stdout to a temp file which is checked with a separate
FileCheck RUN line to test it.
Differential Revision: https://reviews.llvm.org/D82167
test_any.sh's FileCheck accepts the CHECK line matches in any order while
FileCheck checks in strict order. Re-order the CHECK lines to source code
order - they come from an ordered datastructure.
Some CHECK lines are sensitive to line number which are fixed up manually.
getsymbols02 had multiple test inputs which had their own EXEC lines.
Consolidate these together in one file.
Differential Revision: https://reviews.llvm.org/D82166
These tests are sensitive to line numbers in the input and check output.
They also successively write to a temporary file then check that.
Fix the line number issues and replace the temporary file use with successive
calls to FileCheck with different check-prefixes.
Differential Revision: https://reviews.llvm.org/D82165
Add an option to always instrument function entry BB (default off)
Add an option to do atomically updates on the first counter in each
instrumented function.
Differential Revision: https://reviews.llvm.org/D82123
The regex syntax used by test_any.sh's FileCheck is sometimes incompatible with
real FileCheck.
Hand port these tests to use FileCheck and it's regex format.
Also remove FIXME from label01 that no longer applies.
Also add second run-line that enables all tests.
Add some new FIXMEs for issues in original tests
Differential Revision: https://reviews.llvm.org/D82164
For context of anyone following along, we've not completed the migration of statepoint to the operand bundle form. The only remaining piece is to actually version the statepoint intrinsic to remove the old inline operand sets. That will follow when I have some time; delay is useful here to allow downstream migrations.
Forked from D80681.
getLocalSLocEntry() has an unused parameter used to satisfy an interface
of libclang (see getInclusions() in
clang/tools/libclang/CIndexInclusionStack.cpp). It's pointless for
callers to construct/pass/check this inout parameter that can never
signify that a FileID is invalid.
Reviewed By: kadircet
Differential Revision: https://reviews.llvm.org/D82498
The legacy pass is called "loop-unroll", but in the new PM it's called "unroll".
Also applied to unroll-and-jam and unroll-full.
Fixes various check-llvm tests when NPM is turned on.
Reviewed By: Whitney, dmgreen
Differential Revision: https://reviews.llvm.org/D82590
Summary:
In preparation for GlobalISel, PPCSubTarget needs to be renamed to Subtarget as there places in GlobalISel that assume the presence of the variable Subtarget.
This patch introduces the variable Subtarget, and replaces all existing uses of PPCSubTarget with Subtarget. A subsequent patch will remove the definiton of
PPCSubTarget, once any downstream users have the opportunity to rename any uses they have.
Reviewers: hfinkel, nemanjai, jhibbits, #powerpc, echristo, lkail
Reviewed By: #powerpc, echristo, lkail
Subscribers: echristo, lkail, wuzish, nemanjai, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81623
This patch improves the error message provided by the stencil that handles
source from a range selector.
Reviewed By: gribozavr2
Differential Revision: https://reviews.llvm.org/D82654
To be able to have more meaningful performance out of workloadsi going through
the vulkan-runner we need to use buffers from GPU device memory as access to
system memory is significantly slower for GPU with dedicated memory. This adds
code to do a copy through staging buffer as GPU memory cannot always be mapped
on the host.
Differential Revision: https://reviews.llvm.org/D82504
This reverts commit b55d723ed6.
Reapply Modify FPFeatures to use delta not absolute settings
To solve https://bugs.llvm.org/show_bug.cgi?id=46166 where the
floating point settings in PCH files aren't compatible, rewrite
FPFeatures to use a delta in the settings rather than absolute settings.
With this patch, these floating point options can be benign.
Reviewers: rjmccall
Differential Revision: https://reviews.llvm.org/D81869
LBR contains (up to) 16 entries for last x branches and the X86LBRCounter (from D77422) should be able to return all those.
Currently, it just returns the latest entry, which could lead to mis-leading measurements.
This patch aslo changes the LatencyBenchmarkRunner to accommodate multi-value readings.
https://reviews.llvm.org/D81050
Summary:
MSVC does not handle raw string literals with embedded double quotes
correctly. I switched the affected test case to use regular string
literals insetad.
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D82636
Implemented conversion for `spv.BitReverse` and `spv.BitCount`. Since ODS
generates builders in a different way for LLVM dialect intrinsics, I
added attributes to build method in `DirectConversionPattern` class. The
tests for these ops are in `bitwise-ops-to-llvm.mlir`.
Differential Revision: https://reviews.llvm.org/D82286
Renames the overloaded `RangeSelector` combinator `range` to the more
descriptive `enclose` and `encloseNodes`. The old overloads are left in place
and marked deprected and will be deleted at a future time.
Reviewed By: tdl-g
Differential Revision: https://reviews.llvm.org/D82592
I'm not sure we actually need to support this now, since I think
clover always explicitly uses amdgcn-mesa-mesa3d now, not the
ill-defined amdgcn-- behavior.
Add a pass to rewrite sequential chains of `spirv::CompositeInsert`
operations into `spirv::CompositeConstruct` operations.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D82198
Remove the asserts in performLDNT1Combine & performST[NT]1Combine
to ensure we get a failure where the type is a bfloat16 and
hasBF16() is false, regardless of whether asserts are enabled.
As loop extractor has a dependency on another pass (namely BreakCriticalEdges)
that may update the IR, use the getAnalysis version introduced in
55fe7b79bb to carry that change.
Add an assert in getAnalysisID to make sure no other changed status is missed -
according to validation this was the only one.
Related to https://reviews.llvm.org/D80916
Differential Revision: https://reviews.llvm.org/D81236
This patch add support for 'spv.CopyMemory'. The following changes are
introduced:
- 'CopyMemory' op is added to SPIRVOps.td.
- Custom parse and print methods are introduced.
- A few Roundtripping tests are added.
Differential Revision: https://reviews.llvm.org/D82384
Summary: The patch fixes an off by one error in the method collapseParallelLoops. It ensures the same normalized bound is used for the computation of the division and the remainder.
Reviewers: herhut
Reviewed By: herhut
Subscribers: mehdi_amini, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, Kayjukh, jurahul, msifontes
Tags: #mlir
Differential Revision: https://reviews.llvm.org/D82634
Conversions of allocation-related operations in Standard-to-LLVM need
declarations of "malloc" and "free" (or equivalents). They use locally created
OpBuilders pointed at the module level to declare these functions if necessary.
This is poorly compatible with the pattern infrastructure that is unaware of
new operations being created. Update the insertion point of the main rewriter
instead.
Differential Revision: https://reviews.llvm.org/D82649