Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasExtendedReg()`.
Differential revision: https://reviews.llvm.org/D54822
llvm-svn: 347599
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasShiftedReg()`.
Differential revision: https://reviews.llvm.org/D54820
llvm-svn: 347598
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::isScaledAddr()`
Differential revision: https://reviews.llvm.org/D54777
llvm-svn: 347597
Summary:
Support for profile-driven cache prefetching (X86)
This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM.
A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp.
The recommender makes prefetch recommendations in terms of:
* the binary offset of an instruction with a memory operand;
* a delta;
* and a type (nta, t0, t1, t2)
meaning: a prefetch of that type should be inserted right before the instrution at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access.
For example:
0x400ab2,64,nta
and assuming the instruction at 0x400ab2 is:
movzbl (%rbx,%rdx,1),%edx
means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such:
prefetchnta 0x40(%rbx,%rdx,1)
movzbl (%rbx, %rdx, 1), %edx
The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well):
1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information).
2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept).
3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations.
4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3.
Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4.
The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism.
Reviewers: davidxl, wmi, craig.topper
Reviewed By: davidxl, wmi, craig.topper
Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D54052
llvm-svn: 347596
It's possible in some cases to have a restore present
without a corresponding spill. Due to an apparent bug
in D54366 <https://reviews.llvm.org/D54366>, only the
restore for a register was emitted. It's probably
always a bug for this to happen, but due to how SGPR
spilling is implemented, this makes the issues appear
worse than it is.
llvm-svn: 347595
On my machine this reduced median link time of lld-speed-test/chrome
from 2.68s to 2.41s. It also reduces link time of Chrome for Android
with a prototype compiler change that causes the compiler to create
large numbers of identical (modulo relocations) sections from >15
minutes to a few seconds.
Differential Revision: https://reviews.llvm.org/D54773
llvm-svn: 347594
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.
This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.
Differential Revision: https://reviews.llvm.org/D54906
llvm-svn: 347593
Summary:
The old legacy LTO API had a separate cache key computation, which was
a subset of the cache key computation in the new LTO API (from what I
can tell this is largely just because certain features such as CFI,
dsoLocal, etc are only utilized via the new LTO API). However, having
separate computations is unnecessary (much of the code is duplicated),
and can lead to bugs when adding new optimizations if both cache
computation algorithms aren't updated properly - it's much easier to
maintain if we have a single facility.
This patch refactors the old LTO API code to use the cache key
computation from the new LTO API. To do this, we set up an lto::Config
object and fill in the fields that the old LTO was hashing (the others
will just use the defaults).
There are two notable changes:
- I added a Freestanding flag to the LTO Config. Currently this is only
used by the legacy LTO API. In the patch that added it (D30791) I had
asked about adding it to the new LTO API, but it looks like that was not
addressed. This should probably be discussed as a follow up to this
change, as it is orthogonal.
- The legacy LTO API had some code that was hashing the GUID of all
preserved symbols defined in the module. I looked back at the history of
this (which was added with the original hashing in the legacy LTO API in
D18494), and there is a comment in the review thread that it was added
in preparation for future internalization. We now do the internalization
of course, and that is handled in the new LTO API cache key computation
by hashing the recorded linkage type of all defined globals. Therefore I
didn't try to move over and keep the preserved symbols handling.
Reviewers: steven_wu, pcc
Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D54635
llvm-svn: 347592
We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C)
Differential Revision: https://reviews.llvm.org/D54818
llvm-svn: 347591
This patch adds an implementation of __resize_default_init as
described in P1072R2. Additionally, it uses it in filesystem to
demonstrate its intended utility.
Once P1072 lands, or if it changes it's interface, I will adjust
the internal libc++ implementation to match.
llvm-svn: 347589
Summary: They have an additional `ThreadsEnabled` check, which does not matter much.
Reviewers: pcc, ruiu, rnk
Reviewed By: ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54812
llvm-svn: 347587
Summary:
LLVM IR already has an attribute for speculative_load_hardening. Before
this commit, when a user passed the -mspeculative-load-hardening flag to
Clang, every function would have this attribute added to it. This Clang
attribute will allow users to opt into SLH on a function by function basis.
This can be applied to functions and Objective C methods.
Reviewers: chandlerc, echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54555
llvm-svn: 347586
In r339743, I marked several aligned allocation tests as downright
unsupported on macosx in an attempt to unbreak the build. It turns
out that marking them as unuspported whenever we're on OS X is way
too coarse grained. This commit marks the tests as XFAIL with more
granularity.
llvm-svn: 347585
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.
For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.
This patch authored by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, apilipenko, reames
Reviewed By: reames
Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D53892
llvm-svn: 347584
modes.
If the region is inside target|teams|distribute region, we can emit the
locations with the correct info for execution mode and runtime mode.
Patch adds this ability to the NVPTX codegen to help the optimizer to
produce better code.
llvm-svn: 347583
Summary:
-mno-speculative-load-hardening isn't a cc1 option, therefore,
before this change:
clang -mno-speculative-load-hardening hello.cpp
would have the following error:
error: unknown argument: '-mno-speculative-load-hardening'
This change will only ever forward -mspeculative-load-hardening
which is a CC1 option based on which flag was passed to clang.
Also added a test that uses this option that fails if an error like the
above is ever thrown.
Thank you ericwf for help debugging and fixing this error.
Reviewers: chandlerc, EricWF
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54763
llvm-svn: 347582
ParentTy is never used other than an assignment, and since it is a
pointer, there is no side effect. Some versions of GCC notice and warn
on this.
Change-Id: I37dc1a18c7b58040419afb803621de13d8904a8f
llvm-svn: 347581
The test was marked as failing whenever the deployment target was 10.12
or older, but in reality the test passes when the deployment target is
10.12 on recent Clangs. This happens because only older clangs do not
honor the -faligned-allocation flag, which disables any availability
error related to aligned allocation support, regardless of the
deployment target.
llvm-svn: 347580
Summary: SetMustBuildLookupTable() must always be called on a primary context.
Reviewers: labath, shafik, a.sidorin
Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411
Differential Revision: https://reviews.llvm.org/D54863
llvm-svn: 347575
Summary:
STATEPOINT records its args' locations on stack relative to SP.
If the SP is changed, take that into account.
This patch authored by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D53603
llvm-svn: 347569
Summary:
In PR39232, we noticed that some variant tests started failing in C++2a mode
with recent Clangs, because the rules for literal types changed in C++2a. As
a result, a temporary fix was checked in (enabling the test only in C++17).
This commit is what I believe should be the long term fix: I removed the
tests that checked constexpr default-constructibility with a weird type
from the tests for index() and valueless_by_exception(), and instead I
added tests for those using an obviously literal type in the test for the
default constructor.
Reviewers: EricWF, mclow.lists
Subscribers: christof, jkorous, dexonsmith, arphaman, libcxx-commits, rsmith
Differential Revision: https://reviews.llvm.org/D54767
llvm-svn: 347568
Summary:
Ownership and configuration:
The auto-index (background index) is maintained by ClangdServer, like Dynamic.
(This means ClangdServer will be able to enqueue preamble indexing in future).
For now it's enabled by a simple boolean flag in ClangdServer::Options, but
we probably want to eventually allow injecting the storage strategy.
New 'sync' command:
In order to meaningfully test the integration (not just unit-test components)
we need a way for tests to ensure the asynchronous index reads/writes occur
before a certain point.
Because these tests and assertions are few, I think exposing an explicit "sync"
command for use in tests is simpler than allowing threading to be completely
disabled in the background index (as we do for TUScheduler).
Bugs:
I fixed a couple of trivial bugs I found while testing, but there's one I can't.
JSONCompilationDatabase::getAllFiles() may return relative paths, and currently
we trigger an assertion that assumes they are absolute.
There's no efficient way to resolve them (you have to retrieve the corresponding
command and then resolve against its directory property). In general I think
this behavior is broken and we should fix it in JSONCompilationDatabase and
require CompilationDatabase::getAllFiles() to be absolute.
Reviewers: kadircet
Subscribers: ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, cfe-commits
Differential Revision: https://reviews.llvm.org/D54894
llvm-svn: 347567
This source file has not been needed since r346522 and was triggering diagnostics in MSVC about an object file which exports no public symbols (LNK4221).
llvm-svn: 347565
Summary:
If one definition is currently being defined, we do not compare for
equality and we assume that the decls are equal.
Reviewers: a_sidorin, a.sidorin, shafik
Reviewed By: a_sidorin
Subscribers: gamesh411, shafik, rnkovacs, dkrupp, Szelethus, cfe-commits
Differential Revision: https://reviews.llvm.org/D53697
llvm-svn: 347564
Add support for funnel shifts to the DemandedBits analysis. The
demanded bits of the first two operands can be determined if the
shift amount is constant. The demanded bits of the third operand
(shift amount) can be determined if the bitwidth is a power of two.
This is basically the same functionality as implemented in D54869
and D54478, but for DemandedBits rather than InstCombine.
Differential Revision: https://reviews.llvm.org/D54876
llvm-svn: 347561
Summary:
And add a hidden option to control whether the types are collected.
For experiments, will be removed when expected types implementation
is stabilized.
The index size is almost unchanged, e.g. the YAML index for all clangd
sources increased from 53MB to 54MB.
Reviewers: ioeric, sammccall
Reviewed By: sammccall
Subscribers: MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52274
llvm-svn: 347560
Summary:
Provides facilities to model the C++ conversion rules without the AST.
The introduced representation can be stored in the index and used to
implement type-based ranking improvements for index-based completions.
Reviewers: sammccall, ioeric
Reviewed By: sammccall
Subscribers: malaperle, mgorny, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52273
llvm-svn: 347559
We have these 2 "isDesirable" promotion hooks (I'm not sure why we need both of them, but that's
independent of this patch), and we can adjust them to promote "mul i8 X, C" to i32. Then, all of
our existing LEA and other multiply expansion magic happens as it would for i32 ops.
Some of the test diffs show that we could end up with an actual 32-bit mul instruction here
because we choose not to expand to simpler ops. That instruction could be slower depending on the
subtarget. On the plus side, this means we don't need a separate instruction to load the constant
operand and possibly an extra instruction to move the result. If we need to tune mul i32 further,
we could add a later transform that tries to shrink it back to i8 based on subtarget timing.
I did not bother to duplicate all of the 32-bit test file RUNs and target settings that exist to
test whether LEA expansion is cheap or not. The diffs here assume a default target, so that means
LEA is generally cheap.
Differential Revision: https://reviews.llvm.org/D54803
llvm-svn: 347557
A number of builtins in altivec.h load/store vectors from pointers to scalar
types. Currently they just cast the pointer to a vector pointer, but expressions
like that have the alignment of the target type. Of course, the input pointer
did not have that alignment so this triggers UBSan (and rightly so).
This resolves https://bugs.llvm.org/show_bug.cgi?id=39704
Differential revision: https://reviews.llvm.org/D54787
llvm-svn: 347556
Summary: The fix for `auto` new expression is illegal.
Reviewers: aaron.ballman
Subscribers: xazax.hun, cfe-commits
Differential Revision: https://reviews.llvm.org/D54832
llvm-svn: 347551