If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk.
We probably need to do something if the store itself is a vXi1 type, but I'll leave that til a testcase appears.
llvm-svn: 347632
Currently a store combine will absorb the bitcast before our combine that turns bitcasts into movmsk gets a chance to run. This results in a store being created with a vXi1 type. Type legalization then promotes the input type and makes this a truncating store. Then we badly scalarize this store.
Currently we avoid this on v8i1->i8 bitcasts due to an incompletely qualified(per the original intention) check in isLoadBitCastBeneficial. An easy fix is to disable this for all vXi1->iX bitcasts on pre-avx512 targets. We'll still generate terrible code if the IR explicitly contains a store of vXi1 without a bitcast. We could probably solve that by just turning all stores of vXi1 into (store (iX (bitcast))) as an early DAG combine.
llvm-svn: 347631
until I figure out why the build is failing or timing out
***************************
Summary:
The prior diff had to be reverted because there were two tests
that failed. I updated the two tests in this diff
clang/test/Misc/pragma-attribute-supported-attributes-list.test
clang/test/SemaCXX/attr-speculative-load-hardening.cpp
LLVM IR already has an attribute for speculative_load_hardening. Before
this commit, when a user passed the -mspeculative-load-hardening flag to
Clang, every function would have this attribute added to it. This Clang
attribute will allow users to opt into SLH on a function by function
basis.
This can be applied to functions and Objective C methods.
Reviewers: chandlerc, echristo, kristof.beyls, aaron.ballman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54915
This reverts commit a5b3c232d1e3613f23efbc3960f8e23ea70f2a79.
(r347617)
llvm-svn: 347628
Only push the outermost record as a DeclContext when parsing a function
body. See the comments in Sema::getContainingDC about the way the parser
pushes contexts. This is intended to match the behavior the parser
normally displays where it parses all method bodies from all nested
classes at the end of the outermost class, when all nested classes are
complete.
Fixes PR38460.
llvm-svn: 347627
Summary:
MSVC does this, and we should to.
The .gfids table is a table of RVAs, so it's impossible for a DLL to
indicate that an imported symbol is address taken. Therefore, exports
appear to be listed as address taken by the DLL that exports them.
This fixes an issue that Firefox ran into here:
https://bugzilla.mozilla.org/show_bug.cgi?id=1485016#c12
In Firefox, the export directive came from a .def file, but we need to
do this for any kind of export.
Reviewers: dmajor, hans, amccarth, alex
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54723
llvm-svn: 347623
The benefits are:
a) Performance (very minor): it removes a heap allocation of the std::function constructor (template<class F> function(F f))
b) Clarity: it suggests that the callable's lifetime should end after the callee returns. Such callback is widely used in llvm. lld also uses it a lot.
Reviewers: ruiu, rnk, pcc
Reviewed By: ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54813
llvm-svn: 347620
Summary:
The prior diff had to be reverted because there were two tests
that failed. I updated the two tests in this diff
clang/test/Misc/pragma-attribute-supported-attributes-list.test
clang/test/SemaCXX/attr-speculative-load-hardening.cpp
----- Summary from Previous Diff (Still Accurate) -----
LLVM IR already has an attribute for speculative_load_hardening. Before
this commit, when a user passed the -mspeculative-load-hardening flag to
Clang, every function would have this attribute added to it. This Clang
attribute will allow users to opt into SLH on a function by function basis.
This can be applied to functions and Objective C methods.
Reviewers: chandlerc, echristo, kristof.beyls, aaron.ballman
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54915
llvm-svn: 347617
After a recent change in LLVM the TimePoint encoding become more
precise, exceeding the precision of the TimePoint obtained from the
DebugMap. This patch adds a flag to the GetModificationTime helper in
the FileSystem to return the modification time with less precision.
Thanks to Davide for bisecting this failure on the LLDB bots.
llvm-svn: 347615
Summary:
Basic documentation of the Stack Safety Analysis.
It will be improved during review and upstream of an implementation.
Reviewers: kcc, eugenis, vlad.tsyrklevich, glider
Reviewed By: vlad.tsyrklevich
Subscribers: arphaman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53336
llvm-svn: 347612
Summary:
IPA is implemented as module pass which produce map from Function or Alias to
StackSafetyInfo for a single function.
From prototype by Evgenii Stepanov and Vlad Tsyrklevich.
Reviewers: eugenis, vlad.tsyrklevich, pcc, glider
Subscribers: hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D54543
llvm-svn: 347611
This attribute should appear only on the first declaration. This
patch cleans up <string> by removing the attribute on redeclarations.
llvm-svn: 347608
Summary:
Removing ncompatible attributes at indirect-call promoted callsites, not removing it results in
at least a IR verification error.
Reviewers: davidxl, xur, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54913
llvm-svn: 347605
Summary:
Analysis produces StackSafetyInfo which contains information with how allocas
and parameters were used in functions.
From prototype by Evgenii Stepanov and Vlad Tsyrklevich.
Reviewers: eugenis, vlad.tsyrklevich, pcc, glider
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D54504
llvm-svn: 347603
Summary:
By default sanstats search binaries at the same location where they were when
stats was collected. Sometime you can not print report immediately or you need
to move post-processing to another workstation. To support this use-case when
original binary is missing sanstats will fall-back to directory with sanstats
file.
Reviewers: pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53857
llvm-svn: 347601
Summary: Help with off-line symbolization or other type debugging.
Reviewers: pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53606
llvm-svn: 347600
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasExtendedReg()`.
Differential revision: https://reviews.llvm.org/D54822
llvm-svn: 347599
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::hasShiftedReg()`.
Differential revision: https://reviews.llvm.org/D54820
llvm-svn: 347598
Refactor the scheduling predicates based on `MCInstPredicate`. In this
case, `AArch64InstrInfo::isScaledAddr()`
Differential revision: https://reviews.llvm.org/D54777
llvm-svn: 347597
Summary:
Support for profile-driven cache prefetching (X86)
This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM.
A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp.
The recommender makes prefetch recommendations in terms of:
* the binary offset of an instruction with a memory operand;
* a delta;
* and a type (nta, t0, t1, t2)
meaning: a prefetch of that type should be inserted right before the instrution at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access.
For example:
0x400ab2,64,nta
and assuming the instruction at 0x400ab2 is:
movzbl (%rbx,%rdx,1),%edx
means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such:
prefetchnta 0x40(%rbx,%rdx,1)
movzbl (%rbx, %rdx, 1), %edx
The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well):
1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information).
2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept).
3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations.
4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3.
Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4.
The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism.
Reviewers: davidxl, wmi, craig.topper
Reviewed By: davidxl, wmi, craig.topper
Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D54052
llvm-svn: 347596
It's possible in some cases to have a restore present
without a corresponding spill. Due to an apparent bug
in D54366 <https://reviews.llvm.org/D54366>, only the
restore for a register was emitted. It's probably
always a bug for this to happen, but due to how SGPR
spilling is implemented, this makes the issues appear
worse than it is.
llvm-svn: 347595
On my machine this reduced median link time of lld-speed-test/chrome
from 2.68s to 2.41s. It also reduces link time of Chrome for Android
with a prototype compiler change that causes the compiler to create
large numbers of identical (modulo relocations) sections from >15
minutes to a few seconds.
Differential Revision: https://reviews.llvm.org/D54773
llvm-svn: 347594
SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512.
This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral.
Differential Revision: https://reviews.llvm.org/D54906
llvm-svn: 347593
Summary:
The old legacy LTO API had a separate cache key computation, which was
a subset of the cache key computation in the new LTO API (from what I
can tell this is largely just because certain features such as CFI,
dsoLocal, etc are only utilized via the new LTO API). However, having
separate computations is unnecessary (much of the code is duplicated),
and can lead to bugs when adding new optimizations if both cache
computation algorithms aren't updated properly - it's much easier to
maintain if we have a single facility.
This patch refactors the old LTO API code to use the cache key
computation from the new LTO API. To do this, we set up an lto::Config
object and fill in the fields that the old LTO was hashing (the others
will just use the defaults).
There are two notable changes:
- I added a Freestanding flag to the LTO Config. Currently this is only
used by the legacy LTO API. In the patch that added it (D30791) I had
asked about adding it to the new LTO API, but it looks like that was not
addressed. This should probably be discussed as a follow up to this
change, as it is orthogonal.
- The legacy LTO API had some code that was hashing the GUID of all
preserved symbols defined in the module. I looked back at the history of
this (which was added with the original hashing in the legacy LTO API in
D18494), and there is a comment in the review thread that it was added
in preparation for future internalization. We now do the internalization
of course, and that is handled in the new LTO API cache key computation
by hashing the recorded linkage type of all defined globals. Therefore I
didn't try to move over and keep the preserved symbols handling.
Reviewers: steven_wu, pcc
Subscribers: mehdi_amini, inglorion, eraman, dexonsmith, dang, llvm-commits
Differential Revision: https://reviews.llvm.org/D54635
llvm-svn: 347592
We might find a target specific node that needs to be unwrapped after we look through an add/or. Otherwise we get inconsistent results if one pointer is just X86WrapperRIP and the other is (add X86WrapperRIP, C)
Differential Revision: https://reviews.llvm.org/D54818
llvm-svn: 347591
This patch adds an implementation of __resize_default_init as
described in P1072R2. Additionally, it uses it in filesystem to
demonstrate its intended utility.
Once P1072 lands, or if it changes it's interface, I will adjust
the internal libc++ implementation to match.
llvm-svn: 347589
Summary: They have an additional `ThreadsEnabled` check, which does not matter much.
Reviewers: pcc, ruiu, rnk
Reviewed By: ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54812
llvm-svn: 347587
Summary:
LLVM IR already has an attribute for speculative_load_hardening. Before
this commit, when a user passed the -mspeculative-load-hardening flag to
Clang, every function would have this attribute added to it. This Clang
attribute will allow users to opt into SLH on a function by function basis.
This can be applied to functions and Objective C methods.
Reviewers: chandlerc, echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54555
llvm-svn: 347586
In r339743, I marked several aligned allocation tests as downright
unsupported on macosx in an attempt to unbreak the build. It turns
out that marking them as unuspported whenever we're on OS X is way
too coarse grained. This commit marks the tests as XFAIL with more
granularity.
llvm-svn: 347585