when DenseMap growed and moved memory. I verified it fixed the bootstrap
problem on x86_64-linux-gnu but I cannot verify whether it fixes
the bootstrap error on clang-ppc64be-linux. I will watch the build-bot
result closely.
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Differential Revision: http://reviews.llvm.org/D15302
llvm-svn: 265547
Summary:
When the backedge taken codition is computed from an icmp, SCEV can
deduce the backedge taken count only if one of the sides of the icmp
is an AddRecExpr. However, due to sign/zero extensions, we sometimes
end up with something that is not an AddRecExpr.
However, we can use SCEV predicates to produce a 'guarded' expression.
This change adds a method to SCEV to get this expression, and the
SCEV predicate associated with it.
In HowManyGreaterThans and HowManyLessThans we will now add a SCEV
predicate associated with the guarded backedge taken count when the
analyzed SCEV expression is not an AddRecExpr. Note that we only do
this as an alternative to returning a 'CouldNotCompute'.
We use new feature in Loop Access Analysis and LoopVectorize to analyze
and transform more loops.
Reviewers: anemet, mzolotukhin, hfinkel, sanjoy
Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17201
llvm-svn: 265535
Instead of copying arguments from the source function to the
destination, steal them. This has a few advantages.
- The ValueMap doesn't need to be seeded with (or cleared of)
Arguments.
- Often the destination function won't have created any arguments yet,
so this avoids malloc traffic.
- Argument names don't need to be copied.
Because argument lists are lazy, this required a new
Function::stealArgumentListFrom helper.
llvm-svn: 265519
We must remove all aliased registers which may be more than the all sub
and super registers combined.
Bug found while reading the code. The bug does not affect any existing
target as the only use of register aliases I could found were control
registers on ARM and Hexagon which are all reserved.
llvm-svn: 265510
As part of the TRI argument of addRegBankCoverage we already have access to
the TargetRegisterClass through the ID of that register class.
Therefore, there is no point in needing a TargetRegisterClass instance,
the ID is enough to get to it.
llvm-svn: 265487
Bionic has a defined thread-local location for the stack protector
cookie. Emit a direct load instead of going through __stack_chk_guard.
llvm-svn: 265481
Add a common parent class for ConstantArray, ConstantVector, and
ConstantStruct called ConstantAggregate. These are the aggregate
subclasses of Constant that take operands.
This is mainly a cleanup, adding common `isa` target and removing
duplicated code. However, it also simplifies caching which constants
point transitively at `GlobalValue` (a possible future direction).
llvm-svn: 265466
Change the default constructor to create invalid object.
The target will have to properly initialize the register banks before
using them.
llvm-svn: 265460
destruction.
This makes the Expected<T> class behave like Error, even when in success mode.
Expected<T> values must be checked to see whether they contain an error prior
to being dereferenced, assigned to, or destructed.
llvm-svn: 265446
At IR level, the swifterror argument is an input argument with type
ErrorObject**. For targets that support swifterror, we want to optimize it
to behave as an inout value with type ErrorObject*; it will be passed in a
fixed physical register.
The main idea is to track the virtual registers for each swifterror value. We
define swifterror values as AllocaInsts with swifterror attribute or a function
argument with swifterror attribute.
In SelectionDAGISel.cpp, we set up swifterror values (SwiftErrorVals) before
handling the basic blocks.
When iterating over all basic blocks in RPO, before actually visiting the basic
block, we call mergeIncomingSwiftErrors to merge incoming swifterror values when
there are multiple predecessors or to simply propagate them. There, we create a
virtual register for each swifterror value in the entry block. For predecessors
that are not yet visited, we create virtual registers to hold the swifterror
values at the end of the predecessor. The assignments are saved in
SwiftErrorWorklist and will be materialized at the end of visiting the basic
block.
When visiting a load from a swifterror value, we copy from the current virtual
register assignment. When visiting a store to a swifterror value, we create a
virtual register to hold the swifterror value and update SwiftErrorMap to
track the current virtual register assignment.
Differential Revision: http://reviews.llvm.org/D18108
llvm-svn: 265433
Without setting the flag there is no way to determine if a symbol
points to an arm or to a thumb function as the LSB of the address
masked out in all getter function.
Note: Currently the thumb flag is only used for MachO files so
adding a test to this change is not possible. It will be used
by the upcoming fix for llvm-objdump for disassembling thumb
functions what is easily testable.
Differential revision: http://reviews.llvm.org/D17956
llvm-svn: 265387
Refactor common code that queries the ModuleSummaryIndex for a value's
GlobalValueInfo struct into getGlobalValueInfo helper methods, which
will also be used by D18763.
llvm-svn: 265370
We can only perform a tail call to a callee that preserves all the
registers that the caller needs to preserve.
This situation happens with calling conventions like preserver_mostcc or
cxx_fast_tls. It was explicitely handled for fast_tls and failing for
preserve_most. This patch generalizes the check to any calling
convention.
Related to rdar://24207743
Differential Revision: http://reviews.llvm.org/D18680
llvm-svn: 265329
Use the MachineFunctionProperty mechanism to indicate whether a MachineFunction
is in SSA form instead of a custom method on MachineRegisterInfo. NFC
Differential Revision: http://reviews.llvm.org/D18574
llvm-svn: 265318
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Differential Revision: http://reviews.llvm.org/D15302
llvm-svn: 265309
Implemented truncstore for KNL and skylake-avx512.
Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes.
Differential Revision: http://reviews.llvm.org/D18740
llvm-svn: 265283
RAUW support on MDNode usually requires an extra allocation for
ReplaceableMetadataImpl. This is only strictly necessary if there are
tracking references to the MDNode. Make the construction of
ReplaceableMetadataImpl lazy, so that we don't get allocations if we
don't need them.
Since MDNode::isResolved now checks MDNode::isTemporary and
MDNode::NumUnresolved instead of whether a ReplaceableMetadataImpl is
allocated, the internal changes are intrusive (at various internal
checkpoints, isResolved now has a different answer).
However, there should be no real functionality change here; just
slightly lazier allocation behaviour. The external semantics should be
identical.
llvm-svn: 265279
This adds an assertion to maintain the property from r265273. When
Mapper::mapSimpleMetadata calls Mapper::mapValue, it should not find its
way back to mapMetadataImpl. This guarantees that mapSimpleMetadata is
not involved in any recursion.
Since Mapper::mapValue calls out to arbitrary materializers, we need to
save a bit on the ValueMap to make this assertion effective.
There should be no functionality change here. This co-recursion should
already have been impossible.
llvm-svn: 265276
Instead of checking live during MapMetadata whether a subprogram is
needed, seed the ValueMap with `nullptr` up-front.
There is a small hypothetical functionality change. Previously, calling
MapMetadataOp on a node whose "scope:" chain led to an unneeded
subprogram would return nullptr. However, if that were ever called,
then the subprogram would be needed; a situation that the IRMover is
supposed to avoid a priori!
Besides cleaning up the code a little, this restores a nice property:
MapMetadataOp returns the same as MapMetadata.
llvm-svn: 265229
Support seeding a ValueMap with nullptr for Metadata entries, a
situation I didn't consider in the Metadata/Value split.
I added a ValueMapper::getMappedMD accessor that returns an
Optional<Metadata*> with the mapped (possibly null) metadata. IRMover
needs to use this to avoid modifying the map when it's checking for
unneeded subprograms. I updated a call from bugpoint since I find the
new code clearer.
llvm-svn: 265228
Summary: This should make the code more readable, especially all the map declarations.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18721
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265215
Incremental LTO will usea cache to store object files.
This patch handles the pruning part of the cache, exposing
a few knobs:
- Pruning interval: the implementation keeps a "timestamp" file in the
directory and will scan it only after a given interval since the
last modification of the timestamp file. This is for performance
purpose, we don't want to scan continuously the folder.
- Entry expiration: this is the time after which a file that hasn't
been used is remove from the cache.
- Maximum size: expressed in percentage of the available disk space,
it helps to avoid that we blow up the disk space.
http://reviews.llvm.org/D18422
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265209
A ``swifterror`` attribute can be applied to a function parameter or an
AllocaInst.
This commit does not include any target-specific change. The target-specific
optimization will come as a follow-up patch.
Differential Revision: http://reviews.llvm.org/D18092
llvm-svn: 265189
Refactor the code that gets and creates PGOFuncName meta data so that it can be
used in clang's value profile annotation.
Differential Revision: http://reviews.llvm.org/D18623
llvm-svn: 265149
This avoids undefined behavior when casting pointers to it. Also make
sure that we don't cast to a derived StringMapEntry before checking for
tombstone, as that may have different alignment requirements.
llvm-svn: 265145
This allows the linker to instruct ThinLTO to perform only the
optimization part or only the codegen part of the process.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265113
This is intended to be used for ThinLTO incremental build.
Differential Revision: http://reviews.llvm.org/D18213
This is a recommit of r265095 after fixing the Windows issues.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265111
Provide a class to generate a SHA1 from a sequence of bytes, and
a convenience raw_ostream adaptor.
This will be used to provide a "build-id" by hashing the Module
block when writing bitcode. ThinLTO will use this information for
incremental build.
Reapply r265094 which was reverted in r265102 because it broke
MSVC bots (constexpr is not supported).
http://reviews.llvm.org/D16325
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265107
This reverts commit r265096, r265095, and r265094.
Windows build is broken, and the validation does not pass.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265102
This is intended to be used for ThinLTO incremental build.
Differential Revision: http://reviews.llvm.org/D18213
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265095
Provide a class to generate a SHA1 from a sequence of bytes, and
a convenience raw_ostream adaptor.
This will be used to provide a "build-id" by hashing the Module
block when writing bitcode. ThinLTO will use this information for
incremental build.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265094
when compiling with LTO.
r244523 a new class DiagnosticInfoOptimizationRemarkAnalysisAliasing for
optimization analysis remarks related to pointer aliasing without
guarding it in isDiagnosticEnabled in LLVMContext.cpp. This caused the
diagnostic message to be printed unconditionally when compiling with
LTO.
This commit cleans up isDiagnosticEnabled and makes sure all the
vectorization optimization remarks are guarded.
rdar://problem/25382153
llvm-svn: 265084
Summary: Adapted from Boost::filesystem.
(This is a reapply by reverting commit r265080 and fixing the WinAPI part)
Differential Revision: http://reviews.llvm.org/D18467
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265082
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.
http://reviews.llvm.org/D18612
<rdar://problem/25427165>
llvm-svn: 265077
Summary: Adapted from Boost::filesystem.
(This is a reapply by reverting commit r265062 and fixing the WinAPI part)
Differential Revision: http://reviews.llvm.org/D18467
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 265068
This patch simply mirrors the attributes we give to @llvm.nvvm.reflect
to the __nvvm_reflect libdevice call. This shaves about 30% of the code
in libdevice away because of CSE opportunities. It's also helps us
figure out that libdevice implementations of transcendental functions
don't have side-effects.
llvm-svn: 265060
This will become necessary in a subsequent change to make this method
merge adjacent stack adjustments, i.e. it might erase the previous
and/or next instruction.
It also greatly simplifies the calls to this function from Prolog-
EpilogInserter. Previously, that had a bunch of logic to resume iteration
after the call; now it just continues with the returned iterator.
Note that this changes the behaviour of PEI a little. Previously,
it attempted to re-visit the new instruction created by
eliminateCallFramePseudoInstr(). That code was added in r36625,
but I can't see any reason for it: the new instructions will obviously
not be pseudo instructions, they will not have FrameIndex operands,
and we have already accounted for the stack adjustment.
Differential Revision: http://reviews.llvm.org/D18627
llvm-svn: 265036
This patch is a part of http://reviews.llvm.org/D15525
GlobalIndirectSymbol class contains common implementation for both
aliases and ifuncs. This patch should be NFC change that just prepare
common code for ifunc support.
Differential Revision: http://reviews.llvm.org/D18433
llvm-svn: 265016
Change isConsecutiveLoads to check that loads are non-volatile as this
is a requirement for any load merges. Propagate change to two callers.
Reviewers: RKSimon
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18546
llvm-svn: 265013
PPC has a vector popcount, this lets the vectorizer use the correct cost
for it. Tweak X86 test to use an intrinsic that's actually scalarized (we
have a somewhat efficient lowering for vector popcount using SSE, the
cost model finds that now).
llvm-svn: 265005
Summary:
As discussed on llvm-dev[1].
This change adds the basic boilerplate code around having this intrinsic
in LLVM:
- Changes in Intrinsics.td, and the IR Verifier
- A lowering pass to lower @llvm.experimental.guard to normal
control flow
- Inliner support
[1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html
Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18527
llvm-svn: 264976
Commit r260791 contained an error in that it would introduce a cross-module
reference in the old module. It also introduced O(N^2) complexity in the
module cloner by requiring the entire module to be visited for each function.
Fix both of these problems by avoiding use of the CloneDebugInfoMetadata
function (which is only designed to do intra-module cloning) and cloning
function-attached metadata in the same way that we clone all other metadata.
Differential Revision: http://reviews.llvm.org/D18583
llvm-svn: 264935
For the same reason as the corresponding load change.
Note that ExpandStore is completely broken for non-byte sized element
vector stores, but preserve the current broken behavior which has tests
for it. The behavior should be the same, but now introduces a new typed
store that is incorrectly split later rather than doing it directly.
llvm-svn: 264928
On AMDGPU we want to be able to promote i64/f64 loads to v2i32.
If the access is unaligned, this would conclude that since i64 is legal,
it would convert it back to i64 and there is an endless legalization
loop.
Extract the logic for scalarizing the load into a new TargetLowering
function, where this can also replace the custom function AMDGPU
has for this.
llvm-svn: 264927
Summary:
This gives callers flexibility to pass lambdas with captures, which lets
callers avoid the C-style void*-ptr closure style. (Currently, callers
in clang store state in the PassManagerBuilderBase arg.)
No functional change, and the new API is backwards-compatible.
Reviewers: chandlerc
Subscribers: joker.eph, cfe-commits
Differential Revision: http://reviews.llvm.org/D18613
llvm-svn: 264918
There is code under review that requires StringMap to have a copy constructor,
and this makes StringMap more consistent with our other containers (like
DenseMap) that have copy constructors.
Differential Revision: http://reviews.llvm.org/D18506
llvm-svn: 264906
PGOFuncNames are used as the key to retrieve the Function definition from the
MD5 stored in the profile. For internal linkage function, we prefix the source
file name to the PGOFuncNames. LTO's internalization privatizes many global linkage
symbols. This happens after value profile annotation, but those internal
linkage functions should not have a source prefix. To differentiate compiler
generated internal symbols from original ones, PGOFuncName meta data are
created and attached to the original internal symbols in the value profile
annotation step. If a symbol does not have the meta data, its original linkage
must be non-internal.
Also add a new map that maps PGOFuncName's MD5 value to the function definition.
Differential Revision: http://reviews.llvm.org/D17895
llvm-svn: 264902
Using ArrayRef in annotateValueSite's parameter instead of using an array
and it's size.
Differential Revision: http://reviews.llvm.org/D18568
llvm-svn: 264879