that it computes. Currently this is used for testing and precision
tuning, but it might be used by optimizations later.
Differential Revision: http://reviews.llvm.org/D19179
llvm-svn: 268291
PDB has a lot of similar data structures. We already have code
for parsing a Name Map, but PDB seems to have a different but
very similar structure that is a hash table. This is the
beginning of code needed in order to parse the name hash table,
but it is not yet complete. It parses the basic metadata of
the hash table, the bucket array, and the names buffer, but
doesn't use any of these fields yet as the data structure
requires a non-trivial amount of work to understand.
llvm-svn: 268268
Make it possible that TryToSimplifyUncondBranchFromEmptyBlock merges empty
basic block including lifetime intrinsics as well as phi nodes and
unconditional branch into its successor or predecessor(s).
If successor of empty block has single predecessor, all contents including
lifetime intrinsics are sinked into the successor. Otherwise, they are
hoisted into its predecessor(s) and then merged into the predecessor(s).
Patch by Josh Yoon <josh.yoon@samsung.com>!
Differential Revision: http://reviews.llvm.org/D19257
llvm-svn: 268254
The implemented heuristic has a large body of code which better sits
in its own function for better readability. It also allows adding more
heuristics easier in the future.
llvm-svn: 268107
The motivation for this change is that PDB has the notion of
streams and substreams. Substreams often consist of variable
length structures that are convenient to be able to treat as
guaranteed, contiguous byte arrays, whereas the streams they
are contained in are not necessarily so, as a single stream
could be spread across many discontiguous blocks.
So, when processing data from a substream, we want to be able
to assume that we have a contiguous byte array so that we can
cast pointers to variable length arrays and such.
This leads to the question of how to be able to read the same
data structure from either a stream or a substream using the
same interface, which is where this patch comes in.
We separate out the stream's read state from the underlying
representation, and introduce a `StreamReader` class. Then
we change the name of `PDBStream` to `MappedBlockStream`, and
introduce a second kind of stream called a `ByteStream` which is
simply a sequence of contiguous bytes. Finally, we update all
of the std::vectors in `PDBDbiStream` to use `ByteStream` instead
as a proof of concept.
llvm-svn: 268071
Summary:
Historically, we had a switch in the Makefiles for turning on "expensive
checks". This has never been ported to the cmake build, but the
(dead-ish) code is still around.
This will also make it easier to turn it on in buildbots.
Reviewers: chandlerc
Subscribers: jyknight, mzolotukhin, RKSimon, gberry, llvm-commits
Differential Revision: http://reviews.llvm.org/D19723
llvm-svn: 268050
We neglected to transfer operand bundles for some transforms. These
were found via inspection, I'll try to come up with some test cases.
llvm-svn: 268011
We now read out the rest of the substreams from the DBI streams. One of
these substreams, the FileInfo substream, contains information about which
source files contribute to each module (aka compiland). This patch
additionally parses out the file information from that substream, and
dumps it in llvm-pdbdump.
Differential Revision: http://reviews.llvm.org/D19634
Reviewed by: ruiu
llvm-svn: 267928
ScheduleDAGMI::initQueues changes the RegionBegin to the first non-debug
instruction. Since it does not track register pressure, it does not affect
any RP trackers. ScheduleDAGMILive inherits initQueues from ScheduleDAGMI,
and it does reset the TopTPTracker in its schedule method. Any derived,
target-specific scheduler will need to do it as well, but the TopRPTracker
is only exposed as a "const" object to derived classes. Without the ability
to modify the tracker directly, this leaves a derived scheduler with a
potential of having the TopRPTracker out-of-sync with the CurrentTop.
The symptom of the problem:
void llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit *, bool):
Assertion `TopRPTracker.getPos() == CurrentTop && "out of sync"' failed.
Differential Revision: http://reviews.llvm.org/D19438
llvm-svn: 267918
The DetectDeadLanes pass performs a dataflow analysis of used/defined
subregister lanes across COPY instructions and instructions that will
get lowered to copies. It detects dead definitions and uses reading
undefined values which are obscured by COPY and subregister usage.
These dead definitions cause trouble in the register coalescer which
cannot deal with definitions suddenly becoming dead after coalescing
COPY instructions.
For now the pass only adds dead and undef flags to machine operands. It
should be possible to extend it in the future to remove the dead
instructions and redo the analysis for the affected virtual
registers.
Differential Revision: http://reviews.llvm.org/D18427
llvm-svn: 267851
This function performs the reverse computation of
composeSubRegIndexLaneMask().
It will be used in the upcoming "DetectDeadLanes" pass.
llvm-svn: 267849
Previously using lanemasks on registers without any subregisters was not
well defined. This commit extends TargetRegisterInfo/tablegen to:
- Report a lanemask of 1 for regclasses without subregisters
- Do the right thing when mapping a 0/1 lanemask from a class without
subregisters into a class with subregisters in
TargetRegisterInfo::composeSubRegIndexLaneMasks().
This will be used in the upcoming "DetectDeadLanes" patch.
llvm-svn: 267848
This gets more data out of the DBI strema of the PDB. In
particular it extracts the metadata for the list of modules
(compilands) that this PDB contains info about, and adds support
for dumping these fields to llvm-pdbdump.
Differential Revision: http://reviews.llvm.org/D19570
Reviewed By: ruiu
llvm-svn: 267818
This patch implements the transformation that promotes indirect calls to
conditional direct calls when the indirect-call value profile meta-data is
available.
Differential Revision: http://reviews.llvm.org/D17864
llvm-svn: 267815
Also replaces a number of calls to report_fatal_error with Error returns.
The plumbing will make it easier to return errors originating in libObject.
Replacing report_fatal_errors with Error returns will give JIT clients the
opportunity to recover gracefully when the JIT is unable to produce/relocate
code, as well as providing meaningful error messages that can be used to file
bug reports.
llvm-svn: 267776
Summary:
This is a hook to allow TargetMachine to install passes at the
EP_EarlyAsPossible PassManagerBuilder extension point.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18614
llvm-svn: 267763
Now the pass is just a tiny wrapper around the util. This lets us reuse
the logic elsewhere (done here for BuildLibCalls) instead of duplicating
it.
The next step is to have something like getOrInsertLibFunc that also
sets the attributes.
Differential Revision: http://reviews.llvm.org/D19470
llvm-svn: 267759
I tried to be as close as possible to the strongest check that
existed before; cleaning these up properly is left for future work.
Differential Revision: http://reviews.llvm.org/D19469
llvm-svn: 267758
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.
(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)
Reviewers: arsenm, mareko, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19203
llvm-svn: 267729
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
Summary:
With the removal of support for lazy parsing of combined index summary
records (e.g. r267344), we no longer need to include the summary record
bitcode offset in the VST entries for definitions. Change the combined
index format to be similar to the per-module index format in using value
ids to cross-reference from the summary record to the VST entry (rather
than the summary record bitcode offset to cross-reference in the other
direction).
The visible changes are:
1) Add the value id to the combined summary records
2) Remove the summary offset from the combined VST records, which has
the following effects:
- No longer need the VST_CODE_COMBINED_GVDEFENTRY record, as all
combined index VST entries now only contain the value id and
corresponding GUID.
- No longer have duplicate VST entries in the case where there are
multiple definitions of a symbol (e.g. weak/linkonce), as they all
have the same value id and GUID.
An implication of #2 above is that in order to hook up an alias to the
correct aliasee based on the value id of the aliasee recorded in the
combined index alias record, we need to scan the entries in the index
for that GUID to find the one from the same module (i.e. the case where
there are multiple entries for the aliasee). But the reader no longer
has to maintain a special map to hook up the alias/aliasee.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19481
llvm-svn: 267712
Extract a part of isDereferenceableAndAlignedPointer functionality to Value::getPointerDerferecnceableBytes. Currently it's a NFC, but in future I'm going to accumulate all the logic about value dereferenceability in this function similarly to Value::getPointerAlignment function (D16144).
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17572
llvm-svn: 267708
This is required to use this function from isSafeToSpeculativelyExecute
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16231
llvm-svn: 267692
Summary:
D19403 adds a new pragma for loop distribution. This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.
As part of this I had to rethink the flag -enable-loop-distribute. My
goal was to be backward compatible with the existing behavior:
A1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute is specified
A2. pass is on when invoked directly from opt (e.g. for unit-testing)
The new pragma/metadata overrides these defaults so the new behavior is:
B1. A1 + enable distribution for individual loop with the pragma/metadata
B2. A2 + disable distribution for individual loop with the pragma/metadata
The default value whether the pass is on or off comes from the initiator
of the pass. From the PassManagerBuilder the default is off, from opt
it's on.
I moved -enable-loop-distribute under the pass. If the flag is
specified it overrides the default from above.
Then the pragma/metadata can further modifies this per loop.
As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline. So to be precise
this is the new behavior:
C1. pass is off by default from the optimization pipeline
unless -enable-loop-distribute or the pragma/metadata enables it
C2. pass is on when invoked directly from opt
unless -enable-loop-distribute=0 or the pragma/metadata disables it
Reviewers: hfinkel
Subscribers: joker.eph, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19431
llvm-svn: 267672
Summary:
cloneLoopWithPreheader() does not update LoopInfo for sub-loop of
the original loop being cloned. Add assert to ensure no sub-loops for loop being cloned.
Reviewers: anemet, ashutosh.nema, hfinkel
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D15922
llvm-svn: 267671
This reverts commit r267657, r267656, and r267655.
The test does not pass on multiple bots, I'm unsure why yet but let's unbreak them.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267664
I missed read the comment when I commited r267621 and thought the
comment did not need update. Matthias kindly proved me wrong.
Fixing that.
llvm-svn: 267638
The DBI stream contains a lot of bookkeeping information for other
streams. In particular it contains information about section contributions
and linked modules. This patch is a first attempt at parsing some of the
information out of the DBI stream. It currently only parses and dumps the
headers of the DBI stream, so none of the module data or section
contribution data is pulled out.
This is just a proof of concept that we understand the basic properties of
the DBI stream's metadata, and followup patches will try to extract more
detailed information out.
Differential Revision: http://reviews.llvm.org/D19500
Reviewed By: majnemer, ruiu
llvm-svn: 267585
When a block is tail-duplicated, the PHI nodes from that block are
replaced with appropriate COPY instructions. When those PHI nodes
contained use operands with subregisters, the subregisters were
dropped from the COPY instructions, resulting in incorrect code.
Keep track of the subregister information and use this information
when remapping instructions from the duplicated block.
Differential Revision: http://reviews.llvm.org/D19337
llvm-svn: 267583
This is part of solving PR27344:
https://llvm.org/bugs/show_bug.cgi?id=27344
CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this
same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the
IR further with a select in place.
For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly
predictable branch. Even the most limited HW branch predictors will be correct on this branch almost
all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all
the times the branch was predicted correctly.
As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's
MispredictPenalty. Or we could just let targets override the default by implementing the hook with that
and other target-specific options. Note that trying to statically determine mispredict rates for
close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie,
50/50 taken/not-taken might still be 100% predictable.
Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable()
branch weight default values are 4 and 64. A proposal to change that is in D19435.
Differential Revision: http://reviews.llvm.org/D19488
llvm-svn: 267572
Summary:
We don't use MinLatency any more since r184032.
Reviewers: atrick, hfinkel, mcrosier
Differential Revision: http://reviews.llvm.org/D19474
llvm-svn: 267502
Autoconf used to support setting LLVM_VERSION_INFO and there is some code filtered around llvm in Support/CommandLine.cpp and LTO/LTOCodeGenerator.cpp that uses it if it is set.
We also shouldn't be explicitly setting it as a define on llvm-shlib. It is pointless there because there is no code using it in llvm-shlib, and it is better to have it as part of the generated config.h so that it is available everywhere.
llvm-svn: 267490
Add a typedef for the std::map<GlobalValue::GUID, GlobalValueSummary *>
map that is passed around to identify summaries for values defined in a
particular module. This shortens up declarations in a variety of places.
llvm-svn: 267471
This replaces use of std::error_code and ErrorOr in the ORC RPC support library
with Error and Expected. This required updating the OrcRemoteTarget API, Client,
and server code, as well as updating the Orc C API.
This patch also fixes several instances where Errors were dropped.
llvm-svn: 267457
We didn't have logic to correctly handle CFGs where there was more than
one EH-pad successor (these are novel with WinEH).
There were situations where a register was live in one exceptional
successor but not another but the code as written would only consider
the first exceptional successor it found.
This resulted in split points which were insufficiently early if an
invoke was present.
This fixes PR27501.
N.B. This removes getLandingPadSuccessor.
llvm-svn: 267412
If several regions cover the same area of code, we have to restore
the combined value for that area when return from a nested region.
This patch achieves that by combining regions before calling buildSegments.
Differential Revision: http://reviews.llvm.org/D18610
llvm-svn: 267390
The new relocation recently defined in the Intel386 psABI
was still missing from this file. A subsequent commit will
add support for GOT32X in MC, together with a test.
llvm-svn: 267378
Summary:
Remove the GlobalValueInfo and change the ModuleSummaryIndex to directly
reference summary objects. The info structure was there to support lazy
parsing of the combined index summary objects, which is no longer
needed and not supported.
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19462
llvm-svn: 267344
Enum bitfields have crazy portability issues with MSVC. Use unsigned
instead of LinkageTypes here in the ModuleSummaryIndex to address
Takumi's concerns from r267335.
llvm-svn: 267342
The original patch caused crashes because it could derefence a null pointer
for SelectionDAGTargetInfo for targets that do not define it.
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267328
Right now it only contains the LinkageType, but will be extended
with "hasSection", "isOptSize", "hasInlineAssembly", etc.
Differential Revision: http://reviews.llvm.org/D19404
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267319
Keeping as much as possible internal/private is
known to help the optimizer. Let's try to benefit from
this in ThinLTO.
Note: this is early work, but is enough to build clang (and
all the LLVM tools). I still need to write some lit-tests...
Differential Revision: http://reviews.llvm.org/D19103
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267317
The option to control the emission of the new relocations
is -relax-relocations (blatantly copied from GNU as).
It can't be enabled by default because it breaks relatively
recent versions of ld.bfd/ld.gold (late 2015).
llvm-svn: 267307
Summary:
As discussed in D18298, some local globals can't
be renamed/promoted (because they have a section, or because
they are referenced from inline assembly).
To be able to detect naming collision, we need to keep around
the "GUID" using their original name without taking the linkage
into account.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19454
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 267304
Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around
DIType*. It is no longer legal to refer to a DICompositeType by its
'identifier:', and DIBuilder no longer retains all types with an
'identifier:' automatically.
Aside from the bitcode upgrade, this is mainly removing logic to resolve
an MDString-based reference to an actualy DIType. The commits leading
up to this have made the implicit type map in DICompileUnit's
'retainedTypes:' field superfluous.
This does not remove DITypeRef, DIScopeRef, DINodeRef, and
DITypeRefArray, or stop using them in DI-related metadata. Although as
of this commit they aren't serving a useful purpose, there are patchces
under review to reuse them for CodeView support.
The tests in LLVM were updated with deref-typerefs.sh, which is attached
to the thread "[RFC] Lazy-loading of debug info metadata":
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html
llvm-svn: 267296
Each reference to an unresolved MDNode is expensive, since the RAUW
support in MDNode uses a separate allocation and side map. Since
a distinct MDNode doesn't require its operands on creation (unlike
uniuqed nodes, there's no need to check for structural equivalence),
use nullptr for any of its unresolved operands. Besides reducing the
burden on MDNode maps, this can avoid allocating temporary MDNodes in
the first place.
We need some way to track operands. Invent DistinctMDOperandPlaceholder
for this purpose, which is a Metadata subclass that holds an ID and
points at its single user. DistinctMDOperandPlaceholder::replaceUseWith
is just like RAUW, but its name highlights that there is only ever
exactly one use.
There is no support for moving (or, obviously, copying) these. Move
support would be possible but expensive; leaving it unimplemented
prevents user error. In the BitcodeReader I originally considered
allocating on a BumpPtrAllocator and keeping a vector of pointers to
them, and then I realized that std::deque implements exactly this.
A couple of obvious follow-ups:
- Change ValueEnumerator to emit distinct nodes first to take more
advantage of this optimization. (How convenient... I think I might
have a couple of patches for this.)
- Change DIBuilder and its consumers (like CGDebugInfo in clang) to
use something like this when constructing debug info in the first
place.
llvm-svn: 267270
Only one consumer (llvm-objdump) actually cared about the fact that there were
two triples. Others were actively working around the fact that the Triple
returned by getArch might have been invalid. As for llvm-objdump, it needs to
be acutely aware of both Triples anyway, so being generic in the exposed API is
no benefit.
Also rename the version of getArch returning a Triple. Users were having to
pass an unwanted nullptr to disambiguate the two, which was nasty.
The only functional change here is that armv7m and armv7em object files no
longer crash llvm-objdump.
llvm-svn: 267249
The dwo_name was added to dwo files to improve diagnostics in dwp, but
it confuses tools that attempt to load any dwo named by a dwo_name, even
ones inside dwos. Avoid this by keeping track of whether a unit is
already a dwo unit, and if so, not loading further dwos.
llvm-svn: 267241
Summary:
Follow up to D19291: it now makes sense to use two Intr*Mem properties,
in particular IntrReadMem + IntrArgMemOnly is common.
Pointed out by Mikael Holmén.
Reviewers: uabelho, joker.eph, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19418
llvm-svn: 267238
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267231
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.
LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.
Differential Revision: http://reviews.llvm.org/D18367
llvm-svn: 267223
This patch changes the interface for createPGOFuncNameMetadata() where we add
another PGOFuncName argument.
Differential Revision: http://reviews.llvm.org/D19433
llvm-svn: 267216
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.
Also includes a bonus feature: addends for COFF image-relative references.
Differential Revision: http://reviews.llvm.org/D17938
llvm-svn: 267211
E.g. for:
!1 = {"llvm.distribute", i32 1}
it now returns the MDOperand for 1.
I will use this in LoopDistribution to check the value of the metadata.
Note that the change is backward-compatible with its current use in
LoopVersioningLICM. An Optional implicitly converts to a bool depending
whether it contains a value or not.
llvm-svn: 267190
Summary:
CachingMemorySSAWalker::invalidateInfo was using IsCall to determine
which cache map needed to be cleared of entries referring to the invalidated
MemoryAccess, but there could also be entries referring to it in the
other cache map (value entries, not key entries). This change just
clears both tables to be conservatively correct.
Also add a verifyRemoved() function, called when expensive
checks (i.e. XDEBUG) are enabled to verify that the invalidated
MemoryAccess object is not referenced in any of the caches.
Reviewers: dberlin, george.burgess.iv
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19388
llvm-svn: 267157
Summary:
This new pass allows targets to use the hazard recognizer without having
to also run one of the schedulers. This is useful when compiling with
optimizations disabled for targets that still need noop hazards
to be handled correctly.
Reviewers: hfinkel, atrick
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18594
llvm-svn: 267156
r267049 broke multiple buildbots (e.g. clang-cmake-mips, and clang-x86_64-linux-selfhost-modules) which the follow-ups have not yet resolved and this is preventing subsequent committers from being notified about additional failures on the affected buildbots.
llvm-svn: 267148
EarlyCSE had inconsistent behavior with regards to flag'd instructions:
- In some cases, it would pessimize if the available instruction had
different flags by not performing CSE.
- In other cases, it would miscompile if it replaced an instruction
which had no flags with an instruction which has flags.
Fix this by being more consistent with our flag handling by utilizing
andIRFlags.
llvm-svn: 267111
Summary:
Also adds a small comment blurb on control flow + no-wrap flags, since
that question came up a few days back on llvm-dev.
Reviewers: bjarke.roune, broune
Subscribers: sanjoy, mcrosier, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D19209
llvm-svn: 267110
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.
Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.
This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19191
llvm-svn: 267102
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
llvm-svn: 267098
This removes the interfaces added (and not yet complete) to support
lazy reading of summaries. This support is not expected to be needed
since we are moving to a model where the full index is only being
traversed in the thin link step, instead of the back ends.
(The second part of this that I plan to do next is remove the
GlobalValueInfo from the ModuleSummaryIndex - it was mostly needed to
support lazy parsing of summaries. The index can instead reference the
summary structures directly.)
llvm-svn: 267097
We'd disabled them on x86 because back in the early days some host tools
couldn't handle the new load commands. This no longer holds: anyone capable of
deploying Clang should be able to deploy its copies of ar/ranlib/etc.
rdar://25254790
llvm-svn: 267075
When printing the properties required by a pass, only print the
properties that are set, and not those that are clear (only properties
that are set are verified, clear properties are "don't-care").
llvm-svn: 267070
Summary:
Adds an instrumentation pass for the new EfficiencySanitizer ("esan")
performance tuning family of tools. Multiple tools will be supported
within the same framework. Preliminary support for a cache fragmentation
tool is included here.
The shared instrumentation includes:
+ Turn mem{set,cpy,move} instrinsics into library calls.
+ Slowpath instrumentation of loads and stores via callouts to
the runtime library.
+ Fastpath instrumentation will be per-tool.
+ Which memory accesses to ignore will be per-tool.
Reviewers: eugenis, vitalybuka, aizatsky, filcab
Subscribers: filcab, vkalintiris, pcc, silvas, llvm-commits, zhaoqin, kcc
Differential Revision: http://reviews.llvm.org/D19167
llvm-svn: 267058
Summary: As per title. This will help work on the C API.
Reviewers: Wallbraker, whitequark, joker.eph, echristo, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19173
llvm-svn: 267057
InstrProfSymtab::create can fail with instrprof_error::malformed, but
this error is silently dropped. Propagate the error up to the caller so
we fail early.
Eventually, I'd like to transition ProfileData over to the new Error
class so we can't ignore hard failures like this.
llvm-svn: 267055
splitting edges.
MachineBasicBlock::SplitCriticalEdges will crash if a nullptr would have
been passed for the Pass argument. Do not allow that by turning this
argument into a reference.
The alternative would have been to make the Pass a truly optional
argument, but although this is easy to do, I was afraid users using it
like this would not be aware the livness information, dominator tree and
such would silently be broken.
llvm-svn: 267052
PDB parsing code was hand-rolled into llvm-pdbdump. This patch moves the
parsing of this code into DebugInfoPDB and makes the dumper use this.
This is achieved by implementing the skeleton of RawPdbSession, the
non-DIA counterpart to the existing PDB read interface. None of the type /
source file / etc information is accessible yet, so this implementation is
not yet close to achieving parity with the DIA counterpart, but the
RawSession class simply holds a reference to a PDBFile class which handles
parsing the file format. Additionally a PDBStream class is introduced
which allows accessing the bytes of a particular stream in a PDB file.
Differential Revision: http://reviews.llvm.org/D19343
Reviewed By: majnemer
llvm-svn: 267049
Introduce canSplitCriticalEdge, so that clients can now query whether or
not a critical edge can be split without actually needing to split it.
This may be useful when gathering information for cost models for
instance.
llvm-svn: 267046
When custom lowered, this is not called if the store is custom
lowered. Move it to be a utility function so targets can
easily expand unaligned accesses when custom lowering.
llvm-svn: 267029
Instead of holding a mask, hold two value: the start index and the
length of the mapping. This is a more compact representation, although
less powerful. That being said, arbitrary masks would not have worked
for the generic so do not allow them in the first place.
llvm-svn: 267025
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
llvm-svn: 267022
Summary:
IntrReadWriteArgMem simply becomes IntrArgMemOnly.
So there are fewer intrinsic properties that express their orthogonality
better, and correspond more closely to the corresponding IR attributes.
Suggested by: Philip Reames
Reviewers: joker.eph, reames, tstellarAMD
Subscribers: jholewinski, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19291
llvm-svn: 267021
"Into" was misleading. I am also planning to use this helper to look
for loop metadata and return the argument, so find seems like a better
name.
llvm-svn: 267013
Before this fix, DILexicalBlockFile entries were skipped only in some cases and were not in other cases.
Differential Revision: http://reviews.llvm.org/D18724
llvm-svn: 267004
A DenseMap doesn't store the hashes, so it needs to recompute them when
the table is resized.
In some applications the hashing cost is noticeable. That is the case
for example in lld for symbol names (StringRef).
This patch adds a templated structure that can wraps any value that can
go in a DenseMap and caches the hash.
llvm-svn: 266981
Summary:
The function importer already decided what symbols need to be pulled
in. Also these magically added ones will not be in the export list
for the source module, which can confuse the internalizer for
instance.
Reviewers: tejohnson, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19096
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266948
No matter what value you OR in to A, the result of (or A, B) is going to be UGE A. When A and B are positive, it's SGE too. If A is negative, OR'ing a value into it can't make it positive, but can increase its value closer to -1, therefore (or A, B) is SGE A. Working through all possible combinations produces this truth table:
```
A is
+, -, +/-
F F F + B is
T F ? -
? F ? +/-
```
The related optimizations are flipping the 'slt' for 'sge' which always NOTs the result (if the result is known), and swapping the LHS and RHS while swapping the comparison predicate.
There are more idioms left to implement (aren't there always!) but I've stopped here because any more would risk becoming unreasonable for reviewers.
llvm-svn: 266939
Produce another specific error message for a malformed Mach-O file when a symbol’s
string index is past the end of the string table. The existing test case in test/Object/macho-invalid.test
for macho-invalid-symbol-name-past-eof now reports the error with the message indicating
that a symbol at a specific index has a bad sting index and that bad string index value.
Again converting interfaces to Expected<> from ErrorOr<> does involve
touching a number of places. Where the existing code reported the error with a
string message or an error code it was converted to do the same. There is some
code for this that could be factored into a routine but I would like to leave that for
the code owners post-commit to do as they want for handling an llvm::Error. An
example of how this could be done is shown in the diff in
lib/ExecutionEngine/RuntimeDyld/RuntimeDyldImpl.h which had a Check() routine
already for std::error_code so I added one like it for llvm::Error .
Also there some were bugs in the existing code that did not deal with the
old ErrorOr<> return values. So now with Expected<> since they must be
checked and the error handled, I added a TODO and a comment:
“// TODO: Actually report errors helpfully” and a call something like
consumeError(NameOrErr.takeError()) so the buggy code will not crash
since needed to deal with the Error.
Note there fixes needed to lld that goes along with this that I will commit right after this.
So expect lld not to built after this commit and before the next one.
llvm-svn: 266919
Don't use std::vector<TrackingMDRef>, since (at least in some versions
of libc++) std::vector apparently copies values on grow operations
instead of moving them. Found this when I was temporarily deleting the
copy constructor for TrackingMDRef to investigate a performance
bottleneck.
llvm-svn: 266909
A ModuleSlotTracker can be created without actually being used (e.g.,
r266889 added one to the Verifier). Create the SlotTracker within it
lazily on the first call to ModuleSlotTracker::getMachine.
llvm-svn: 266902
Clients may call writeMergedModules before calling optimize, or call
compileOptimized without calling optimize. Make sure they don't sneak
past the verifier. This adds LTOCodeGenerator::verifyMergedModuleOnce,
and calls it from writeMergedModule, optimize, and codegenOptimized.
I couldn't find a good way to test this. I tried writing broken IR to
send into llvm-lto, but LTOCodeGenerator doesn't understand textual IR,
and assembler runs the verifier itself anyway. Checking in
valid-but-doesn't-verify bitcode here doesn't seem valuable.
llvm-svn: 266894
Speed up Verifier output by sharing a single ModuleSlotTracker for the
duration. There should be no functionality change here except for much
faster output when there's more than one statement.
Now the Verifier won't be traversing the full Metadata graph every time
it prints an error. The TypePrinter is still not shared, but that would
take some extra plumbing.
llvm-svn: 266889
Summary:
This patch prevents importing from (and therefore exporting from) any
module with a "llvm.used" local value. Local values need to be promoted
and renamed when importing, and their presense on the llvm.used variable
indicates that there are opaque uses that won't see the rename. One such
example is a use in inline assembly.
See also the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html
As part of this, move collectUsedGlobalVariables out of Transforms/Utils
and into IR/Module so that it can be used more widely. There are several
other places in LLVM that used copies of this code that can be cleaned
up as a follow on NFC patch.
Reviewers: joker.eph
Subscribers: pcc, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18986
llvm-svn: 266877
Summary:
LLVMAttribute has outlived its utility and is becoming a problem for C API users that what to use all the LLVM attributes. In order to help moving away from LLVMAttribute in a smooth manner, this diff introduce LLVMGetAttrKindIDInContext, which can be used instead of the enum values.
See D18749 for reference.
Reviewers: Wallbraker, whitequark, joker.eph, echristo, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19081
llvm-svn: 266842
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.
An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.
With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.
Reviewers: tstellarAMD, arsenm, joker.eph, reames
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18291
llvm-svn: 266826
Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that
just return the thread pointer. I have a pending patch that does the same
for SystemZ (D19054), and there are many more targets that could benefit
from one.
This patch merges the ARM and AArch64 intrinsics into a single target
independent one that will also be used by subsequent targets.
Differential Revision: http://reviews.llvm.org/D19098
llvm-svn: 266818
With this change, ideally IR pass can always generate llvm.stackguard
call to get the stack guard; but for now there are still IR form stack
guard customizations around (see getIRStackGuard()). Future SSP
customization should go through LOAD_STACK_GUARD.
There is a behavior change: stack guard values are not CSEed anymore,
since we should never reuse the value in case that it has been spilled (and
corrupted). See ssp-guard-spill.ll. This also cause the change of stack
size and codegen in X86 and AArch64 test cases.
Ideally we'd like to know if the guard created in llvm.stackprotector() gets
spilled or not. If the value is spilled, discard the value and reload
stack guard; otherwise reuse the value. This can be done by teaching
register allocator to know how to rematerialize LOAD_STACK_GUARD and
force a rematerialization (which seems hard), or check for spilling in
expandPostRAPseudo. It only makes sense when the stack guard is a global
variable, which requires more instructions to load. Anyway, this seems to go out
of the scope of the current patch.
llvm-svn: 266806
The functionality contained within getIntrinsicIDForCall is two-fold: it
checks if a CallInst's callee is a vectorizable intrinsic. If it isn't
an intrinsic, it attempts to map the call's target to a suitable
intrinsic.
Move the mapping functionality into getIntrinsicForCallSite and rename
getIntrinsicIDForCall to getVectorIntrinsicIDForCall while
reimplementing it in terms of getIntrinsicForCallSite.
llvm-svn: 266801
Add a new method, DICompositeType::buildODRType, that will create or
mutate the DICompositeType for a given ODR identifier, and use it in
LLParser and BitcodeReader instead of DICompositeType::getODRType.
The logic is as follows:
- If there's no node, create one with the given arguments.
- Else, if the current node is a forward declaration and the new
arguments would create a definition, mutate the node to match the
new arguments.
- Else, return the old node.
This adds a missing feature supported by the current DITypeIdentifierMap
(which I'm slowly making redudant). The only remaining difference is
that the DITypeIdentifierMap has a "the-last-one-wins" rule, whereas
DICompositeType::buildODRType has a "the-first-one-wins" rule.
For now I'm leaving behind DICompositeType::getODRType since it has
obvious, low-level semantics that are convenient for unit testing.
llvm-svn: 266786
This patch improves SimplifyCFG to catch cases like:
if (a < b) {
if (a > b) <- known to be false
unreachable;
}
Phabricator Revision: http://reviews.llvm.org/D18905
llvm-svn: 266767
Calling ValueMap::MD lazily constructs a ValueMap, which mallocs the
buckets. Instead of swapping constructed maps, move around the
underlying Optional<MDMapT>. This gets rid of some unnecessary malloc
traffic from r266579 (not that it showed up on a profile).
llvm-svn: 266761
Lift the API for debug info ODR type uniquing up a layer. Instead of
clients managing the map directly on the LLVMContext, add a static
method to DICompositeType called getODRType and handle the map in the
background. Also adds DICompositeType::getODRTypeIfExists, so far just
for convenience in the unit tests.
This simplifies the logic in LLParser and BitcodeReader. Because of
argument spam there are actually a few more lines of code now; I'll see
if I come up with a reasonable way to clean that up.
llvm-svn: 266742
Tighten up the API for debug info ODR type uniquing in LLVMContext. The
only reason to allow other DIType subclasses is to make the unit tests
prettier :/.
llvm-svn: 266737
Summary:
Need to use predecessors for reverse graph, successors for forward graph.
succ_iterator/pred_iterator are not compatible, this patch is all the work necessary to work around that (which is what everywhere else does). Not sure if there is a better way, so cc'ing some random folks to take a gander :)
Reviewers: dblaikie, qcolombet, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18796
llvm-svn: 266718
Summary:
The `"patchable-function"` attribute can be used by an LLVM client to
influence LLVM's code generation in ways that makes the generated code
easily patchable at runtime (for instance, to redirect control).
Right now only one patchability scheme is supported,
`"prologue-short-redirect"`, but this can be expanded in the future.
Reviewers: joker.eph, rnk, echristo, dberris
Subscribers: joker.eph, echristo, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19046
llvm-svn: 266715
As per David's review, rename everything in the new API for ODR type
uniquing of debug info.
ensureDITypeMap => enableDebugTypeODRUniquing
destroyDITypeMap => disableDebugTypeODRUniquing
hasDITypeMap => isODRUniquingDebugTypes
llvm-svn: 266713