isLegalAddressingMode() has recently gained the extra optional Instruction*
parameter, and therefore it can now do the job that previously only
isFoldableMemAccess() could do.
The SystemZ implementation of isLegalAddressingMode() has gained the
functionality of checking for offsets, which used to be done with
isFoldableMemAccess().
The isFoldableMemAccess() hook has been removed everywhere.
Review: Quentin Colombet, Ulrich Weigand
https://reviews.llvm.org/D35933
llvm-svn: 310463
limited batch updates.
Specifically, allow removing multiple reference edges starting from
a common source node. There are a few constraints that play into
supporting this form of batching:
1) The way updates occur during the CGSCC walk, about the most we can
functionally batch together are those with a common source node. This
also makes the batching simpler to implement, so it seems
a worthwhile restriction.
2) The far and away hottest function for large C++ files I measured
(generated code for protocol buffers) showed a huge amount of time
was spent removing ref edges specifically, so it seems worth focusing
there.
3) The algorithm for removing ref edges is very amenable to this
restricted batching. There are just both API and implementation
special casing for the non-batch case that gets in the way. Once
removed, supporting batches is nearly trivial.
This does modify the API in an interesting way -- now, we only preserve
the target RefSCC when the RefSCC structure is unchanged. In the face of
any splits, we create brand new RefSCC objects. However, all of the
users were OK with it that I could find. Only the unittest needed
interesting updates here.
How much does batching these updates help? I instrumented the compiler
when run over a very large generated source file for a protocol buffer
and found that the majority of updates are intrinsically updating one
function at a time. However, nearly 40% of the total ref edges removed
are removed as part of a batch of removals greater than one, so these
are the cases batching can help with.
When compiling the IR for this file with 'opt' and 'O3', this patch
reduces the total time by 8-9%.
Differential Revision: https://reviews.llvm.org/D36352
llvm-svn: 310450
In the refactor to merge the publics and globals stream, a bug
was introduced that wrote the wrong value for one of the fields
of the PublicsStreamHeader. This caused debugging in WinDbg
to break.
We had no way of dumping any of these fields, so in addition to
fixing the bug I've added dumping support for them along with a
test that verifies the correct value is written.
llvm-svn: 310439
The publics stream and globals stream are very similar. They both
contain a list of hash buckets that refer into a single shared stream,
the symbol record stream. Because of the need for each builder to manage
both an independent hash stream as well as a single shared record
stream, making the two builders be independent entities is not the right
design. This patch merges them into a single class, of which only a
single instance is needed to create all 3 streams. PublicsStreamBuilder
and GlobalsStreamBuilder are now merged into the single GSIStreamBuilder
class, which writes all 3 streams at once.
Note that this patch does not contain any functionality change. So we're
still not yet writing any records to the globals stream. All we're doing
is making it so that when we do start writing records to the globals,
this refactor won't have to be part of that patch.
Differential Revision: https://reviews.llvm.org/D36489
llvm-svn: 310438
This reverts commit r310115.
It causes a linker failure for the one of the unittests of AArch64 on one
of the linux bot:
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/3429
: && /home/fedora/gcc/install/gcc-7.1.0/bin/g++ -fPIC
-fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W
-Wno-unused-parameter -Wwrite-strings -Wcast-qual
-Wno-missing-field-initializers -pedantic -Wno-long-long
-Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment
-ffunction-sections -fdata-sections -O2
-L/home/fedora/gcc/install/gcc-7.1.0/lib64 -Wl,-allow-shlib-undefined
-Wl,-O3 -Wl,--gc-sections
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o -o
unittests/Target/AArch64/AArch64Tests
lib/libLLVMAArch64CodeGen.so.6.0.0svn lib/libLLVMAArch64Desc.so.6.0.0svn
lib/libLLVMAArch64Info.so.6.0.0svn lib/libLLVMCodeGen.so.6.0.0svn
lib/libLLVMCore.so.6.0.0svn lib/libLLVMMC.so.6.0.0svn
lib/libLLVMMIRParser.so.6.0.0svn lib/libLLVMSelectionDAG.so.6.0.0svn
lib/libLLVMTarget.so.6.0.0svn lib/libLLVMSupport.so.6.0.0svn -lpthread
lib/libgtest_main.so.6.0.0svn lib/libgtest.so.6.0.0svn -lpthread
-Wl,-rpath,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib
&& :
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x0):
undefined reference to `vtable for llvm::LegalizerInfo'
unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x8):
undefined reference to `vtable for llvm::RegisterBankInfo'
The particularity of this bot is that it is built with
BUILD_SHARED_LIBS=ON
However, I was not able to reproduce the problem so far.
Reverting to unblock the bot.
llvm-svn: 310425
When a new phi is generated for scalarpre of an expression, the phiTranslate cache
will become stale: Before PRE, the candidate expression must not be available in a
predecessor block, and phitranslate will cache the information. After PRE, the
expression will become available in all predecessor blocks, so the related entries
in phiTranslate cache becomes stale. The patch will simply remove the stale entries
so phiTranslate can be recomputed next time.
The stale entries in phitranslate cache will not affect correctness but will cause
missing PRE opportunity for later instructions.
Differential Revision: https://reviews.llvm.org/D36124
llvm-svn: 310421
Summary:
Now that we've made all the necessary backend changes, we can add a new
intrinsic which exposes the new capabilities to IR producers. Since
llvm.amdgpu.update.dpp is a strict superset of llvm.amdgpu.mov.dpp, we
should deprecate the former. We also add tests for all the functionality
that was added in previous changes, now that we can access it via an IR
construct.
Reviewers: tstellar, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34718
llvm-svn: 310399
The compiler outputs PROC32_ID symbols into the object files
for functions, and these symbols have an embedded type index
which, when copied to the PDB, refer to the IPI stream. However,
the symbols themselves are also converted into regular symbols
(e.g. S_GPROC32_ID -> S_GPROC32), and type indices in the regular
symbol records refer to the TPI stream. So this patch applies
two fixes to function records.
1. It converts ID symbols to the proper non-ID record type.
2. After remapping the type index from the object file's index
space to the PDB file/IPI stream's index space, it then
remaps that index to the TPI stream's index space by.
Besides functions, during the remapping process we were also
discarding symbol record types which we did not recognize.
In particular, we were discarding S_BPREL32 records, which is
what MSVC uses to describe local variables on the stack. So
this patch fixes that as well by copying them to the PDB.
Differential Revision: https://reviews.llvm.org/D36426
llvm-svn: 310394
I want to reuse this code in SimplifyDemandedBits handling of Add/Sub. This will make that easier.
Wonder if we should use it in SelectionDAG's computeKnownBits too.
Differential Revision: https://reviews.llvm.org/D36433
llvm-svn: 310378
Summary:
This patch enables the import of rules containing 'imm' operands that do not
constrain the acceptable values using predicates. Support for ImmLeaf will
arrive in a later patch.
Depends on D35681
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35833
llvm-svn: 310343
Executables may not contain a load config, and clients should be able to
test for nullability. Previously we'd return uninitialized memory. Now
getLoadConfig32/64 return valid pointers or null.
Fixes PR34108
llvm-svn: 310308
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the
array. This array is processed only after the vectorization of the
first-after-these instructions key node is finished. Vectorization goes
in reverse order to try to vectorize as much code as possible.
Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon
Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D29826
llvm-svn: 310260
Summary:
Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does:
1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts.
2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible.
Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon
Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D29826
llvm-svn: 310255
POD-like and we can just splat the empty key across memory.
Sadly we can't optimize the normal loop well enough because we can't
turn the conditional store into an unconditional store according to the
memory model.
This loop actually showed up in a profile of code that was calling clear
as a serious source of time. =[
llvm-svn: 310189
After the previous series of patches, this is now trivial and deletes
a pretty astonishing amount of complexity. This has been a long time
coming, as the move toward a PO sequence of RefSCCs started eroding the
underlying use cases for this half of the data structure.
Among the biggest advantages here is that now there aren't two
independent data structures that need to stay in sync.
Some of my profiling has also indicated that updating the parent sets
was among the most expensive parts of the lazy call graph. Eliminating
it whole sale is likely to be a nice win in terms of compile time.
Last but not least, I had discussed with some folks previously keeping
it around for asserts and other correctness checking, but once the
fundamentals of the parent and child checking were implemented without
the parent sets their value in correctness checking was tiny and no
where near worth the cost of the complexity required to keep everything
up-to-date.
llvm-svn: 310171
isDescendantOf methods on RefSCCs in terms of the forward edges rather
than the parent sets.
This is technically slower, but probably not interestingly slower, and
all of these routines were already so expensive that they're guarded
behind both !NDEBUG and EXPENSIVE_CHECKS.
This removes another non-critical usage of parent sets.
I've also added some comments to try and help clarify to any potential
users the costs of these routines. They're mostly useful for debugging,
asserts, or other queries.
llvm-svn: 310170
walk over the parent set.
When removing a single function from the call graph, we previously would
walk the entire RefSCC's parent set and then walk every outgoing edge
just to find the ones to remove. In addition to this being quite high
complexity in theory, it is also the last fundamental use of the parent
sets.
With this change, when we remove a function we transform the node
containing it to be recognizably "dead" and then teach the edge
iterators to recognize edges to such nodes and skip them the same way
they skip null edges.
We can't move fully to using "dead" nodes -- when disconnecting two live
nodes we need to null out the edge. But the complexity this adds to the
edge sequence isn't too bad and the simplification of lazily handling
this seems like a significant win.
llvm-svn: 310169
Add memory synchronization semantics to LSE Atomics.
The memory semantics feature will be added in a subsequent patch.
In this patch, several corrections were added to the existing LSE Atomics
implementation, based on the ARM Errata D11904 from 05/12/2017.
Patch by: steleman
Differential Revision: https://reviews.llvm.org/D35319
llvm-svn: 310167
The definition of 'false' here was already pretty vague and debatable,
and I'm about to add another potential 'false' that would actually make
much more sense in a bool operator. Especially given how rarely this is
used, a nicely named method seems better.
llvm-svn: 310165
This extends the native reader to enable llvm-pdbutil to list the enums in a
PDB and it includes a simple test. It does not yet list the values in the
enumerations, which requires an actual implementation of
NativeEnumSymbol::FindChildren.
To exercise this code, use a command like:
llvm-pdbutil pretty -native -enums foo.pdb
Differential Revision: https://reviews.llvm.org/D35738
llvm-svn: 310144
Its sole purpose was to avoid spreading around ifdefs related to
building global-isel. Since r309990, GlobalISel is not optional anymore,
thus, we can get rid of this mechanism all together.
NFC.
llvm-svn: 310115
Summary:
This intrinsic lets us set inactive lanes to an identity value when
implementing wavefront reductions. In combination with Whole Wavefront
Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan.
Lowering the intrinsic needs to happen post-RA so that RA knows that the
destination isn't completely overwritten due to the EXEC shenanigans, so
we need another pseudo-instruction to represent the un-lowered
intrinsic.
Reviewers: tstellar, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34719
llvm-svn: 310088
Summary:
Whole Wavefront Wode (WWM) is similar to WQM, except that all of the
lanes are always enabled, regardless of control flow. This is required
for implementing wavefront reductions in non-uniform control flow, where
we need to use the inactive lanes to propagate intermediate results, so
they need to be enabled. We need to propagate WWM to uses (unless
they're explicitly marked as exact) so that they also propagate
intermediate results correctly. We do the analysis and exec mask munging
during the WQM pass, since there are interactions with WQM for things
that require both WQM and WWM. For simplicity, WWM is entirely
block-local -- blocks are never WWM on entry or exit of a block, and WWM
is not propagated to the block level. This means that computations
involving WWM cannot involve control flow, but we only ever plan to use
WWM for a few limited purposes (none of which involve control flow)
anyways.
Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There
isn't yet a way to turn WWM off -- that will be added in a future
change.
Finally, it turns out that turning on inactive lanes causes a number of
problems with register allocation. While the best long-term solution
seems like teaching LLVM's register allocator about predication, for now
we need to add some hacks to prevent ourselves from getting into trouble
due to constraints that aren't currently expressed in LLVM. For the gory
details, see the comments at the top of SIFixWWMLiveness.cpp.
Reviewers: arsenm, nhaehnle, tpr
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D35524
llvm-svn: 310087
Summary:
Previously, we assumed that certain types of instructions needed WQM in
pixel shaders, particularly DS instructions and image sampling
instructions. This was ok because with OpenGL, the assumption was
correct. But we want to start using DPP instructions for derivatives as
well as other things, so the assumption that we can infer whether to use
WQM based on the instruction won't continue to hold. This intrinsic lets
frontends like Mesa indicate what things need WQM based on their
knowledge of the API, rather than second-guessing them in the backend.
We need to keep around the old method of enabling WQM, but eventually we
should remove it once Mesa catches up. For now, this will let us use DPP
instructions for computing derivatives correctly.
Reviewers: arsenm, tpr, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D35167
llvm-svn: 310085
Summary:
It was added to support clang warnings about includes with case
mismatches, but it ended up not being necessary.
Reviewers: twoh, rafael
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D36328
llvm-svn: 310078
Adds function attributes to index: ReadNone, ReadOnly, NoRecurse, NoAlias. This attributes will be used for future ThinLTO optimizations that will propagate function attributes across modules.
llvm-svn: 310061
Summary:
This commit allows matchSelectPattern to recognize clamp of float
arguments in the presence of FMF the same way as already done for
integers.
This case is a little different though. With integers, given the
min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX
"automatically". That is not the case for float, because for them only
full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care
about NaNs. On the other hand, some backends (e.g. X86) have only
FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM
nodes are illegal thus selection is not happening. So I decided to do
such kind of transformation in IR (InstCombiner) instead of
complicating the logic in the backend.
Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
Differential Revision: https://reviews.llvm.org/D33186
llvm-svn: 310054
The full story is in the comments:
// Do not attempt to close stdout or stderr. We used to try to maintain the
// property that tools that support writing file to stdout should not also
// write informational output to stdout, but in practice we were never able to
// maintain this invariant. Many features have been added to LLVM and clang
// (-fdump-record-layouts, optimization remarks, etc) that print to stdout, so
// users must simply be aware that mixed output and remarks is a possibility.
NFC, I am just updating comments to reflect reality.
llvm-svn: 310016
This is similar to what we are doing in "regular" SROA and creates
DW_OP_LLVM_fragment operations to describe the resulting variables.
rdar://problem/33654891
llvm-svn: 310014
Summary:
Detect when the working set size of a profiled application is huge,
by comparing the number of counts required to reach the hot percentile
in the profile summary to a large threshold*.
When the working set size is determined to be huge, disable peeling
to avoid bloating the working set further.
*Note that the selected threshold (15K) is significantly larger than the
largest working set value in SPEC cpu2006 (which is gcc at around 11K).
Reviewers: davidxl
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36288
llvm-svn: 310005
Summary:
This increases the inlining threshold for hot callsites. Hotness is
defined in terms of block frequency of the callsite relative to the
caller's entry block's frequency. Since this requires BFI in the
inliner, this only affects the new PM pipeline. This is enabled by
default at -O3.
This improves the performance of some internal benchmarks. Notably, an
internal benchmark for Gipfeli compression
(https://github.com/google/gipfeli) improves by ~7%. Povray in SPEC2006
improves by ~2.5%. I am running more experiments and will update the
thread if other benchmarks show improvement/regression.
In terms of text size, LLVM test-suite shows an 1.22% text size
increase. Diving into the results, 13 of the benchmarks in the
test-suite increases by > 10%. Most of these are small, but
Adobe-C++/loop_unroll (17.6% increases) and tramp3d(20.7% size increase)
have >250K text size. On a large application, the text size increases by
2%
Reviewers: chandlerc, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36199
llvm-svn: 309994
Summary:
PDB section contributions are supposed to use output section indices and
offsets, not input section indices and offsets.
This allows the debugger to look up the index of the module that it
should look up in the modules stream for symbol information. With this
change, windbg can now find line tables, but it still cannot print local
variables.
Fixes PR34048
Reviewers: zturner
Subscribers: hiraditya, ruiu, llvm-commits
Differential Revision: https://reviews.llvm.org/D36285
llvm-svn: 309987
Summary:
Peeling should not occur during the full unrolling invocation early
in the pipeline, but rather later with partial and runtime loop
unrolling. The later loop unrolling invocation will also eventually
utilize profile summary and branch frequency information, which
we would like to use to control peeling. And for ThinLTO we want
to delay peeling until the backend (post thin link) phase, just as
we do for most types of unrolling.
Ensure peeling doesn't occur during the full unrolling invocation
by adding a parameter to the shared implementation function, similar
to the way partial and runtime loop unrolling are disabled.
Performance results for ThinLTO suggest this has a neutral to positive
effect on some internal benchmarks.
Reviewers: chandlerc, davidxl
Subscribers: mzolotukhin, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D36258
llvm-svn: 309966
The patch rL309080 was reverted because it did not clean up the cache on "forgetValue"
method call. This patch re-enables this change, adds the missing check and introduces
two new unit tests that make sure that the cache is cleaned properly.
Differential Revision: https://reviews.llvm.org/D36087
llvm-svn: 309925
IMHO it is an antipattern to have a enum value that is Default.
At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.
This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.
llvm-svn: 309911
The CoverageMapping::getInstantiations() API retrieved all function
records corresponding to functions with more than one instantiation (e.g
template functions with multiple specializations). However, there was no
simple way to determine *which* function a given record was an
instantiation of. This was an oversight, since it's useful to aggregate
coverage information over all instantiations of a function.
llvm-cov works around this by building a mapping of source locations to
instantiation sets, but this duplicates logic that libCoverage already
has (see FunctionInstantiationSetCollector).
This change adds a new API, CoverageMapping::getInstantiationGroups(),
which returns a list of InstantiationGroups. A group contains records
for each instantiation of some particular function, and also provides
utilities to get the total execution count within the group, the source
location of the common definition, etc.
This lets removes some hacky logic in llvm-cov by reusing
FunctionInstantiationSetCollector and makes the CoverageMapping API
friendlier for other clients.
llvm-svn: 309904
The PDB reserves certain blocks for the FPM that describe which
blocks in the file are allocated and which are free. We weren't
filling that out at all, and in some cases we were even stomping
it with incorrect data. This patch writes a correct FPM.
Differential Revision: https://reviews.llvm.org/D36235
llvm-svn: 309896
Recently problems have been discovered in the way we write the FPM
(free page map). In order to fix this, we first need to establish
a baseline about what a correct FPM looks like using an MSVC
generated PDB, so that we can then make our own generated PDBs
match. And in order to do this, the dumper needs a mode where it
can dump an FPM so that we can write tests for it.
This patch adds a command to dump the FPM, as well as a test against
a known-good PDB.
llvm-svn: 309894
Summary:
This is largely NFC*, in preparation for utilizing ProfileSummaryInfo
and BranchFrequencyInfo analyses. In this patch I am only doing the
splitting for the New PM, but I can do the same for the legacy PM as
a follow-on if this looks good.
*Not NFC since for partial unrolling we lose the updates done to the
loop traversal (adding new sibling and child loops) - according to
Chandler this is not very useful for partial unrolling, but it also
means that the debugging flag -unroll-revisit-child-loops no longer
works for partial unrolling.
Reviewers: chandlerc
Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36157
llvm-svn: 309886
I was surprised to see the code model being passed to MC. After all,
it assembles code, it doesn't create it.
The one place it is used is in the expansion of .cfi directives to
handle .eh_frame being more that 2gb away from the code.
As far as I can tell, gnu assembler doesn't even have an option to
enable this. Compiling a c file with gcc -mcmodel=large produces a
regular looking .eh_frame. This is probably because in practice linker
parse and recreate .eh_frames.
In llvm this is used because the JIT can place the code and .eh_frame
very far apart. Ideally we would fix the jit and delete this
option. This is hard.
Apart from confusion another problem with the current interface is
that most callers pass CodeModel::Default, which is bad since MC has
no way to map it to the target default if it actually needed to.
This patch then replaces the argument with a boolean with a default
value. The vast majority of users don't ever need to look at it. In
fact, only CodeGen and llvm-mc use it and llvm-mc just to enable more
testing.
llvm-svn: 309884
Followup to r309570, fixing it slightly differently (ranges_base and
addr_base should never be read from a DWO file - so there shouldn't be
any issue with 'overriding' the values - conditionalize the code and
assert that the values aren't being overriden).
llvm-svn: 309879
Summary:
This patch makes LoopDeletion use the incremental DominatorTree API.
We modify LoopDeletion to perform the deletion in 5 steps:
1. Create a new dummy edge from the preheader to the exit, by adding a conditional branch.
2. Inform the DomTree about the new edge.
3. Remove the conditional branch and replace it with an unconditional edge to the exit. This removes the edge to the loop header, making it unreachable.
4. Inform the DomTree about the deleted edge.
5. Remove the unreachable block from the function.
Creating the dummy conditional branch is necessary to perform incremental DomTree update.
We should consider using the batch updater when it's ready.
Reviewers: dberlin, davide, grosser, sanjoy
Reviewed By: dberlin, grosser
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35391
llvm-svn: 309850
This should enable us to test the generation of target-specific constant
pools, e.g. for ARM:
constants:
- id: 0
value: 'g(GOT_PREL)-(LPC0+8-.)'
alignment: 4
isTargetSpecific: true
I intend to use this to test PIC support in GlobalISel for ARM.
This is difficult to test outside of that context, since the existing
MIR tests usually rely on parser support as well, and that seems a bit
trickier to add. We could try to add a unit test, but the setup for that
seems rather convoluted and overkill.
We do test however that the parser reports a nice error when
encountering a target-specific constant pool.
Differential Revision: https://reviews.llvm.org/D36092
llvm-svn: 309806
Summary:
Fix a bug discovered in an out-of-tree target where memoperands from
pseudo-instructions that weren't part of the match were being merged into the
result instructions as part of GIR_MergeMemOperands.
This bug was caused by a change to the handling of State.MIs between rules when
the state machine tables were fused into a single table. Previously, each rule
would reset State.MIs using State.MIs.resize(1) but this is no longer done, as a
result stale data is occasionally left in some elements of State.MIs. Most
opcodes aren't affected by this but GIR_MergeMemOperands merges all memoperands
from the intructions recorded in State.MIs into the result instruction.
Suppose for example, we processed but rejected the following pattern:
(signextend (load x))
at this point, State.MIs contains the signextend and the load. Now suppose we
process and accept this pattern:
(add x, y)
at this point, State.MIs contains the add as well as the (now irrelevant) load.
When GIR_MergeMemOperands is processed, the memoperands from that irrelevant
load will be merged into the result instruction even though it was not part of
the match.
Bringing back the State.MIs.resize(1) would fix the problem but it would limit
our ability to optimize the table in the future. Instead, this patch fixes the
problem by explicitly stating which instructions should be merged into the result.
There's no direct test case in this commit because a test case would be very brittle.
However, at the time of writing this should fix the failures in
http://green.lab.llvm.org/green/job/Compiler_Verifiers_GlobalISEL/ as well as a
failure in test/CodeGen/ARM/GlobalISel/arm-isel.ll when expensive checks are enabled.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Subscribers: fhahn, kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D36094
llvm-svn: 309804
infinite-inlining across multiple runs of the inliner by keeping a tiny
history of internal-to-SCC inlining decisions.
This is still a bit gross, but I don't yet have any fundamentally better
ideas and numerous people are blocked on this to use new PM and ThinLTO
together.
The core of the idea is to detect when we are about to do an inline that
has a chance of re-splitting an SCC which we have split before with
a similar inlining step. That is a critical component in the inlining
forming a cycle and so far detects all of the various cyclic patterns
I can come up with as well as the original real-world test case (which
comes from a ThinLTO build of libunwind).
I've added some tests that I think really demonstrate what is going on
here. They are essentially state machines that march the inliner through
various steps of a cycle and check that we stop when the cycle is closed
and that we actually did do inlining to form that cycle.
A lot of thanks go to Eric Christopher and Sanjoy Das for the help
understanding this issue and improving the test cases.
The biggest "yuck" here is the layering issue -- the CGSCC pass manager
is providing somewhat magical state to the inliner for it to use to make
itself converge. This isn't great, but I don't honestly have a lot of
better ideas yet and at least seems nicely isolated.
I have tested this patch, and it doesn't block *any* inlining on the
entire LLVM test suite and SPEC, so it seems sufficiently narrowly
targeted to the issue at hand.
We have come up with hypothetical issues that this patch doesn't cover,
but so far none of them are practical and we don't have a viable
solution yet that covers the hypothetical stuff, so proceeding here in
the interim. Definitely an area that we will be back and revisiting in
the future.
Differential Revision: https://reviews.llvm.org/D36188
llvm-svn: 309784
In the last half-dozen commits to LLVM I removed code that became dead
after removing the offset parameter from llvm.dbg.value gradually
proceeding from IR towards the backend. Before I can move on to
DwarfDebug and friends there is one last side-called offset I need to
remove: This patch modifies PrologEpilogInserter's use of the
DBG_VALUE's offset argument to use a DIExpression instead. Because the
PrologEpilogInserter runs at the Machine level I had to play a little
trick with a named llvm.dbg.mir node to get the DIExpressions to print
in MIR dumps (which print the llvm::Module followed by the
MachineFunction dump).
I also had to add rudimentary DwarfExpression support to CodeView and
as a side-effect also fixed a bug (CodeViewDebug::collectVariableInfo
was supposed to give up on variables with complex DIExpressions, but
would fail to do so for fragments, which are also modeled as
DIExpressions).
With this last holdover removed we will have only one canonical way of
representing offsets to debug locations which will simplify the code
in DwarfDebug (and future versions of CodeViewDebug once it starts
handling more complex expressions) and make it easier to reason about.
This patch is NFC-ish: All test case changes are for assembler
comments and the binary output does not change.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D36125
llvm-svn: 309751
The coverage tool needs to know which slice to look at when it's handed
a universal binary. Some projects need to look at aggregate coverage
reports for a variety of slices in different binaries: this patch adds
support for these kinds of projects to llvm-cov.
rdar://problem/33579007
llvm-svn: 309747
Pretty sure this will automatically promoted to match the type of the other operand of the subtract. There's plenty of other similar code around here without this cast.
llvm-svn: 309653
Stack coloring pass need to maintain AliasAnalysis information when merging stack slots of different types.
Actually, there is a FIXME comment in StackColoring.cpp
// FIXME: In order to enable the use of TBAA when using AA in CodeGen,
// we'll also need to update the TBAA nodes in MMOs with values
// derived from the merged allocas.
But, TBAA has been already enabled in CodeGen without fixing this pass.
The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling.
Although we observed the problem on ppc64le, this is a platform neutral issue.
This patch makes the stack coloring pass maintains AliasAnalysis information when merging multiple stack slots.
llvm-svn: 309651
Summary:
Adding part of the changes in D30369 (needed to make progress):
Current patch updates AliasAnalysis and MemoryLocation, but does _not_ clean up MemorySSA.
Original summary from D30369, by dberlin:
Currently, we have instructions which affect memory but have no memory
location. If you call, for example, MemoryLocation::get on a fence,
it asserts. This means things specifically have to avoid that. It
also means we end up with a copy of each API, one taking a memory
location, one not.
This starts to fix that.
We add MemoryLocation::getOrNone as a new call, and reimplement the
old asserting version in terms of it.
We make MemoryLocation optional in the (Instruction, MemoryLocation)
version of getModRefInfo, and kill the old one argument version in
favor of passing None (it had one caller). Now both can handle fences
because you can just use MemoryLocation::getOrNone on an instruction
and it will return a correct answer.
We use all this to clean up part of MemorySSA that had to handle this difference.
Note that literally every actual getModRefInfo interface we have could be made private and replaced with:
getModRefInfo(Instruction, Optional<MemoryLocation>)
and
getModRefInfo(Instruction, Optional<MemoryLocation>, Instruction, Optional<MemoryLocation>)
and delegating to the right ones, if we wanted to.
I have not attempted to do this yet.
Reviewers: dberlin, davide, dblaikie
Subscribers: sanjoy, hfinkel, chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D35441
llvm-svn: 309641
We don't write any actual symbols to this stream yet, but for
now we just create the stream and hook it up to the appropriate
places and give it a valid header.
Differential Revision: https://reviews.llvm.org/D35290
llvm-svn: 309608
This patch refactors the code used in llc such that all the users of the
addPassesToEmitFile API have access to a homogeneous way of handling
start/stop-after/before options right out of the box.
In particular, just invoking addPassesToEmitFile will set the proper
pipeline without additional effort (modulo parsing a .mir file if the
start-before/after options are used.
NFC.
Differential Revision: https://reviews.llvm.org/D30913
llvm-svn: 309599
I was a bit lazy when I first implemented this & skipped the index
lookup - obviously for large files this becomes pretty crucial, so here
we go, do the index lookup. Speeds up large DWP symbolizing by... lots.
(20m -> 20s, actually, maybe more in a release build (that was a release
build without index lookup, compared to a debug/non-release build with
the index usage))
llvm-svn: 309507
If you've archived the DWP file somewhere it's probably useful to be
able to just tell llvm-symbolizer where it is when you're symbolizing
stack traces from the binary.
This only provides a mechanism for specifying a single DWP file, good if
you're symbolizing a program with a single DWP file, but it's likely if
the program is dynamically linked that you might have a DWP for each
dynamic library - in which case this feature won't help (at least as
it's surfaced in llvm-symbolizer for now) - in theory it could be
extended to specify a collection of DWP files that could all be
consulted for split CU hash resolution.
llvm-svn: 309498
Summary:
Now that SamplePGOSupport is part of PGOOpt, there are several places that need tweaking:
1. AddDiscriminator pass should *not* be invoked at ThinLTOBackend (as it's already invoked in the PreLink phase)
2. addPGOInstrPasses should only be invoked when either ProfileGenFile or ProfileUseFile is non-empty.
3. SampleProfileLoaderPass should only be invoked when SampleProfileFile is non-empty.
4. PGOIndirectCallPromotion should only be invoked in ProfileUse phase, or in ThinLTOBackend of SamplePGO.
Reviewers: chandlerc, tejohnson, davidxl
Reviewed By: chandlerc
Subscribers: sanjoy, mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36040
llvm-svn: 309478
This commit
- Removes IsTailCall and replaces it with a target-defined unsigned
- Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
- Adds a call class + frame class classification to OutlinedFunction and Candidate respectively
This accomplishes a couple things.
Firstly, we don't need the notion of *tail call* in the general outlining algorithm.
Secondly, we now can have different "outlining classes" for each candidate within a set of candidates.
This will make it easy to add new ways to outline sequences for certain targets and dynamically choose
an appropriate cost model for a sequence depending on the context that that sequence lives in.
Ultimately, this should get us closer to being able to do something like, say avoid saving the link
register when outlining AArch64 instructions.
llvm-svn: 309475
This diff removes the second argument of the method MachOObjectFile::exports.
In all in-tree uses this argument is equal to "this" and
without this argument the interface seems to be cleaner.
Test plan: make check-all
llvm-svn: 309462
There is no situation where this rarely-used argument cannot be
substituted with a DIExpression and removing it allows us to simplify
the DWARF backend. Note that this patch does not yet remove any of
the newly dead code.
rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D35951
llvm-svn: 309426
Recommit after workaround the bug PR31652.
Three bugs fixed in previous recommits: The first one is to use CurrentBlock
instead of PREInstr's Parent as param of performScalarPREInsertion because
the Parent of a clone instruction may be uninitialized. The second one is stop
PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst
is defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available. The third one is an out-of-bound
array access in a flipped if guard.
Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.
long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();
void foo(long a, long b, long c, long d) {
g1 = a * b;
if (__builtin_expect(g2 > 3, 0)) {
a = c;
b = d;
g2 = a * b;
}
g3 = a * b; // fully redundant.
}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.
Differential Revision: https://reviews.llvm.org/D32252
llvm-svn: 309397
This adds support for the CFI pseudo-op return_column. This specifies
the frame table column which contains the return address.
Addresses PR33953!
llvm-svn: 309360
This reverts commit r309080. The patch needs to clear out the
ScalarEvolution::ExitLimits cache in forgetMemoizedResults.
I've replied on the commit thread for the patch with more details.
llvm-svn: 309357
This is some more cleanup in preparation for some actual
functional changes. This splits getOutliningBenefit into
two cost functions: getOutliningCallOverhead and
getOutliningFrameOverhead. These functions return the
number of instructions that would be required to call
a specific function and the number of instructions
that would be required to construct a frame for a
specific funtion. The actual outlining benefit logic
is moved into the outliner, which calls these functions.
The goal of refactoring getOutliningBenefit is to:
- Get us closer to getting rid of the IsTailCall flag
- Further split up "target-specific" things and
"general algorithm" things
llvm-svn: 309356
This can come up in ThinLTO & wastes space & makes degenerate IR.
As per the added FIXME, ultimately, local imported entities should hang
off the function and that way the imported entity list on the CU can be
tested for emptiness like all the other CU lists.
(function-attached local imported entities are probably also the best
path forward for fixing how imported entities are handled both in
cross-module use (currently, while ThinLTO preserves the imported
entities, they would not get used at the imported inlined location -
only in the abstract origin that appears in the partial CU created by
the import (which isn't emitted under Fission due to cross-CU
limitations there)) and to reduce the number of points where imported
entities are emitted (they're currently emitted into every inlined
instance, concrete instance, and abstract origin - they should only go
in teh abstract origin if there is one, otherwise in the concrete
instance - but this requires lots of delayed handling and wiring up,
same as abstract variables & subprograms))
llvm-svn: 309354
Summary: In the current implementation, isPromotionProfitable only checks if the call count to a direct target is no less than a certain percentage threshold of the remaining call counts that have not been promoted. This causes code size problems when the target count is small but greater than a large portion of remaining counts. E.g. target1 takes 99.9%, while target2 takes 0.1%. Both targets will be promoted and inlined, makes the function size too large, which potentially prevents it from further inlining into its callers. This patch adds another percentage threshold against the total indirect call count. If the target count needs to be no less than both thresholds in order to be promoted speculatively.
Reviewers: davidxl, tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D35962
llvm-svn: 309345
Summary:
Pointer difference simplifications currently happen only if input GEPs don't have other uses or their indexes are all constants, to avoid duplicating indexing arithmetic.
This patch enables cases with exactly one non-constant index among input GEPs to happen where there is no duplicated arithmetic or code size increase even if input GEPs have other uses.
For example, this patch allows "(&A[42][i]-&A[42][0])" --> "i", which didn't happen previously, if the input GEP(s) have other uses.
Reviewers: sanjoy, bkramer
Reviewed By: sanjoy
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D35499
llvm-svn: 309304
Summary:
MSVC link.exe records all external symbol names in the publics stream.
It provides similar functionality to an ELF .symtab.
Reviewers: zturner, ruiu
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D35871
llvm-svn: 309303
This is a module pass so for the old PM, we can't use ORE, the function
analysis pass. Instead ORE is created on the fly.
A few notes:
- isPromotionLegal is folded in the caller since we want to emit the Function
in the remark but we can only do that if the symbol table look-up succeeded.
- There was good test coverage for remarks in this pass.
- promoteIndirectCall uses ORE conditionally since it's also used from
SampleProfile which does not use ORE yet.
Fixes PR33792.
Differential Revision: https://reviews.llvm.org/D35929
llvm-svn: 309294
Summary:
It is possible for some passes to materialize a call to a libcall (ex: ldexp, exp2, etc),
but these passes will not mark the call as a gc-leaf-function. All libcalls are
actually gc-leaf-functions, so we change llvm::callsGCLeafFunction() to tell us that
available libcalls are equivalent to gc-leaf-function calls.
Reviewers: sanjoy, anna, reames
Reviewed By: anna
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35840
llvm-svn: 309291
Summary:
Using c++11 enum classes ensures that only valid enum values are used
for ArchKind, ProfileKind, VersionKind and ISAKind. This removes the
need for checks that the provided values map to a proper enum value,
allows us to get rid of AK_LAST and prevents comparing values from
different enums. It also removes a bunch of static_cast
from unsigned to enum values and vice versa, at the cost of introducing
static casts to access AArch64ARCHNames and ARMARCHNames by ArchKind.
FPUKind and ArchExtKind are the only remaining old-style enum in
TargetParser.h. I think it's beneficial to keep ArchExtKind as old-style
enum, but FPUKind can be converted too, but this patch is quite big, so
could do this in a follow-up patch. I could also split this patch up a
bit, if people would prefer that.
Reviewers: rengolin, javed.absar, chandlerc, rovka
Reviewed By: rovka
Subscribers: aemerson, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35882
llvm-svn: 309287
Summary:
Now that we have control flow in place, fuse the per-rule tables into a
single table. This is a compile-time saving at this point. However, this will
also enable the optimization of a table so that similar instructions can be
tested together, reducing the time spent on the matching the code.
This is NFC in terms of externally visible behaviour but some internals have
changed slightly. State.MIs is no longer reset between each rule that is
attempted because it's not necessary to do so. As a consequence of this the
restriction on the order that instructions are added to State.MIs has been
relaxed to only affect recorded instructions that require new elements to be
added to the vector. GIM_RecordInsn can now write to any element from 1 to
State.MIs.size() instead of just State.MIs.size().
The compile-time regressions from the last commit were caused by the ARM target
including a non-const variable (zero_reg) in the table and therefore generating
an initializer for it. That variable is now const.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35681
llvm-svn: 309264
uses.
Also splitting the buildSources part allows more overloads such as
adding MachineOperands directly in the arguments for buildInstr.
llvm-svn: 309163
Summary:
This changes SimplifyLibCalls to use the new OptimizationRemarkEmitter
API.
In fact, as SimplifyLibCalls is only ever called via InstCombine,
(as far as I can tell) the OptimizationRemarkEmitter is added there,
and then passed through to SimplifyLibCalls later.
I have avoided changing any remark text.
This closes PR33787
Patch by Sam Elliott!
Reviewers: anemet, davide
Reviewed By: anemet
Subscribers: davide, mehdi_amini, eraman, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D35608
llvm-svn: 309158
Summary: We can use the template parameter `IsPostDom` to pick an appropriate SmallVector size to store DomTree roots for dominators and postdominators. Before, the code would always allocate memory with `std::vector`.
Reviewers: dberlin, davide, sanjoy, grosser
Reviewed By: grosser
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35636
llvm-svn: 309148
Summary:
This patch moves root-finding logic from DominatorTreeBase to GenericDomTreeConstruction.h.
It makes the behavior simpler and more consistent by always adding a virtual root to PostDominatorTrees.
Reviewers: dberlin, davide, grosser, sanjoy
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35597
llvm-svn: 309146
Summary: The new PM needs to invoke add-discriminator pass when building with -fdebug-info-for-profiling.
Reviewers: chandlerc, davidxl
Reviewed By: chandlerc
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D35744
llvm-svn: 309121
The ARM bots have started failing and while this patch should be an improvement
for these bots, it's also the only suspect in the blamelist. Reverting while
Diana and I investigate the problem.
llvm-svn: 309111
Summary:
Now that we have control flow in place, fuse the per-rule tables into a
single table. This is a compile-time saving at this point. However, this will
also enable the optimization of a table so that similar instructions can be
tested together, reducing the time spent on the matching the code.
This is NFC in terms of externally visible behaviour but some internals have
changed slightly. State.MIs is no longer reset between each rule that is
attempted because it's not necessary to do so. As a consequence of this the
restriction on the order that instructions are added to State.MIs has been
relaxed to only affect recorded instructions that require new elements to be
added to the vector. GIM_RecordInsn can now write to any element from 1 to
State.MIs.size() instead of just State.MIs.size().
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35681
llvm-svn: 309094
By default, we display only options that are not
hidden and have help texts. This patch adds flag
allowing to display aliases that have no help text.
In this case help text of aliased option used instead.
Differential revision: https://reviews.llvm.org/D35476
llvm-svn: 309087
Changing mask argument type from const SmallVectorImpl<int>& to
ArrayRef<int>.
This came up in D35700 where a mask is received as an ArrayRef<int> and
we want to pass it to TargetLowering::isShuffleMaskLegal().
Also saves a few lines of code.
llvm-svn: 309085
This patch adds a cache for computeExitLimit to save compilation time. A lot of examples of
tests that take extensive time to compile are attached to the bug 33494.
Differential Revision: https://reviews.llvm.org/D35827
llvm-svn: 309080
Summary: This patch adds flags and tests to cover the PGOOpt handling logic in new PM.
Reviewers: chandlerc, davide
Reviewed By: chandlerc
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D35807
llvm-svn: 309076
Summary:
Previously were in support. Since many many things depend on support,
were all forced to also depend on libxml2, which we only want in a few cases.
This puts all the libxml2 deps in a separate lib to be used only in a few
places.
Reviewers: ruiu, thakis, rnk
Subscribers: mgorny, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D35819
llvm-svn: 309070
The PDB "symbol stream" actually contains symbol records for the publics
and the globals stream. The globals and publics streams are essentially
hash tables that point into a single stream of records. In order to
match cvdump's behavior, we need to only dump symbol records referenced
from the hash table. This patch implements that, and then implements
global stream dumping, since it's just a subset of public stream
dumping.
Now we shouldn't see S_PROCREF or S_GDATA32 records when dumping
publics, and instead we should see those record in the globals stream.
llvm-svn: 309066
Summary:
Does a simple merge, where mergeable elements are combined, all others
are appended. Does not apply trickly namespace rules.
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D35753
llvm-svn: 309047
Summary:
ELF linkers generate __start_<secname> and __stop_<secname> symbols
when there is a value in a section <secname> where the name is a valid
C identifier. If dead stripping determines that the values declared
in section <secname> are dead, and we then internalize (and delete)
such a symbol, programs that reference the corresponding start and end
section symbols will get undefined reference linking errors.
To fix this, add the section name to the IRSymtab entry when a symbol is
defined in a specific section. Then use this in the gold-plugin to mark
the symbol as external and visible from outside the summary when the
section name is a valid C identifier.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D35639
llvm-svn: 309009
This patch moves the DAGCombiner::GetDemandedBits function to SelectionDAG::GetDemandedBits as a first step towards making it easier for targets to get to the source of any demanded bits without the limitations of SimplifyDemandedBits.
Differential Revision: https://reviews.llvm.org/D35841
llvm-svn: 308983
Propagates the GraphT template argument to the default value of
the AnalysisGraphTraitsT template argument. This allows to specialize
the DefaultAnalysisGraphTraits<AnalysisT,GraphT> for analysis with a
graph type different from the analysis type and it will automatically
get picked-up.
Note: This was probably the intended purpose and should not result in any
functional change.
llvm-svn: 308878
This includes the hash table, the address map, and the thunk table and
section offset table. The last two are only used for incremental
linking, which LLD doesn't support, so they are less interesting. The
hash table is particularly important to get right, since this is the one
of the streams that debuggers use to translate addresses to symbols.
llvm-svn: 308764
Summary: Currently the ThinLTO minimized bitcode file only strip the debug info, but there is still a lot of information in the minimized bit code file that will be not used for thin linker. In this patch, most of the extra information is striped to reduce the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. Now the minimized bitcode file size is reduced to 15%-30% of the debug info stripped bitcode file size.
Reviewers: danielcdh, tejohnson, pcc
Reviewed By: pcc
Subscribers: mehdi_amini, aprantl, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D35334
llvm-svn: 308760
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
lld needs a matching change for this will be my next commit.
Expect it to fail build until that matching commit is picked up by the bots.
Like the changes in r296527 for dyld bind entires and the changes in
r298883 for lazy bind, weak bind and rebase entries the export
entries are the last of the dyld compact info to have error handling added.
This follows the model of iterators that can fail that Lang Hanes
designed when fixing the problem for bad archives r275316 (or r275361).
So that iterating through the exports now terminates if there is an error
and returns an llvm::Error with an error message in all cases for malformed
input.
This change provides the plumbing for the error handling, all the needed
testing of error conditions and test cases for all of the unique error messages.
llvm-svn: 308690
Summary: Implement parsing and writing of a single xml manifest file.
Subscribers: mgorny, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D35425
llvm-svn: 308679
On AMDGPU SGPR spills are really spilled to another register.
The spiller creates the spills to new frame index objects,
which is used as a placeholder.
This will eventually be replaced with a reference to a position
in a VGPR to write to and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.
This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.
Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.
llvm-svn: 308673
It seems that G++ 4.8 doesn't accept the 'enum A' in code of the form:
enum A { ... };
const auto &F = []() -> enum A { ... };
However, it does accept:
typedef enum { ... } A;
const auto &F = []() -> A { ... };
llvm-svn: 308599
Summary:
This will allow us to merge the various sub-tables into a single table. This is a
compile-time saving at this point. However, this will also enable the optimization
of a table so that similar instructions can be tested together, reducing the time
spent on the matching the code.
The bulk of this patch is a mechanical conversion to the new MatchTable object
which is responsible for tracking label definitions and filling in the index of
the jump targets. It is also responsible for nicely formatting the table.
This was necessary to support the new GIM_Try opcode which takes the index to
jump to if the match should fail. This value is unknown during table
construction and is filled in during emission. To support nesting try-blocks
(although we currently don't emit tables with nested try-blocks), GIM_Reject
has been re-introduced to explicitly exit a try-block or fail the overall match
if there are no active try-blocks.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35117
llvm-svn: 308596
SUMMARY
This patch adds a verification check on the abbreviation declarations in the .debug_abbrev section.
The check makes sure that no abbreviation declaration has more than one attributes with the same name.
Differential Revision: https://reviews.llvm.org/D35643
llvm-svn: 308579
As a follow up of the bad alloc handler patch, this patch introduces nullptr checks on pointers returned from the
malloc/realloc/calloc functions. In addition some memory size assignments are moved behind the allocation
of the corresponding memory to fulfill exception safe memory management (RAII).
patch by Klaus Kretzschmar
Differential Revision: https://reviews.llvm.org/D35414
llvm-svn: 308576
This changes DwarfContext to delegate to DwarfObject instead of having
pure virtual methods.
With this DwarfContextInMemory is replaced with an implementation of
DwarfObject that is local to a .cpp file.
llvm-svn: 308543
This will allow eliminating the duplication of the names, and allow adding
extra information such as signatures in a future commit.
Differential Revision: https://reviews.llvm.org/D35522
llvm-svn: 308531
This change adds basic support for program headers.
I need to do some testing which requires generating program headers but
I can't use ld.lld or clang to produce programs that have headers. I'd
also like to test some strange things that those programs may never
produce.
Patch by Jake Ehrlich
Differential Revision: https://reviews.llvm.org/D35276
llvm-svn: 308520
If OpValue is non-null, we only consider operations similar to OpValue
when intersecting.
Differential Revision: https://reviews.llvm.org/D35292
llvm-svn: 308428
functions.
In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.
While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.
We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.
This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.
llvm-svn: 308417
Preserve the actual library name as provided by the user. This is
required to properly replicate link's behaviour about the module import
name handling. This requires an associated change to lld for updating
the tests for the proper behaviour for the import library module name
handling in various cases.
Associated tests will be part of the lld change.
llvm-svn: 308406
DIImportedEntity has a line number, but not a file field. To determine
the decl_line/decl_file we combine the line number from the
DIImportedEntity with the file from the DIImportedEntity's scope. This
does not work correctly when the parent scope is a DINamespace or a
DIModule, both of which do not have a source file.
This patch adds a file field to DIImportedEntity to unambiguously
identify the source location of the using/import declaration. Most
testcase updates are mechanical, the interesting one is the removal of
the FIXME in test/DebugInfo/Generic/namespace.ll.
This fixes PR33822. See https://bugs.llvm.org/show_bug.cgi?id=33822
for more context.
<rdar://problem/33357889>
https://bugs.llvm.org/show_bug.cgi?id=33822
Differential Revision: https://reviews.llvm.org/D35583
llvm-svn: 308398
When I originally wrote this code, I neglected the fact that the import
library may be created for executables. This name is not the name of
the DLL, but rather the name for the imported module. It will be
embedded into the IAT/ILT reference. Rename it to make it more obvious.
NFC.
llvm-svn: 308384
A PE COFF spec compliant import library generator.
Intended to be used with mingw-w64.
Supports:
PE COFF spec (section 8, Import Library Format)
PE COFF spec (Aux Format 3: Weak Externals)
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D29892
This reapplies rL308329, which was reverted in rL308374
llvm-svn: 308379
Summary: This patch improves error detection in deleteEdge. It asserts that the edge doesn't exist in the CFG and that DomTree knew about this edge before.
Reviewers: dberlin, grosser, brzycki, sanjoy
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35571
llvm-svn: 308354
A PE COFF spec compliant import library generator.
Intended to be used with mingw-w64.
Supports:
PE COFF spec (section 8, Import Library Format)
PE COFF spec (Aux Format 3: Weak Externals)
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D29892
llvm-svn: 308329
Summary: This information can be useful; and in the case of Win64, necessary for getting exceptions to work in the JIT.
Reviewers: lhames
Reviewed By: lhames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35102
llvm-svn: 308321
Summary:
G_FMA was recently added to GlobalISel which enables the import of rules
involving fma. Add the mapping to allow it.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35130
llvm-svn: 308308
using runtime checks
Extend the SCEVPredicateRewriter to work a bit harder when it encounters an
UnknownSCEV for a Phi node; Try to build an AddRecurrence also for Phi nodes
whose update chain involves casts that can be ignored under the proper runtime
overflow test. This is one step towards addressing PR30654.
Differential revision: http://reviews.llvm.org/D30041
llvm-svn: 308299
Cleaned up the code in FastISel a bit.
Had to add make_range to MCInstrDesc as that was needed and seems missing.
Reviewed by: @t.p.northover
Differential Revision: https://reviews.llvm.org/D35494
llvm-svn: 308291
Summary:
This patch modifies the handleDebugInfo() function so that we verify the contents of each unit
in the .debug_info section only if its header has been successfully verified.
This change will allow for more/different verification checks depending on the type of the unit since from
dwarf5, the .debug_info section may consist of different types of units.
Subscribers: aprantl
Differential Revision: https://reviews.llvm.org/D35521
llvm-svn: 308245
Summary:
This removes the CVTypeVisitor updater and verifier classes. They were
made dead by the minimal type dumping refactoring. Replace them with a
single function that takes a type record and produces a hash. Call this
from the minimal type dumper and compare the hash.
I also noticed that the microsoft-pdb reference repository uses a basic
CRC32 for records that aren't special. We already have an implementation
of that CRC ready to use, because it's used in COFF for ICF.
I'll make LLD call this hashing utility in a follow-up change. We might
also consider using this same hash in type stream merging, so that we
don't have to hash our records twice.
Reviewers: inglorion, ruiu
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D35515
llvm-svn: 308240
Summary:
Object files compiled with /Zi emit type information into a type server
PDB. The .debug$S section will contain a single TypeServer2Record with
the absolute path and GUID of the type server. LLD needs to load the
type server PDB and merge all types and items it finds in it into the
destination PDB.
Depends on D35495
Reviewers: ruiu, inglorion
Subscribers: zturner, llvm-commits
Differential Revision: https://reviews.llvm.org/D35504
llvm-svn: 308235
Summary:
We were treating the GUIDs in TypeServer2Record as strings, and the
non-ASCII bytes in the GUID would not round-trip through YAML.
We already had the PDB_UniqueId type portably represent a Windows GUID,
but we need to hoist that up to the DebugInfo/CodeView library so that
we can use it in the TypeServer2Record as well as in PDB parsing code.
Reviewers: inglorion, amccarth
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D35495
llvm-svn: 308234
Summary:
Instead of wiring these through the CVTypeVisitor interface, clients
should inspect the CVTypeArray before visiting it and potentially load
up the type server's TPI stream if they need it.
No tests relied on this functionality because LLD was the only client.
Reviewers: ruiu
Subscribers: mgorny, hiraditya, zturner, llvm-commits
Differential Revision: https://reviews.llvm.org/D35394
llvm-svn: 308212
Rename the enum value from X86_64_Win64 to plain Win64.
The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.
Differential Revision: https://reviews.llvm.org/D34474
llvm-svn: 308208
This adds support for the new 32-bit vector float instructions of z14.
This includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions, including new LLVM intrinsics.
- Scheduler description support for the instructions.
- Update to the vector cost function calculations.
In general, CodeGen support for the new v4f32 instructions closely
matches support for the existing v2f64 instructions.
llvm-svn: 308195
This patch series adds support for the IBM z14 processor. This part includes:
- Basic support for the new processor and its features.
- Support for new instructions (except vector 32-bit float and 128-bit float).
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of z14 as host processor.
Support for the new 32-bit vector float and 128-bit vector float
instructions is provided by separate patches.
llvm-svn: 308194
Summary:
The current yaml::Input constructor takes a StringRef of data as its
first parameter, discarding any filename information that may have been
present when a YAML file was opened. Add an alterate yaml::Input
constructor that takes a MemoryBufferRef, which can have a filename
associated with it. This leads to clearer diagnostic messages.
Sponsored By: DARPA, AFRL
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D35398
Patch by: Jonathan Anderson (trombonehero)
llvm-svn: 308172
Some platforms have problems with emmiting constructors when class
templates get explicitly instantiated.
This patch fixes the bug reported in D35315 by replacing `= default`
with an empty constructor body.
llvm-svn: 308140
Summary:
Currently these methods call ConstantDataVector::getSplatValue which uses getElementsAsConstant to create a Constant object representing the element value. This method incurs a map lookup to see if we already have created such a Constant before and if not allocates a new Constant object.
This patch changes these methods to use getElementAsAPFloat and getElementAsInteger so we can just examine the data values directly.
Reviewers: spatel, pcc, dexonsmith, bogner, craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35040
llvm-svn: 308112
function to every defined function known to LLVM as a library function.
LLVM can introduce calls to these functions either by replacing other
library calls or by recognizing patterns (such as memset_pattern or
vector math patterns) and replacing those with calls. When these library
functions are actually defined in the module, we need to have reference
edges to them initially so that we visit them during the CGSCC walk in
the right order and can effectively rebuild the call graph afterward.
This was discovered when building code with Fortify enabled as that is
a common case of both inline definitions of library calls and
simplifications of code into calling them.
This can in extreme cases of LTO-ing with libc introduce *many* more
reference edges. I discussed a bunch of different options with folks but
all of them are unsatisfying. They either make the graph operations
substantially more complex even when there are *no* defined libfuncs, or
they introduce some other complexity into the callgraph. So this patch
goes with the simplest possible solution of actual synthetic reference
edges. If this proves to be a memory problem, I'm happy to implement one
of the clever techniques to save memory here.
llvm-svn: 308088
Now, getUserCost() only checks the src and dst types of EXT to decide it is free
or not. This change first checks the types, then calls isExtFreeImpl(), and
check if EXT can form ExtLoad at last. Currently, only AArch64 has customized
implementation of isExtFreeImpl() to check if EXT can be folded into its use.
Differential Revision: https://reviews.llvm.org/D34458
llvm-svn: 308076
This fixes a minor bug in insertion to a reachable node that caused
DominatorTree.InsertDeleteExhaustive flakiness. The patch also adds
a new testcase for this exact failure.
llvm-svn: 308074
Summary:
This patch implements incremental edge deletions.
It also makes DominatorTreeBase store a pointer to the parent function. The parent function is needed to perform full rebuilts during some deletions, but it is also used to verify that inserted and deleted edges come from the same function.
Reviewers: dberlin, davide, grosser, sanjoy, brzycki
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35342
llvm-svn: 308062
Summary:
This patch introduces incremental edge insertions based on the Depth Based Search algorithm.
Insertions should work for both dominators and postdominators.
Reviewers: dberlin, grosser, davide, sanjoy, brzycki
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35341
llvm-svn: 308054
One place compared with 32, which I've replaced with LaneBitmask::BitWidth.
The other places are shifts of a constant 1 by a lane number. But if LaneBitmask were to be a larger type than 32-bits like 64-bits, the 1 would need to be 1ULL to do a 64-bit shift. To hide this I've added a LanebitMask::getLane that hides the shift and make sures the 1 is casted to correct type first.
llvm-svn: 308042
Summary:
DominatorTreeBase used to have IsPostDominators (bool) member to indicate if the tree is a dominator or a postdominator tree. This made it possible to switch between the two 'modes' at runtime, but it isn't used in practice anywhere.
This patch makes IsPostDominator a template argument. This way, it is easier to switch between different algorithms at compile-time based on this argument and design external utilities around it. It also makes it impossible to incidentally assign a postdominator tree to a dominator tree (and vice versa), and to further simplify template code in GenericDominatorTreeConstruction.
Reviewers: dberlin, sanjoy, davide, grosser
Reviewed By: dberlin
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35315
llvm-svn: 308040
Summary:
This patch adds `BlockPrinter`-- a small wrapper for printing CFG nodes and DomTree nodes to `raw_ostream`. It is meant to be only used internally, for debugging and printing errors.
Reviewers: dberlin, sanjoy, grosser, davide
Reviewed By: grosser, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35286
llvm-svn: 308036
This is the LLVM part, adding definitions for
void @llvm.hexagon.Y2.dccleana(i8*)
void @llvm.hexagon.Y2.dccleaninva(i8*)
void @llvm.hexagon.Y2.dcinva(i8*)
void @llvm.hexagon.Y2.dczeroa(i8*)
void @llvm.hexagon.Y4.l2fetch(i8*, i32)
void @llvm.hexagon.Y5.l2fetch(i8*, i64)
The clang part will follow.
llvm-svn: 308032
This patch adds verification checks for the unit header chain in the .debug_info section.
Specifically, for each unit in the .debug_info section, the verifier checks that:
The unit length is valid (i.e. the unit can actually fit in the .debug_info section)
The dwarf version of the unit is valid
The address size is valid (4 or 8)
The unit type (if the unit is in dwarf5) is valid
The debug_abbrev_offset is valid
llvm-svn: 307975
Binary streams are an abstraction over a discontiguous buffer. To write
a discontiguous buffer, we want to copy each contiguous chunk
individually. Currently BinaryStreams do not expose a way to iterate
over the chunks, so the code repeatedly calls
readLongestContiguousChunk() with an increasing offset. In order to
lookup the chunk by offset, we would iterate the items list to figure
out which chunk the offset is within. This is obviously O(n^2).
Instead, pre-compute a table of offsets and do a binary search to figure
out which chunk to use. This is still only an O(n^2) to O(n log n)
improvement, but it's a very local fix that seems worth doing.
This improves self-linking lld.exe with PDBs from 90s to 10s.
llvm-svn: 307970
Summary: DominatorTreeBase and related classes used overcomplicated template machinery. This patch simplifies them and gets rid of DominatorTreeBaseTraits and DominatorTreeBaseByTraits, which weren't actually used outside the DomTree construction.
Reviewers: dberlin, sanjoy, davide, grosser
Reviewed By: dberlin, davide, grosser
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35285
llvm-svn: 307953
Summary:
This patch splits the SemiNCA algorithm into smaller functions. It also adds a new debug macro.
In order to perform incremental updates, we need to be able to refire SemiNCA on a subset of CFG nodes (determined by a DFS walk results). We also need to skip nodes that are not deep enough in a DomTree.
Reviewers: dberlin, davide, sanjoy, grosser
Reviewed By: dberlin, davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35282
llvm-svn: 307950
Summary:
This fixes type indices for SDK or CRT static archives. Previously we'd
try to look next to the archive object file path, which would not exist
on the local machine.
Also error out if we can't resolve a type server record. Hypothetically
we can recover from this error by discarding debug info for this object,
but that is not yet implemented.
Reviewers: ruiu, amccarth
Subscribers: aprantl, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D35369
llvm-svn: 307946
Summary:
This patch improves verification by making `verifyReachablility` look for CFG not found in the DomTree.
It also makes the verification work with postdominators by handling virtual root.
Reviewers: dberlin, davide, grosser, sanjoy
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35279
llvm-svn: 307936
Summary: Add target hooks for printing and parsing target MMO flags.
Targets may override getSerializableMachineMemOperandTargetFlags() to
return a mapping from string to flag value for target MMO values that
should be serialized/parsed in MIR output.
Add implementation of this hook for AArch64 SuppressPair MMO flag.
Reviewers: bogner, hfinkel, qcolombet, MatzeB
Subscribers: mcrosier, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D34962
llvm-svn: 307877
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D34885
llvm-svn: 307854
Summary: Different JITs and other clients of LLVM may have different needs in how symbol resolution should occur.
Reviewers: v.g.vassilev, lhames, karies
Reviewed By: v.g.vassilev
Subscribers: pcanal, llvm-commits
Differential Revision: https://reviews.llvm.org/D33529
llvm-svn: 307849
Doing so is leaking an implementation detail.
I have an implementation that uses the lld infrastructure and doesn't
use a map or object::SectionRef.
llvm-svn: 307846
Where is is needed (at the end of headers that define it), be
consistent about its use.
Also fix a few header guards that I found in the process.
Differential Revision: https://reviews.llvm.org/D34916
llvm-svn: 307840
Summary:
There is a reserved range of type indexes for built-in types (like integers).
This will create a symbol for a built-in type if the caller askes for one by
type index. This is also plumbing for being able to recall symbols by type
index in general, but user-defined types will come in subsequent patches.
Reviewers: rnk, zturner
Subscribers: mgorny, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D35163
llvm-svn: 307834
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34884
llvm-svn: 307796
Summary:
Solves PR33689.
If the pointer size is less than the size of the type used for the array
size in an alloca (the <ty> type below) then we could trigger the assert in
the PR. In that example we have pointer size i16 and <ty> is i32.
<result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align <alignment>]
Handle the situation by allowing truncation as well as zero extension in
ObjectSizeOffsetVisitor::visitAllocaInst().
Also, we now detect overflow in visitAllocaInst(), similar to how it was
already done in visitCallSite().
Reviewers: craig.topper, rnk, george.burgess.iv
Reviewed By: george.burgess.iv
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D35003
llvm-svn: 307754
Summary:
This allows tools like lld that process relocations
to apply data relocation correctly. This information
is required because relocation are stored as section
offset.
Subscribers: jfb, dschuff, jgravelle-google, aheejin
Differential Revision: https://reviews.llvm.org/D35234
llvm-svn: 307741
The issue is not if the value is pcrel. It is whether we have a
relocation or not.
If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.
llvm-svn: 307730
Summary:
Custom DFS implementation allows us to skip over certain nodes without adding them to the visited map, which is not easily doable with llvm's dfs iterators. What's more, caching predecessors becomes easy.
This patch implements a single DFS function (template) for both forward and reverse DFS, which should be easier to maintain then separate two ones.
Skipping over nodes based on a predicate will be necessary later to implement incremental updates.
There also seems to be a very slight performance improved when bootstrapping clang with this patch on my machine (3:28s -> 3:26s) .
Reviewers: dberlin, sanjoy, davide, grosser
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34651
llvm-svn: 307727
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
Summary:
Patch by Klaus Kretzschmar
We would like to introduce a new type of llvm error handler for handling
bad alloc fault situations. LLVM already provides a fatal error handler
for serious non-recoverable error situations which by default writes
some error information to stderr and calls exit(1) at the end (functions
are marked as 'noreturn').
For long running processes (e.g. a server application), exiting the
process is not an acceptable option, especially not when the system is
in a temporary resource bottleneck with a good chance to recover from
this fault situation. In such a situation you would rather throw an
exception to stop the current compilation and try to overcome the
resource bottleneck. The user should be aware of the problem of throwing
an exception in bad alloc situations, e.g. you must not do any
allocations in the unwind chain. This is especially true when adding
exceptions in existing unfamiliar code (as already stated in the comment
of the current fatal error handler)
So the new handler can also be used to distinguish from general fatal
error situations where recovering is no option. It should be used in
cases where a clean unwind after the allocation is guaranteed.
This patch contains:
- A report_bad_alloc function which calls a user defined bad alloc
error handler. If no user handler is registered the
report_fatal_error function is called. This function is not marked as
'noreturn'.
- A install/restore_bad_alloc_error_handler to install/restore the bad
alloc handler.
- An example (in Mutex.cpp) where the report_bad_alloc function is
called in case of a malloc returns a nullptr.
If this patch gets accepted we would create similar patches to fix
corresponding malloc/calloc usages in the llvm code.
Reviewers: chandlerc, greened, baldrick, rnk
Reviewed By: rnk
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D34753
llvm-svn: 307673
TreePatternNode considers them to be plain integers but MachineInstr considers
them to be a distinct kind of operand.
The tweak to AArch64InstrInfo.td to produce a simple test case is a NFC for
everything except GlobalISelEmitter (confirmed by diffing the tablegenerated
files). GlobalISelEmitter is currently unable to infer the type of operands in
the Dst pattern from the operands in the Src pattern.
llvm-svn: 307634
This takes memory usage from 5.1GB to 970MB - it could go further by
using a small size of 2 instead of the default of 4, but given the
rather high cost of going over this limit by much, I figured a little
slosh would be worth the ~130MB of memory usage.
& this'll might not be such a big deal if we use a custom slab allocator
for the DenseMaps here anyway
While the vast majority (99.9%) of records use only 1 entry, the tuning
parameter to SmallDenseMap is the the number of buckets, not the number
of entries - so a small size of 1 wasn't useful, even for 1 element, it
would tip over into allocating (much, 64 slots worth) more space - none
of them ended up small.
llvm-svn: 307608
This is part of the continuing effort to increase parity between
LLD and MSVC PDBs. link still doesn't like our PDBs, so the most
obvious thing to check was whether adding an empty publics stream
would get it to do something else. It still fails in the same way
but at least this removes one more variable from the equation.
The next logical step would be to try creating an empty globals
stream.
Differential Revision: https://reviews.llvm.org/D35224
llvm-svn: 307598
Summary:
This patch adds a callback registration API to the PassBuilder,
enabling registering out-of-tree passes with it.
Through the Callback API, callers may register callbacks with the
various stages at which passes are added into pass managers, including
parsing of a pass pipeline as well as at extension points within the
default -O pipelines.
Registering utilities like `require<>` and `invalidate<>` needs to be
handled manually by the caller, but a helper is provided.
Additionally, adding passes at pipeline extension points is exposed
through the opt tool. This patch adds a `-passes-ep-X` commandline
option for every extension point X, which opt parses into pipelines
inserted into that extension point.
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: lksbhm, grosser, davide, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D33464
llvm-svn: 307532
Reduces llvm-profdata memory usage on a large profile from 7.8GB to 5.1GB.
The ProfData API now supports reporting all the errors/warnings rather
than only the first, though llvm-profdata ignores everything after the
first for now to preserve existing behavior. (if there's a desire for
other behavior, happy to implement that - but might be as well left for
a separate patch)
Reviewers: davidxl
Differential Revision: https://reviews.llvm.org/D35149
llvm-svn: 307516
invalidation of analyses when merging SCCs.
While I've added a bunch of testing of this, it takes something much
more like the inliner to really trigger this as you need to have
partially-analyzed SCCs with updates at just the right time. So I've
added a direct test for this using the inliner and verifying the
domtree. Without the changes here, this test ends up finding a stale
dominator tree.
However, to handle this properly, we need to invalidate analyses
*before* merging the SCCs. After talking to Philip and Sanjoy about this
they convinced me this was the right approach. To do this, we need
a callback mechanism when merging SCCs so we can observe the cycle that
will be merged before the merge happens. This API update ended up being
surprisingly easy.
With this commit, the new PM passes the test-suite again. It hadn't
since MemorySSA was enabled for EarlyCSE as that also will find this bug
very quickly.
llvm-svn: 307498