Original commit message:
[ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
This patch fixes some cases where we would sink stores for no reason.
llvm-svn: 296718
This patch cleans up how libc++abi handles the definitions for new/delete.
It is in preperation for upcoming changes to fix how both libc++ and libc++abi
handle new/delete.
The primary changes in this patch are:
* Move the definitions for bad_array_length and bad_new_array_length
into stdlib_exception.cpp. This way stdlib_new_delete.cpp only
contains new/delete.
* Rename cxa_new_delete.cpp -> stdlib_new_delete.cpp for consistency
with other files.
* Add a FIXME regarding when stdlib_new_delete.cpp is actually compiled
as part of the dylib.
llvm-svn: 296715
This tests is failing in XCode 7.0. But Xcode 7.3 that shipped
an updated clang has this test passing. This is fixing green dragon
which runs this configuration.
llvm-svn: 296712
Summary:
This can be used to optimize large multiplications after legalization.
Depends on D29565
Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29587
llvm-svn: 296711
Until now, we've had to use -global-isel to enable GISel. But using
that on other targets that don't support it will result in an abort, as we
can't build a full pipeline.
Additionally, we want to experiment with enabling GISel by default for
some targets: we can't just enable GISel by default, even among those
target that do have some support, because the level of support varies.
This first step adds an override for the target to explicitly define its
level of support. For AArch64, do that using
a new command-line option (I know..):
-aarch64-enable-global-isel-at-O=<N>
Where N is the opt-level below which GISel should be used.
Default that to -1, so that we still don't enable GISel anywhere.
We're not there yet!
While there, remove a couple LLVM_UNLIKELYs. Building the pipeline is
such a cold path that in practice that shouldn't matter at all.
llvm-svn: 296710
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
This patch fixes some cases where we would sink stores for no reason.
Differential Revision: https://reviews.llvm.org/D30124
llvm-svn: 296708
Summary:
This patch allows us to move away from using __thread on darwin,
which is requiring for building lsan for darwin on ios version 7
and on iossim i386.
Reviewers: kubamracek, kcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29994
llvm-svn: 296707
Summary:
The current size is flaky, as revealed by checking
the stack size attr after setting it.
Reviewers: kubamracek, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30267
llvm-svn: 296706
These tests are failing in XCode 8.0, 8.1, and 8.2, but not in Xcode
8.3. Annoyingly the version numbering for clang does not follow Xcode
and is bumped to 8.1 only in Xcode 8.3. So Xfailing apple-clang-8.0
should catch all cases here.
llvm-svn: 296704
This patch adds an option named --thinlto-cache-dir, which specifies the
path to a directory in which to cache native object files for ThinLTO
incremental builds.
Differential Revision: https://reviews.llvm.org/D30509
llvm-svn: 296702
This code starts from the high end of the sorted vector of offsets, and
works backwards: it tries to find contiguous offsets, process them, then
pops them from the end of the vector. Most of the code agrees with this
order of processing, but one loop doesn't: it instead processes elements
from the low end of the vector (which are nodes with unrelated offsets).
Fix that loop to process the correct elements.
This has a few implications. One, we don't incorrectly return early when
processing multiple groups of offsets in the same block (which allows
rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert
point for loads, so they're correctly sorted (which affects the
scheduling of vldm-liveness.ll). I think it might also impact some of
the heuristics slightly.
Differential Revision: https://reviews.llvm.org/D30368
llvm-svn: 296701
That class had three member functions, and all of them are just reader
methods that did not depend on class members, so they can be just non-
member functions.
Probably we should reorganize the functions themselves because their
return types doesn't make much sense to me, but for now I just moved
these functions out of the class.
llvm-svn: 296700
This is part of the ongoing attempt to improve select codegen for all targets and select
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().
I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).
The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105
Differential Revision: https://reviews.llvm.org/D30502
llvm-svn: 296699
Now that terminators can be EH pads, this code needs to iterate over the
immediate dominators of the EH pad to find a valid insertion point.
Fix for PR32107
Patch by Robert Olliff!
Differential Revision: https://reviews.llvm.org/D30511
llvm-svn: 296698
Take DW_FORM_implicit_const attribute value into account when profiling
DIEAbbrevData.
Currently if we have two similar types with implicit_const attributes and
different values we end up with only one abbrev in .debug_abbrev section.
For example consider two structures: S1 with implicit_const attribute ATTR
and value VAL1 and S2 with implicit_const ATTR and value VAL2.
The .debug_abbrev section will contain only 1 related record:
[N] DW_TAG_structure_type DW_CHILDREN_yes
DW_AT_ATTR DW_FORM_implicit_const VAL1
// ....
This is incorrect as struct S2 (with VAL2) will use abbrev record with VAL1.
With this patch we will have two different abbreviations here:
[N] DW_TAG_structure_type DW_CHILDREN_yes
DW_AT_ATTR DW_FORM_implicit_const VAL1
// ....
[M] DW_TAG_structure_type DW_CHILDREN_yes
DW_AT_ATTR DW_FORM_implicit_const VAL2
// ....
llvm-svn: 296691
This patch changes the CMake configuration so that it always
generates the test/lit.site.cfg file, even when testing is disabled.
This allows users to test libc++ without requiring them to have
a full LLVM checkout on their machine.
llvm-svn: 296685
- We only need the information from the base class, not the additional
details in the LiveInterval class.
- Spread more `const`
- Some code cleanup
llvm-svn: 296684
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped. This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.
This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.
Supersedes D28388
Fixes PR26328
Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29668
llvm-svn: 296683
I am planning to use this tool to find too noisy (missed) optimization
remarks. Long term it may actually be better to just have another tool that
exports the remarks into an sqlite database and perform queries like this in
SQL.
This splits out the YAML parsing from opt-viewer.py into a new Python module
optrecord.py.
This is the result of the script on the LLVM testsuite:
Total number of remarks 714433
Top 10 remarks by pass:
inline 52%
gvn 24%
licm 13%
loop-vectorize 5%
asm-printer 3%
loop-unroll 1%
regalloc 1%
inline-cost 0%
slp-vectorizer 0%
loop-delete 0%
Top 10 remarks:
gvn/LoadClobbered 20%
inline/Inlined 19%
inline/CanBeInlined 18%
inline/NoDefinition 9%
licm/LoadWithLoopInvariantAddressInvalidated 6%
licm/Hoisted 6%
asm-printer/InstructionCount 3%
inline/TooCostly 3%
gvn/LoadElim 3%
loop-vectorize/MissedDetails 2%
Beside some refactoring, I also changed optrecords not to use context to
access global data (max_hotness). Because of the separate module this would
have required splitting context into two. However it's not possible to access
the optrecord context from the SourceFileRenderer when calling back to
Remark.RelativeHotness.
llvm-svn: 296682
Multi-disjunct access maps can easily result in inbound assumptions which
explode in case of many memory accesses and many parameters. This change reduces
compilation time of some larger kernel from over 15 minutes to less than 16
seconds.
Interesting is the test case test/ScopInfo/multidim_param_in_subscript.ll
which has a memory access
[n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] }
which requires folding, but where only a single disjunct remains. We can still
model this test case even when only using limited memory folding.
For people only reading commit messages, here the comment that explains what
memory folding is:
To recover memory accesses with array size parameters in the subscript
expression we post-process the delinearization results.
We would normally recover from an access A[exp0(i) * N + exp1(i)] into an
array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid
delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the
range of exp1(i) - may be preferrable. Specifically, for cases where we
know exp1(i) is negative, we want to choose the latter expression.
As we commonly do not have any information about the range of exp1(i),
we do not choose one of the two options, but instead create a piecewise
access function that adds the (-1, N) offsets as soon as exp1(i) becomes
negative. For a 2D array such an access function is created by applying
the piecewise map:
[i,j] -> [i, j] : j >= 0
[i,j] -> [i-1, j+N] : j < 0
After this patch we generate only the first case, except for situations where
we can proove the first case to be invalid and can consequently select the
second without introducing disjuncts.
llvm-svn: 296679
Looks like .gdb.index and its support classes do things that they don't
have to or shouldn't do do. This patch addresses one of these issues.
GdbHashTab class is a hash table class. Just like other in-memory hash
tables, that incrementally updates its internal data and resizes buckets
as new elements are added so that key lookup is always fast.
But that is completely not necessary.
Unlike debuggers, we only produce hash tables for .gdb.index and
never read them. So incrementally updating a hash table in memory is
just a waste of resource and complicates the code. What we should
do is to accumulate symbols and then create the final hash table
at once.
llvm-svn: 296678
Summary:
This patch moves the clearUnusedBits calls into the two different initialization paths for APInt from a uint64_t. This allows the compiler to better optimize the clearing of the unused bits for the single word case. And it puts the clearing for the multi word case into the initSlowCase function to save code. In the common case of initializing with 0 this allows the clearing to be completely optimized out for the single word case.
On my local x86 build this is showing a ~45kb reduction in the size of the opt binary.
Reviewers: RKSimon, hans, majnemer, davide, MatzeB
Reviewed By: hans
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30486
llvm-svn: 296677
This patch adds a MachineSSA pass that coalesces blocks that branch
on the same condition.
Committing on behalf of Lei Huang.
Differential Revision: https://reviews.llvm.org/D28249
llvm-svn: 296670