The method decl is not marked as overriding any other method decls
until the template is instantiated.
Use the override attribute as another signal.
llvm-svn: 231487
-debug-pass is not specified, as the string is only used when dumping pass
information. There is a big cost of determining the name in
ReginBase<Tr>:getNameStr() if the region's entry or exit block doesn't have a
name. This is the case for the Release build, as names are not preserved by the
front-end.
RegionPass is mainly used by Polly, resulting in long compile time for one file
of a customer application with the Release build (1m24s) vs Release+Asserts
build (10s) when Polly is used. With this change, the compile time with the
Release build went down to 8s.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator: http://reviews.llvm.org/D8076
llvm-svn: 231485
Multiplication is not dependent on signedness, so just treating
all input ranges as unsigned is not incorrect. However it will cause
overly pessimistic ranges (such as full-set) when used with signed
negative values.
Teach multiply to try to interpret its inputs as both signed and
unsigned, and then to take the most specific (smallest population)
as its result.
llvm-svn: 231483
Previously it was initialized by ProcessLinux but lldb-server don't
contain ProcessLinux anymore so it have to be initialized by
NativeProcessLinux also.
Differential revision: http://reviews.llvm.org/D8080
llvm-svn: 231482
Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent
symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and
access through a non_lazy_symbol_pointers section is used instead.
-- before
_extgotequiv:
.long _extfoo
_delta:
.long _extgotequiv-_delta
-- after
_delta:
.long L_extfoo$non_lazy_ptr-_delta
.section __IMPORT,__pointers,non_lazy_symbol_pointers
L_extfoo$non_lazy_ptr:
.indirect_symbol _extfoo
.long 0
llvm-svn: 231475
Follow up r230264 and add ARM64 support for replacing global GOT
equivalent symbol accesses by references to the GOT entry for the final
symbol instead, example:
-- before
.globl _foo
_foo:
.long 42
.globl _gotequivalent
_gotequivalent:
.quad _foo
.globl _delta
_delta:
.long _gotequivalent-_delta
-- after
.globl _foo
_foo:
.long 42
.globl _delta
Ltmp3:
.long _foo@GOT-Ltmp3
llvm-svn: 231474
Summary:
None of the .set directives can be used before the .module directives. The .set mips0/pop/push were not triggering this constraint.
Also added testing for all the other implemented directives which are supposed to trigger this constraint.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7140
llvm-svn: 231465
There was already a TODO to double-check whether the extra indenation
makes sense. A slightly different case reveals that it is actively harmful:
for (int i = 0; i < aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ||
bbbbbbbbbbbbbbbbbbbb < ccccccccccccccc;
++i) {
}
Here (and it is probably not a totally infrequent case, it just works out that
"i < " is four spaces and so the four space extra indentation makes the
operator precedence confusing. So, this will now instead be formatted
as:
for (int i = 0; i < aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ||
bbbbbbbbbbbbbbbbbbbb < ccccccccccccccc;
++i) {
}
llvm-svn: 231461
Summary:
starting a debug session on linux with -o "process launch" lldb parameter was failing since
Target::Launch (in sychronous mode) is expecting to be able to receive public process events.
However, PlatformLinux did not set up event hijacking on process launch, which caused these
events to be processed elsewhere and left Target::Launch hanging. This patch enables event
interception in PlatformLinux (which was commented out).
Upon enabling event interception, I noticed an issue, which I traced back to the inconsistent
state of public run lock, which remained false even though public and private process states were
"stopped". I addressed this by making sure the run lock is "stopped" upon exit from
WaitForProcessToStop (which already had similar provisions for other return paths).
Test Plan: This should fix the intermittent TestFormats failure we have been experiencing on Linux.
Reviewers: jingham, clayborg, vharron
Subscribers: lldb-commits
Differential Revision: http://reviews.llvm.org/D8079
llvm-svn: 231460
Specifically this:
* Prevents an "unused" warning in non-assert builds.
* In that error case return with out removing a child loop instead of
looping forever.
llvm-svn: 231459
This pass interchanges loops to provide a more cache-friendly memory access.
For e.g. given a loop like -
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
A[j][i] = A[j][i]+B[j][i];
is interchanged to -
for(int j=0;j<N;j++)
for(int i=0;i<N;i++)
A[j][i] = A[j][i]+B[j][i];
This pass is currently disabled by default.
To give a brief introduction it consists of 3 stages-
LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.
LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.
TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.
A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499
llvm-svn: 231458
We supported forming IMGREL relocations from ConstantExprs involving
__ImageBase if the minuend was a GlobalVariable. Extend this
functionality to all GlobalObjects.
llvm-svn: 231456
Previously applying 1 million relocations took about 2 seconds on my
Xeon 2.4GHz 8 core workstation. After this patch, it takes about 300
milliseconds. As a result, time to link chrome.dll becomes 23 seconds
to 21 seconds.
llvm-svn: 231454
We extend an underlying file before mmap'ing it, but it's not needed
on Windows. Extending file is slow on Windows, so we should avoid doing that.
The difference gets larger as the size of an output file gets larger.
It shove off 2 seconds out of 25 seconds when linking chrome.dll with LLD,
for example.
llvm-svn: 231452
Unlike GDB, we tackle the problem of representing vector types in different styles by having a synthetic child provider that recognizes the format you're trying to apply to the variable, and coming up with the right type and number of child values to match that format
This makes for a more compact representation and less visual noise
Fixes rdar://5429347
llvm-svn: 231449
Summary:
There are two packages in libgo which have
known failures when running the "make check" rule.
This change disables those packages in the tests so
that we can run libgo tests without them until the
root causes are identified and resolved.
Test Plan: ran check-libgo rule
Reviewers: pcc
Reviewed By: pcc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8003
llvm-svn: 231448
We would set the body of a struct type (therefore making it non-opaque)
but were forgetting to move it to the non-opaque set.
Fixes pr22807.
llvm-svn: 231442
These refactored computations check whether or not we are at a stage
of the sequence where we can perform a match. This patch moves the
computation out of the main dataflow and into
{BottomUp,TopDown}PtrState.
llvm-svn: 231439
This initialization occurs when we see a new retain or release. Before
we performed the actual initialization inline in the dataflow. That is
just messy.
llvm-svn: 231438
This will enable the main ObjCARCOpts dataflow to work with higher
level concepts such as "can this ptr state be modified by this ref
count" and not need to understand the nitty gritty details of how that
is determined. This makes the dataflow cleaner.
llvm-svn: 231437
In the resolver, we maintain a list of undefined symbols, and when we
visit an archive file, we check that file if undefined symbols can be
resolved using files in the archive. The archive file class provides
find() function to lookup a symbol.
Previously, we call find() for each undefined symbols. Archive files
may be visited multiple times if they are in a --start-group and
--end-group. If we visit a file M times and if we have N undefined
symbols, find() is called M*N times. I found that that is one of the
most significant bottlenecks in LLD when linking a large executable.
find() is not a very cheap operation because it looks up a hash table
for a given string. And a string, or a symbol name, can be pretty long
if you are dealing with C++ symbols.
We can eliminate the bottleneck.
Calling find() with the same symbol multiple times is a waste. If a
result of looking up a symbol is "not found", it stays "not found"
forever because the symbol simply doesn't exist in the archive.
Thus, we should call find() only for newly-added undefined symbols.
This optimization makes O(M*N) O(N).
In this patch, all undefined symbols are added to a vector. For each
archive/shared library file, we maintain a start position P. All
symbols [0, P) are already searched. [P, end of the vector) are not
searched yet. For each file, we scan the vector only once.
This patch changes the order in which undefined symbols are looked for.
Previously, we iterated over the result of _symbolTable.undefines().
Now we iterate over the new vector. This is a benign change but caused
differences in output if remaining undefines exist. This is why some
tests are updated.
The performance improvement of this patch seems sometimes significant.
Previously, linking chrome.dll on my workstation (Xeon 2.4GHz 8 cores)
took about 70 seconds. Now it takes (only?) 30 seconds!
http://reviews.llvm.org/D8091
llvm-svn: 231434
_reverseRef is a multimap from atoms to atoms. The map contains
reverse edges of "layout-before" and "group" edges for dead-stripping.
The type of the variable was DenseMap<Atom *, DenseSet<Atom *>>.
This patch changes that to std::unordered_multimap<Atom *, Atom *>.
A DenseMap with a value type of DenseSet was not fast. Inserting 900k
items to the map took about 1.6 seconds on my workstation.
unordered_multimap on the other hand took only 0.6 seconds.
Use of the map also got faster -- originally markLive took 1.3 seconds
in the same test case, and it now took 1.0 seconds. In total we shove
off 1.3 seconds out of 27 seconds in that test case.
llvm-svn: 231432