The vast majority of LiveRanges (ie, 4/5) have exactly 1 segment and 1
value number, and a good chunk of the rest have 2 of each, so
allocating space for 4 is wasteful. This is especially noticeable when
dealing with a very large number of vregs, and I have an internal case
where dropping this to 2 shaves over 5% off of peak memory when
compiling a particularly large function.
llvm-svn: 262681
Summary:
Adds another global to asan's odr_c_test to help force the target global to
not lie at the start of bss with the gold linker where it is always
aligned.
Patch by Derek Bruening!
llvm-svn: 262678
Gold has a newly added LDPT_GET_SYMBOLS_V3 callback that can
distinguish between a module that is not included in the link, and
one that is included but has its entire interface preempted by others.
Fixes PR26674.
llvm-svn: 262676
Index calculations can use the last value that come out of a loop.
Ideally, ScalarEvolution can compute that exit value directly without
depending on the loop induction variable, but not in all cases.
This changes isAffine to not consider such loop exit values as affine to
avoid that SCEVExpander adds uses of the original loop induction
variable.
This fix is analogous to r262404 that applies to general uses of loop
exit values instead of index expressions and loop bouds as in this
patch.
This reduces the number of LNT test-suite fails with
-polly-position=before-vectorizer -polly-unprofitable
from 10 to 8.
llvm-svn: 262665
The scope will be required in the following fix. This commit separates
the large changes that do not change behaviour from the small, but
functional change.
llvm-svn: 262664
Add code generation support for firstprivate and private clauses of teams on the host. Add extensive regression tests including lambda functions and vla testing.
http://reviews.llvm.org/D17582
llvm-svn: 262663
The RTDyldMemoryManager::getSymbolAddressInProcess method accepts a
linker-mangled symbol name, but it calls through to dlsym to do the lookup (via
DynamicLibrary::SearchForAddressOfSymbol), and dlsym expects an unmangled
symbol name.
Historically we've attempted to "demangle" by removing leading '_'s on all
platforms, and fallen back to an extra search if that failed. That's broken, as
it can cause symbols to resolve incorrectly on platforms that don't do mangling
if you query '_foo' and the process also happens to contain a 'foo'.
Fix this by demangling conditionally based on the host platform. That's safe
here because this function is specifically for symbols in the host process, so
the usual cross-process JIT looking concerns don't apply.
M unittests/ExecutionEngine/ExecutionEngineTest.cpp
M lib/ExecutionEngine/RuntimeDyld/RTDyldMemoryManager.cpp
llvm-svn: 262657
When generating relocatable output SHT_NOBITS sections
were still occupy the file space.
Differential revision: http://reviews.llvm.org/D17857
llvm-svn: 262650
This experiment was originally about trying to use facts implied dominating conditions to infer more precise known bits. While the compile time was found to be acceptable on several large code bases, we never found sufficiently profitable examples to justify turning on the code by default. Given this, it's time to abandon the experiment.
Several folks have commented that they've found this useful for experimentation, but nothing has come of those experiments. Given how easy the patch is to apply, there's no reason to leave the code in tree.
For anyone interested in further investigation in this area, I recommend finding the summary email I sent on one of the original review threads. In particular, I now believe the use-list based approach is strictly worse than the dom-tree-walking approach.
llvm-svn: 262646
Given that we're not actually reducing the instruction count in the included
regression tests, I think we would call this a canonicalization step.
The motivation comes from the example in PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702
If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable
example of:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
%not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%bc = bitcast <4 x i32> %not to <2 x i64>
%notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
%bc2 = bitcast <2 x i64> %notnot to <4 x i32>
ret <4 x i32> %bc2
}
Simplifies to the expected:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
ret <4 x i32> %lobit
}
Differential Revision: http://reviews.llvm.org/D17583
llvm-svn: 262645
The hack of using a plt address as the address of an undefined function
only works in executables. Don't try it with shared libraries.
llvm-svn: 262642
- Prevent local variables to be declared in global AS
- Diagnose AS of local variables with an extern storage class
as if they would be in a program scope
Review: http://reviews.llvm.org/D17345
llvm-svn: 262641
After r262438 we can have provably positive NSW SCEV expressions whose
zero extensions cannot be simplified (since r262438 makes SCEV better at
computing constant ranges). This means demoting sexts of positive add
recurrences eagerly can result in an unsimplified zero extension where
we could have had a simplified sign extension. This change fixes the
issue by teaching SCEV to demote sext of a positive SCEV expression to a
zext only if the sext could not be simplified.
llvm-svn: 262638
This patch provides the following infrastructure for PGO enhancements in inliner:
Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.
Differential Revision: http://reviews.llvm.org/D16381
llvm-svn: 262636
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.
This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.
Differential Revision: http://reviews.llvm.org/D17681
llvm-svn: 262635
Summary: With discriminator, LineLocation can uniquely identify a callsite without the need to specifying callee name. Remove Callee function name from the key, and put it in the value (FunctionSamples).
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17827
llvm-svn: 262634